RootsWeb.com Mailing Lists
    1. Re: [R-M222] M222 Tree
    2. Sandy Paterson
    3. I've set up an Excel spreadsheet at http://dl.dropbox.com/u/2733445/EWCON.xlsx Column A is Paul Conroy's 37-marker haplotype. Column B is the 37-marker haplotype of Ewing 26605. The CC and the RCC are in cells C37 and D37. The RCC is 97.11. If you change the CDYb value in column B from 38 to 37, the RCC changes from 97.11 to 113.26. Changes of 1 at other markers result in smaller changes in RCC. I think it would be worthwhile if someone were to check this independently from first principles. Having said that, I get the same answer of 97.11 using my own software, working from first principles. Sandy

    07/11/2011 03:05:31
    1. Re: [R-M222] M222+ vs M222-
    2. Bill Howard
    3. Paul, If two datasets have identical haplotypes, and if a procedure based only on haplotypes leads to the same conclusion about those haplotypes, then it doesn't matter. You obviously have not thought deeply enough about this issue. Until you do, it's a waste of my time responding to you, particularly when your postings are turning nasty. - Bill Howard

    07/11/2011 02:55:14
    1. Re: [R-M222] M222 Tree
    2. Sandy Paterson
    3. [I've done that myself. If a marker (it doesn't make any difference which one) with the value of 12 is altered to 13 you will always get the same CC. Change another marker with a value of 29 to 30 and you will get a different CC. In genetic distance computations the result would be two (in the above example) but it would be something slightly different in the correlation approach. A marker change from 39 to 40 would be yet a different CC value. I haven't yet been able to figure out exactly what the correlation approach is doing mathematically. I'm not sure what difference applying correlation to every marker makes in comparison to genetic distance since most of the markers will be the same in any case. Correlation will also only note the changes.]
       That's given me an idea. I should be able to set up something in Excel that allows anyone to calculate the CC and hence RCC between two haplotypes. I'll use Conroy 16646 and Ewing 26605 37-marker haplotypes as the example. We can then examine the effect on the CC and the RCC of a single mutation at any marker. I'm a bit slow in Excel so it may take a while, but I think it's worth it.
       Sandy
       -----Original Message-----
       From: dna-r1b1c7-bounces@rootsweb.com [mailto:dna-r1b1c7-bounces@rootsweb.com] On Behalf Of Lochlan@aol.com
       Sent: 11 July 2011 03:24
       To: dna-r1b1c7@rootsweb.com
       Subject: Re: [R-M222] M222 Tree
       In a message dated 7/10/2011 1:21:47 P.M. Central Daylight Time, alexanderpatterson@btinternet.com writes:
       I have explained to Bill, I think 3 times, that I don't work in Excel and seldom use spreadsheets. Mostly, I write my own software in one of 3 programming languages, depending on the application, and I use files, not spreadsheets, so there is no spreadsheet to send him.
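       For anyone without Excel, the bracketed claim above can be checked with a few lines of Python. This is only a sketch under stated assumptions: CC is taken to be the ordinary Pearson correlation between the two lists of marker values, and RCC is taken to be 10,000 * (1/CC - 1). Neither formula is spelled out in this thread, so treat both as assumptions. The 10-marker haplotype and the helper names (pearson_cc, rcc, genetic_distance, bump) are made up for illustration; they are not the Conroy 16646 or Ewing 26605 values.

```python
from math import sqrt, isclose


def pearson_cc(a, b):
    """Pearson correlation coefficient (CC) between two equal-length marker lists."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same number of markers")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def rcc(a, b):
    """ASSUMED definition (not stated in the thread): RCC = 10,000 * (1/CC - 1)."""
    return 10_000.0 * (1.0 / pearson_cc(a, b) - 1.0)


def genetic_distance(a, b):
    """Simple stepwise genetic distance: sum of absolute marker differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def bump(haplotype, index, step=1):
    """Return a copy of the haplotype with one marker changed by `step`."""
    changed = list(haplotype)
    changed[index] += step
    return changed


# Hypothetical haplotype: the value 12 sits at positions 4, 5 and 7, the value 29 at position 6.
base = [13, 24, 14, 11, 12, 12, 29, 12, 30, 38]

for label, idx in [("12->13 at marker 4", 4),
                   ("12->13 at marker 7", 7),
                   ("29->30 at marker 6", 6)]:
    mutated = bump(base, idx)
    print(label,
          "GD =", genetic_distance(base, mutated),
          "CC =", round(pearson_cc(base, mutated), 6),
          "RCC =", round(rcc(base, mutated), 3))
```

       With two otherwise identical haplotypes, the two 12->13 changes give the same CC and RCC whichever marker is altered, while the 29->30 change gives a different CC, even though the genetic distance is 1 in all three cases. That agrees with the quoted observation. It covers only the case where the two haplotypes start out identical; for a real pair such as Conroy and Ewing the size of the change also depends on the other man's value at that marker.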

    07/11/2011 12:48:19
    1. Re: [R-M222] M222+ vs M222-
    2. Paul Conroy
    3. Bill,
       Stop creating strawman arguments, stop the ad hominem insults to everyone who offered constructive criticism of your flawed methods. You seem to have completely missed the point being made, by at least 5 commenters, which is that you are not using the correct dataset, so your results are flawed.
       David just presented a well-reasoned argument on where you are going wrong, and you completely dismissed it - it seems that somehow you just don't understand why what you are doing is wrong.
       I'll give you a simpler analogy. Supposing I was asked to calculate the mean height of my siblings, then I would add their heights and divide by the number of siblings to get my answer. What you are doing to get the answer is to add the heights of my first cousins and some second cousins and divide by the number to get the mean - while arguing that first and second cousins should be close enough to siblings as not to matter one bit.
       Hope that helps!
       Cheers,
       Paul
       On Sun, Jul 10, 2011 at 10:56 PM, Bill Howard <weh8@verizon.net> wrote:
       > David,
       > I appreciate what you said. Having developed the RCC correlation technique, I was surprised to see its applicability to genetics to be very important, and its application to genealogy to be less important than I anticipated. In the case of M222, where nobody knows where it originated (arguments are pro and con for Ireland and Scotland) and people even argue about what language the progenitor spoke (!), and whether the Niall myth intersects with the SNP, or originated with Niall, etc., I find myself in a thicket where I have an application that may well provide some insight into the situation yet I find myself deep in arguments about whether the same haplotypes that are SNP tested as M222 are the same as virtually identical haplotypes that pass identity tests for haplotype similarity. Those arguments are coming from people who have not taken the time or made the effort to read my papers or read my FAQs, or don't care to. For my purposes, it doesn't really matter.
       > I think many of the points raised look like the old philosophical arguments about how many angels can dance on the head of a pin (grin!). Sorry, my sense of humor is getting the best of me here.
       > Suffice it to say that I have an application that can take haplotypes, in and out of M222, and I can estimate a TMRCA with them. I can't tell what language the progenitor spoke; I cannot tell his origin although heuristic arguments may well point to a rather narrow region but with much uncertainty; and I can't tell what he had for breakfast like we can for the famous Alpine Iceman. Interestingly, the time of the MRCA of all the haplotypes, M222+ or -, is not in the genealogical time frame, so most of the arguments above are irrelevant, if not moot to those interested in the genealogical time scale.
       > You say that there may be trouble conceptualizing why I insist (I prefer 'state') that I can derive useful information from "unrelated" haplotype strings. There is no such thing as an unrelated haplotype string. Here again, people who have that trouble have not digested my two JoGG papers or the content of my FAQ listing. There I show that I can reproduce the ISOGG sequence fairly satisfactorily and I indicate that there is no evidence that the RCC time scale cannot be used tens of thousands of years in the past -- for finding TMRCAs of representative haplogroups of haplotypes.
       > You keep using the phrase "unrelated data sets". That is a meaningless term unless you define it more precisely. If you are comparing haplotypes, they will always be related in ways that are closer the more recent their MRCA and farther apart the more distant their MRCA. In this sense, you cannot use the term "unrelated". I used it in my first JoGG paper much to my dismay, but I was more naive then. In Table 1 of my first paper I use the term 'probably not unrelated' when RCCs of 40 or more are encountered. That should be read as such by a genealogist, but not by a geneticist, for reasons I just mentioned.
       > Again, I reiterate that we can estimate the TMRCAs of any pair, but we must be aware of the time uncertainties that are involved, and that requires a computation, or at least a well-founded estimate, of the standard deviation of the time we find.
       > Hope that helps, David.
       > - Bye from Bill
       > On Jul 10, 2011, at 8:33 PM, J David Grierson wrote:
       > > Bill,
       > > I'm sorry to buy into this discussion, but I think it is incredibly important that we all, including the tyros in this (amateur) business, understand what we mean.
       > > In the ten years or so that we enthusiasts have been dealing with DNA genealogy, a certain jargon has grown up. The professionals in the business have developed more and more complicated ways of describing haplotypes and haplogroups, using alphanumeric strings. To simplify things, we amateurs use shorthand, such as M222 (which is the code for a certain clade in the greater R haplogroup, but is commonly referred to as a haplogroup itself) because it saves a lot of fiddling referencing when communicating. Included in the jargon is the meaning of M222+, which means that a testee has been shown positively to carry the M222 mutation. M222-, conversely, means that a testee has been shown positively NOT to carry the mutation, and is most commonly attached to a member of the ancestral mutation (L21 as far as we know). Rarely, it might be attached to the member of a more remotely connected clade of the "R" haplogroup, who has taken the test for whatever reason. We can further say that if the haplotype itself does not give us a degree of certainty (see below), and the bearer has not taken the SNP test, then technically the haplogroup (or the SNP) is "unknown".
       > > If the relationship was even more remote, i.e., if a member of an unconnected (except in primeval terms) haplogroup took the test, under current assumptions that would be a waste of effort, and the TMRCA would be in the tens of thousands of years. Now, we DNA genealogists are predominantly interested in the most recent millennium, because that encompasses the time of the use of surnames, and, generally speaking, the period of usable record keeping. However, that is not to say that we don't also have an interest in estimating the age of our haplogroup. But we mostly don't have a particular interest in estimating the relative distance between unconnected haplogroups. Hence our insistence that knowledge of haplogroup is essential; indeed, I have trouble conceptualizing why you insist that you can derive useful information from unrelated haplotype strings.
       > > Now you said "If the two haplotype strings are statistically the same, I don't really care. They lead to the same dates."
       > > There is a very good reason for this. The two sets of data you were given were, with a very low probability of error, all carriers of the M222 mutation, (theoretically M222+), even though untested. That's why they are statistically the same, and lead to the same dates. It is true that more than half had not been tested positively for M222. However, the characteristic haplotype for the bearers of this mutation is such that M222+ can be predicted quite reliably. This turns on the following particular DYS values: in the FTDNA markers 1-67, DYS385b=13, 392=14, 448=18, 449=30, YCAIIb=23, 607=16, 413a=21, 534=16, 481=25. Latterly I have further identified DYS710=35, 714=24, 549=12, and 513=13 in the 68-111 marker range. Because M222 is a "young" SNP mutation, there haven't been many random DYS mutations since then, so most M222 members carry the great majority of the above DYS values.
       > > So, in our various ways, I think we are trying to say to you that using this particular data, you shouldn't draw conclusions about unrelated data sets. These are related. I accept your statistical expertise, but ask, what knowledge do we gain by comparing unrelated data sets, say, members of Haplogroups defined by M222, L21, I1 and G2a? I ask because they are the haplogroups identified in my surname study. We already know that they divided one from the other during the last 40,000 years, but unless we are trying to define an individual's place in all of this, what is to be gained?
       > > Regards
       > > David Grierson
       > > On 11/07/2011 8:25 AM, Bill Howard wrote:
       > > Paul,
       > > If the two haplotype strings are statistically the same, I don't really care. They lead to the same dates.
       > > I agree, we are now beating on a dead horse.
       > > I am sorry you think I am tiresome but you don't appear to understand that the date of origin depends only on the haplotypes presented to the program, not whether or not it is a member of a particular SNP.
       > > (In the two postings immediately below, I find that only you used the word "unknown")…..
       > > - Bye from Bill Howard
       > > On Jul 10, 2011, at 6:11 PM, Paul Conroy wrote:
       > > Bill,
       > > Once again M222- does NOT mean untested, it mean (sic) TESTED NEGATIVE.
       > > Unknown means untested.
       > > You're getting tiresome.
       > > On 7/10/11, Bill Howard <weh8@verizon.net> wrote:
       > > Hi, David,
       > > I did see your posting and I apologize for being a bit tardy in my reply.
       > > I got into this when a friend suggested looking into the M222 SNP and to see if there is a connection between it and Niall and his descendants. My look at the situation indicates that, while Niall and the UiNeills may have carried the SNP, it cannot be proved that they did so. My date determination (see below) indicates that the SNP did not originate with them.
       > > In the process I became aware that one of the things that the DNA folks wanted to do was to try to date the origin of the M222 SNP. Since my RCC approach could do that estimate, I wanted to analyze haplotypes that were in the M222 family.
       > > To prepare for the analysis, I was given a large list of M222 folks, and later found that only some of them had been SNP tested. I found that only slightly in excess of 320 had actually been tested, so I collected them as a second database.
       > > Next, there was a list exchange that suggested that the M222 group should be separated into plus and minus groupings, with minus not being well-defined except that they had not been tested. Before that exchange I tried to see if I could separate the plusses and the minuses by their haplotypes alone, and I found that they were statistically the same. If there was a separation by SNP testing they certainly did not stand out as being separate from their haplotypes. That analysis has already been posted.
       > > Now, since they looked to be the same, I separated my analysis into the two databases, the ones that had been called M222, a mixture of those tested and untested, and only those that had been tested. I ran a TMRCA for both groups and found that the answers were the same within the estimated error of about 300 years SD.
       > > It is a bit premature at this stage to give the answer I got since it has not been fully discussed with my potential co-author, but it was considerably earlier than Niall and was more like the dates that John McEwan got in the BC era. More on this later.
       > > To address your question about how I can calculate a time for the mixture, I say that if I cannot distinguish the difference from the haplotypes and since Mathematica works only on those haplotypes (without any knowledge of which group it is being given to analyze), I should get the same answer if I use either the large or the small sample. And that's what I got, again within the uncertainty of the errors involved. The answer for the M222 plus sample is statistically the same as the answer from the larger database. That's because the haplotypes inputted to Mathematica in the two samples were statistically the same. So, if you want the answer to dating M222 plus alone, it is the same date. I think that my analysis has been professionally rigorous given the statistical equalities within the two databases. I hope this answers your questions, David.
       > > - Bye from Bill Howard
       > > On Jul 10, 2011, at 4:10 PM, David H. MacLennan wrote:
       > > Dear Bill,
       > > Yesterday I posted a note concerning the M222 SNP status of your data (see below), but you have not responded. Can you please comment on what I said. I am particularly concerned about your dating of the time of the M222 mutation. If you are looking at samples of M222+ that are mixed with M222-, how can you calculate a time of the mutation?
       > > David
       > > Dear Bill,
       > > As a biological scientist I find it distressing that you and others are trying to convince us that it doesn't really matter if your SNP test does or does not show that you are M222+, you can still be included in the M222 project on the basis of your STR haplotype. Data based on such an assumption would not be acceptable in a rigorous scientific journal.
       > > It would seem to me that the benchmark of the M222 project should be the presence of M222+. At some stage in our background two brothers may have had an identical or nearly identical STR haplotype, but brother one had a de novo mutation that created the M222 SNP and brother two did not. The descendants of brother one would be M222+ and the descendants of brother two would be M222-. This de novo mutation occurred at a specific date and we would all be very interested in that date. However, if the samples used to measure that date are a mixture of + and - SNPs, then you can't measure the date of appearance of M222 accurately because common STR haplotypes would predate the appearance of the M222 SNP.
       > > Let's focus on the rigor of the analysis, not the cost of SNP testing.
       > > David
       > > --
       > > Dr. David H. MacLennan,
       > > Banting and Best Department of Medical Research,
       > > University of Toronto, Charles H.
Best Institute, > > 112 College St., Toronto, Ontario, Canada M5G1L6 > > Tel:1-416-978-5008 Fax:1-416-978-8528 > > [2]http://www.utoronto.ca/maclennan > > > > > > > > R1b1c7 Research and Links: > > > > [3]http://clanmaclochlainn.com/R1b1c7/ > > ------------------------------- > > To unsubscribe from the list, please send an email to > > [4]DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without > the > > > > quotes in the subject and the body of the message > > > > -- > > Sent from my mobile device > > > > ----- > > No virus found in this message. > > Checked by AVG - [9]www.avg.com > > Version: 10.0.1388 / Virus Database: 1516/3757 - Release Date: 07/10/11 > > > > References > > > > 1. mailto:weh8@verizon.net > > 2. http://www.utoronto.ca/maclennan > > 3. http://clanmaclochlainn.com/R1b1c7/ > > 4. mailto:DNA-R1B1C7-request@rootsweb.com > > 5. http://clanmaclochlainn.com/R1b1c7/ > > 6. mailto:DNA-R1B1C7-request@rootsweb.com > > 7. http://clanmaclochlainn.com/R1b1c7/ > > 8. mailto:DNA-R1B1C7-request@rootsweb.com > > 9. http://www.avg.com/

    07/10/2011 05:50:31
    1. Re: [R-M222] M222 Tree
    2. Bill Howard
    3. Thanks, John. I have only this to add -- the major difference between the correlation method and standard genetic distance is at least twofold. The RCC time scale can be calibrated more easily, and GD is only an indication that somewhere in a haplotype there has been a marker change, whereas the RCC method says the same thing more precisely because it looks at the entire marker string. That's a big difference. - Bye from Bill Howard On Jul 10, 2011, at 10:24 PM, Lochlan@aol.com wrote: > In a message dated 7/10/2011 1:21:47 P.M. Central Daylight Time, > alexanderpatterson@btinternet.com writes: > > I have explained to Bill I think 3 times, that I don't work in Excel > and seldom use spreadsheets. Mostly, I write my own software in one of 3 > programming languages, depending on the application, and I use files, not > spreadsheets, so there is no spreadsheet to send him. > > I don't think Bill ever really explained this to the list in detail > (although it is in his articles) but he uses the data analysis tool in older > versions of Excel to generate the CC (correlation coefficient) half matrix he > uses. I've tried that myself and it instantly compares every sample to > every other sample in the spreadsheet and in seconds spits out the CCs in a > half matrix form which should be familiar to anyone who has used the McGee > utility for genetic distance since the format is identical. To build a full > matrix you need to do some copying and pasting or use some other Excel trick. > Then you need to convert every CC in the matrix to RCC as you describe. > You wind up with something exactly like the McGee genetic distance matrix > where every sample is compared to the others in one cell except you have RCC > numbers rather than genetic distance. > > I do not think this data analysis tool is available in newer versions of > Excel. In the older version I used (MS Excel 2000) it was an add-on which > had to be installed from the CD. 
I just checked my current version of > Excel and it does list a data tool kit add-on which includes correlation but I'm > not sure if I'm getting it installed correctly or not. It also lists a > VBA based data tool kit add-on. > > I have no idea if that could be duplicated in software. If the McGee > utility can generate a full matrix then you probably can too. > > I rarely use Excel myself and find the process slow and tedious, mainly > because of the learning curve involved in using Excel itself. It's not my > cup of tea. I'd rather have a software program generate the entire matrix > and conversions. > > I simply don't see much difference between Bill's correlation method and > standard genetic distance. > > <Finally, about the association of genetic distance (GD) with RCC -- I > have run many strings of haplotypes and have changed various marker values by > 1, 2, 3, and compared many sets with each other. They show that a change of > 1 in GD can cause a change in RCC of about 3, depending on which marker > (low vs high) is changed. Table 1 in my published paper in the JoGG > (http://mysite.verizon.net/weh8/Howard1.pdf) confirms those more extensive > calculations. I stand by this association of GD with RCC and maintain that RCC > contains more valuable information because it applies to every marker value, not > just citing how many of them have changed.> > > I've done that myself. If a marker (it doesn't make any difference which > one) with the value of 12 is altered to 13 you will always get the same > CC. Change another marker with a value of 29 to 30 and you will get a > different CC. 
In genetic distance computations the result would be two (in the > above example) but it would be something slightly different in the > correlation approach. A marker change from 39 to 40 would be yet a different CC > value. I haven't been able to figure out exactly what the correlation > approach is doing mathematically yet. I'm not sure what difference > applying correlation to every marker makes in comparison to genetic distance > since most of the markers will be the same in any case. Correlation will also > only note the changes. > > As a check on the correlation coefficient approach I ran the same samples > Bill is using through one of the programs in the Phylip suite, called Kitsch, > which uses genetic distance data generated by the McGee utility. A freeware > program called Mega then generates the charts. Info on how to use these > programs can be found on the McGee utility site. There are lots of > variables that can be set on the McGee utility, some of which I thought were > debatable. The instructions include using the infinite allele mutation model, > setting the probability to 95%, years=25 years/generation, mutation rate = > FTDNA = 0.004..0.0075. > > I'm going to have to re-run this because I omitted some samples used in > a tree produced by Bill with Mathematica. But the resulting tree showed > basically the same thing for the McGoverns and Howles, two surnames Bill > has been talking about lately. They are clustered tightly together in both > systems. The Mega program however just gives a short time scale at the > bottom of the chart. On this particular tree it's 200 years. > > All things being equal, it appears Bill's methods may allow for a more > accurate reading on TMRCA. Extrapolating from the 200 year scale on the Mega > chart is difficult by eye and doesn't appear to go beyond 1000 years for > any sample in the spreadsheet. > > I've never been a fan of just using genetic distance alone in DNA > analysis. I know John McEwan used it often. 
He too came up with phylogenetic > charts for M222 which are still available on his web site. But he also used > modals and in fact developed one for each of his R1bSTR clusters. The > reason I distrust genetic distance alone is you can get false positives, > matches that on closer inspection aren't really matches. > Samples at a GD of 5 tell you nothing about which markers are different. > > I've also used Fluxus charts which take the opposite approach, finding > links between shared marker values in haplotypes. That is almost impossible > to use in huge data sets though. Someone sent me one for M222 a few years > ago and it was an indecipherable mess. > > At this stage I'm not sold on any one approach. But that's just my > opinion. Everyone else is entitled to their own. > > > > > John >
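
The 12-to-13 versus 29-to-30 observation quoted above can be checked in a few lines of Python. This is only a minimal sketch, not anyone's actual software. It assumes that CC is the ordinary Pearson correlation coefficient between two marker strings (which is what Excel's correlation tool computes) and that genetic distance is the simple sum of absolute marker differences. The 8-marker haplotype and the helper names (pearson_cc, genetic_distance, bump) are made up for illustration.

```python
from math import isclose, sqrt


def pearson_cc(a, b):
    """Pearson correlation coefficient of two equal-length marker strings."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)


def genetic_distance(a, b):
    """Step-wise genetic distance: sum of absolute marker differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def bump(haplotype, index):
    """Return a copy with the marker at `index` increased by one repeat."""
    changed = list(haplotype)
    changed[index] += 1
    return changed


# Hypothetical 8-marker haplotype (not real data): two 12s and one 29.
base = [13, 24, 14, 11, 12, 29, 12, 39]

variants = {
    "first 12 -> 13": bump(base, 4),
    "second 12 -> 13": bump(base, 6),
    "29 -> 30": bump(base, 5),
    "12 -> 13 and 29 -> 30": bump(bump(base, 4), 5),
}

for label, other in variants.items():
    gd = genetic_distance(base, other)
    cc = pearson_cc(base, other)
    print(f"{label:24s} GD={gd}  CC={cc:.6f}")
```

Under these assumptions the two 12-to-13 changes should give the same CC whichever marker carries the 12, the 29-to-30 change should give a different CC, and GD should be 1 for each single-step change and 2 for the double change.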

    07/10/2011 05:08:33
    1. Re: [R-M222] M222+ vs M222-
    2. Bill Howard
    3. David, I appreciate what you said. Having developed the RCC correlation technique, I was surprised to find its applicability to genetics very important, and its application to genealogy less important than I anticipated. In the case of M222, where nobody knows where it originated (arguments are pro and con for Ireland and Scotland) and people even argue about what language the progenitor spoke (!), and whether the Niall myth intersects with the SNP, or originated with Niall, etc., I find myself in a thicket where I have an application that may well provide some insight into the situation yet I find myself deep in arguments about whether the same haplotypes that are SNP tested as M222 are the same as virtually identical haplotypes that pass identity tests for haplotype similarity. Those arguments are coming from people who have not taken the time or made the effort to read my papers or read my FAQs, or don't care to. For my purposes, it doesn't really matter. I think many of the points raised look like the old philosophical arguments about how many angels can dance on the head of a pin (grin!). Sorry -- my sense of humor is getting the best of me here. Suffice it to say that I have an application that can take haplotypes, in and out of M222, and I can estimate a TMRCA with them. I can't tell what language the progenitor spoke; I cannot tell his origin although heuristic arguments may well point to a rather narrow region but with much uncertainty; and I can't tell what he had for breakfast like we can for the famous Alpine Iceman. Interestingly, the time of the MRCA of all the haplotypes, M222+ or -, is not in the genealogical time frame, so most of the arguments above are irrelevant, if not moot, to those interested in the genealogical time scale. You say that there may be trouble conceptualizing why I insist (I prefer 'state') that I can derive useful information from "unrelated" haplotype strings. 
There is no such thing as an unrelated haplotype string. Here again, people who have that trouble have not digested my two JoGG papers or the content of my FAQ listing. There I show that I can reproduce the ISOGG sequence fairly satisfactorily and I indicate that there is no evidence that the RCC time scale cannot be used tens of thousands of years in the past -- for finding TMRCAs of representative haplogroups of haplotypes. You keep using the phrase "unrelated data sets". That is a meaningless term unless you define it more precisely. If you are comparing haplotypes, they will always be related in ways that are closer the more recent their MRCA and farther apart the more distant their MRCA. In this sense, you cannot use the term "unrelated". I used it in my first JoGG paper much to my dismay, but I was more naive then. In Table 1 of my first paper I use the term 'probably not unrelated' when RCCs of 40 or more are encountered. That should be read as such by a genealogist, but not by a geneticist, for reasons I just mentioned. Again, I reiterate that we can estimate the TMRCAs of any pair, but we must be aware of the time uncertainties that are involved, and that requires a computation, or at least a well-founded estimate, of the standard deviation of the time we find. Hope that helps, David. - Bye from Bill On Jul 10, 2011, at 8:33 PM, J David Grierson wrote: > > Bill, > I'm sorry to buy into this discussion, but I think it is incredibly > important that we all, including the tyros in this (amateur) business, > understand what we mean. > In the ten years or so that we enthusiasts have been dealing with DNA > genealogy, a certain jargon has grown up. The professionals in the business > have developed more and more complicated ways of describing haplotypes and > haplogroups, using alphanumeric strings. 
To simplify things, we amateurs use > shorthand, such as M222 (which is the code for a certain clade in the > greater R haplogroup, but is commonly referred to as a haplogroup itself) > because it saves a lot of fiddling referencing when communicating. Included > in the jargon is the meaning of M222+, which means that a testee has been > shown positively to carry the M222 mutation. M222-, conversely, means that a > testee has been shown positively NOT to carry the mutation, and is most > commonly attached to a member of the ancestral mutation (L21 as far as we > know). Rarely, it might be attached to the member of a more remotely > connected clade of the "R" haplogroup, who has taken the test for whatever > reason. We can further say that if the haplotype itself does not give us a > degree of certainty (see below), and the bearer has not taken the SNP test, > then technically the haplogroup (or the SNP) is "unknown". > If the relationship was even more remote, i.e., if a member of an unconnected > (except in primeval terms) haplogroup took the test, under current > assumptions that would be a waste of effort, and the TMRCA would be in the > tens of thousands of years. Now, we DNA genealogists are predominantly > interested in the most recent millennium, because that encompasses the time > of the use of surnames, and, generally speaking, the period of usable record > keeping. However, that is not to say that we don't also have an interest in > estimating the age of our haplogroup. But we mostly don't have a particular > interest in estimating the relative distance between unconnected > haplogroups. Hence our insistence that knowledge of haplogroup is essential; > indeed, I have trouble conceptualizing why you insist that you can derive > useful information from unrelated haplotype strings. > Now you said "If the two haplotype strings are statistically the same, I > don't really care. They lead to the same dates." > There is a very good reason for this. 
The two sets of data you were given > were, with a very low probability of error, all carriers of the M222 > mutation, (theoretically M222+), even though untested. That's why they are > statistically the same, and lead to the same dates. It is true that more > than half had not been tested positively for M222. However, the > characteristic haplotype for the bearers of this mutation is such that M222+ > can be predicted quite reliably. This turns on the following particular DYS > values: in the FTDNA markers 1-67, DYS385b=13, 392=14, 448=18, 449=30, > YCAIIb=23, 607=16, 413a=21, 534=16, 481=25. Latterly I have further > identified DYS710=35, 714=24, 549=12, and 513=13 in the 68-111 marker range. > Because M222 is a "young" SNP mutation, there haven't been many random DYS > mutations since then, so most M222 members carry the great majority of the > above DYS values. > So, in our various ways, I think we are trying to say to you that using this > particular data, you shouldn't draw conclusions about unrelated data sets. > These are related. I accept your statistical expertise, but ask, what > knowledge do we gain by comparing unrelated data sets, say, members of > Haplogroups defined by M222, L21, I1 and G2a? I ask because they are the > haplogroups identified in my surname study. We already know that they > divided one from the other during the last 40,000 years, but unless we are > trying to define an individual's place in all of this, what is to be gained? > Regards > David Grierson > On 11/07/2011 8:25 AM, Bill Howard wrote: > > Paul, > If the two haplotype strings are statistically the same, I don't really care. T > hey lead to the same dates. > I agree, we are now beating on a dead horse. > I am sorry you think I am tiresome but you don't appear to understand that the > date of origin depends only on the haplotypes presented to the program, not whe > ther or not it is a member of a particular SNP. 
> (In the two postings immediately below, I find that only you used the word "unk > nown")….. > - Bye from Bill Howard > > On Jul 10, 2011, at 6:11 PM, Paul Conroy wrote: > > Bill, > > Once again M222- does NOT mean untested, it mean (sic) TESTED NEGATIVE. > > Unknown means untested. > > You're getting tiresome. > > > On 7/10/11, Bill Howard [1]<weh8@verizon.net> wrote: > > Hi, David, > > I did see your posting and I apologize for being a bit tardy in my reply. > > I got into this when a friend suggested looking into the M222 SNP and to see > if there is a connection between it and Niall and his descendants. My look > at the situation indicates that, while Niall and the UiNeills may have > carried the SNP, it cannot be proved that they did so. My date determination > (see below) indicates that the SNP did not originate with them. > > In the process I became aware that one of the things that the DNA folks > > wanted to do was to try to date the origin of the M222 SNP. Since my RCC > approach could do that estimate, I wanted to analyze haplotypes that were in > the M222 family. > To prepare for the analysis, I was given a large list of M222 folks, and > later found that only some of them had been SNP tested. I found that only > slightly in excess of 320 had actually been tested, so I collected them as a > second database. > > Next, there was a list exchange that suggested that the M222 group should be > separated into plus and minus groupings, with minus not being well-defined > except that they had not been tested. Before that exchange I tried to see > if I could separate the plusses and the minuses by their haplotypes alone, > and I found that they were statistically the same. If there was a separation > by SNP testing they certainly did not stand out as being separate from their > haplotypes. That analysis has already been posted. 
> > Now, since they looked to be the same, I separated my analysis into the two > databases, the ones that had been called M222, a mixture of those tested and > untested, and only those that had been tested. I ran a TMRCA for both groups > and found that the answers were the same within the estimated error of about > 300 years SD. > > It is a bit premature at this stage to give the answer I got since it has > not been fully discussed with my potential co-author, but it was > considerably earlier than Niall and was more like the dates that John McEwan > got in the BC era. More on this later. > > To address your question about how I can calculate a time for the mixture, I > say that if I cannot distinguish the difference from the haplotypes and > > since Mathematica works only on those haplotypes (without any knowledge of > which group it is being given to analyze), I should get the same answer if I > use either the large or the small sample. And that's what I got, again > within the uncertainty of the errors involved. The answer for the M222 plus > sample is statistically the same as the answer from the larger database. > That's because the haplotypes inputted to Mathematica in the two samples > were statistically the same. So, if you want the answer to dating M222 plus > alone, it is the same date. I think that my analysis has been professionally > rigorous given the statistical equalities within the two databases. I hope > this answers your questions, David. > > - Bye from Bill Howard > > > > On Jul 10, 2011, at 4:10 PM, David H. MacLennan wrote: > > Dear Bill, > Yesterday I posted a note concerning the M222 SNP status of your data > (see below), but you have not responded. Can you please comment on what I > said. I am particularly concerned about your dating of the time of the > > M222 > mutation. If you are looking at samples of M222+ that are mixed with > M222-, > how can you calculate a time of the mutation? 
> David > > Dear Bill, > As a biological scientist I find it distressing that you and others are > trying to convince us that it doesn't really matter if your SNP test does > or > does not show that you are M222+, you can still be included in the M222 > project on the basis of your STR haplotype. Data based on such an > assumption > would not be acceptable in a rigorous scientific journal. > It would seem to me that the benchmark of the M222 project should be > > the > presence of M222+. At some stage in our background two brothers may have > had > an identical or nearly identical STR haplotype, but brother one had a de > novo mutation that created the M222 SNP and brother two did not. The > descendants of brother one would be M222+ and the descendants of brother > two > would be M222-. This de novo mutation occurred at a specific date and we > would all be very interested in that date. However, if the samples used to > measure that date are a mixture of = and - SNPs, then you can't measure > the > date of appearance of M222 accurately because common STR haplotypes would > predate the appearance of the M222 SNP. > Let's focus on the rigor of the analysis, not the cost of SNP testing. > David > > -- > Dr. David H. MacLennan, > Banting and Best Department of Medical Research, > University of Toronto, Charles H. 
Best Institute, > 112 College St., Toronto, Ontario, Canada M5G1L6 > Tel:1-416-978-5008 Fax:1-416-978-8528 > [2]http://www.utoronto.ca/maclennan > > -- > Sent from my mobile device > > References > > 1. mailto:weh8@verizon.net > 2. http://www.utoronto.ca/maclennan > 3. http://clanmaclochlainn.com/R1b1c7/ > 4. mailto:DNA-R1B1C7-request@rootsweb.com > 5. http://clanmaclochlainn.com/R1b1c7/ > 6. mailto:DNA-R1B1C7-request@rootsweb.com > 7. http://clanmaclochlainn.com/R1b1c7/ > 8. mailto:DNA-R1B1C7-request@rootsweb.com > 9. http://www.avg.com/

    07/10/2011 04:56:28
    1. Re: [R-M222] M222 Tree
    2. In a message dated 7/10/2011 1:21:47 P.M. Central Daylight Time, alexanderpatterson@btinternet.com writes: I have explained to Bill I think 3 times, that I don't work in Excel and seldom use spreadsheets. Mostly, I write my own software in one of 3 programming languages, depending on the application, and I use files, not spreadsheets, so there is no spreadsheet to send him. I don't think Bill ever really explained this to the list in detail (although it is in his articles) but he uses the data analysis tool in older versions of Excel to generate the CC (correlation coefficient) half matrix he uses. I've tried that myself and it instantly compares every sample to every other sample in the spreadsheet and in seconds spits out the CCs in a half matrix form which should be familiar to anyone who has used the McGee utility for genetic distance since the format is identical. To build a full matrix you need to do some copying and pasting or use some other Excel trick. Then you need to convert every CC in the matrix to RCC as you describe. You wind up with something exactly like the McGee genetic distance matrix where every sample is compared to the others in one cell except you have RCC numbers rather than genetic distance. I do not think this data analysis tool is available in newer versions of Excel. In the older version I used (MS Excel 2000) it was an add-on which had to be installed from the CD. I just checked my current version of Excel and it does list a data tool kit add-on which includes correlation but I'm not sure if I'm getting it installed correctly or not. It also lists a VBA based data tool kit add-on. I have no idea if that could be duplicated in software. If the McGee utility can generate a full matrix then you probably can too. I rarely use Excel myself and find the process slow and tedious, mainly because of the learning curve involved in using Excel itself. It's not my cup of tea. 
I'd rather have a software program generate the entire matrix and conversions. I simply don't see much difference between Bill's correlation method and standard genetic distance. <Finally, about the association of genetic distance (GD) with RCC -- I have run many strings of haplotypes and have changed various marker values by 1, 2, 3, and compared many sets with each other. They show that a change of 1 in GD can cause a change in RCC of about 3, depending on which marker (low vs high) is changed. Table 1 in my published paper in the JoGG (http://mysite.verizon.net/weh8/Howard1.pdf) confirms those more extensive calculations. I stand by this association of GD with RCC and maintain that RCC contains more valuable information because it applies to every marker value, not just citing how many of them have changed.> I've done that myself. If a marker (it doesn't make any difference which one) with the value of 12 is altered to 13 you will always get the same CC. Change another marker with a value of 29 to 30 and you will get a different CC. In genetic distance computations the result would be two (in the above example) but it would be something slightly different in the correlation approach. A marker change from 39 to 40 would be yet a different CC value. I haven't been able to figure out exactly what the correlation approach is doing mathematically yet. I'm not sure what difference applying correlation to every marker makes in comparison to genetic distance since most of the markers will be the same in any case. Correlation will also only note the changes. 
As a check on the correlation coefficient approach I ran the same samples Bill is using through one of the programs in the Phylip suite, called Kitsch, which uses genetic distance data generated by the McGee utility. A freeware program called Mega then generates the charts. Info on how to use these programs can be found on the McGee utility site. There are lots of variables that can be set on the McGee utility, some of which I thought were debatable. The instructions include using the infinite allele mutation model, setting the probability to 95%, years=25 years/generation, mutation rate = FTDNA = 0.004..0.0075. I'm going to have to re-run this because I omitted some samples used in a tree produced by Bill with Mathematica. But the resulting tree showed basically the same thing for the McGoverns and Howles, two surnames Bill has been talking about lately. They are clustered tightly together in both systems. The Mega program however just gives a short time scale at the bottom of the chart. On this particular tree it's 200 years. All things being equal, it appears Bill's methods may allow for a more accurate reading on TMRCA. Extrapolating from the 200 year scale on the Mega chart is difficult by eye and doesn't appear to go beyond 1000 years for any sample in the spreadsheet. I've never been a fan of just using genetic distance alone in DNA analysis. I know John McEwan used it often. He too came up with phylogenetic charts for M222 which are still available on his web site. But he also used modals and in fact developed one for each of his R1bSTR clusters. The reason I distrust genetic distance alone is you can get false positives, matches that on closer inspection aren't really matches. Samples at a GD of 5 tell you nothing about which markers are different. I've also used Fluxus charts which take the opposite approach, finding links between shared marker values in haplotypes. That is almost impossible to use in huge data sets though. 
Someone sent me one for M222 a few years ago and it was an indecipherable mess. At this stage I'm not sold on any one approach. But that's just my opinion. Everyone else is entitled to their own. John
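The contrast John describes between genetic distance and the correlation approach can be sketched in a few lines of Python. This is an illustration only: the 9-marker haplotype below is made up, genetic distance is taken as the sum of absolute marker differences (one common convention, assumed here), and RCC is computed as (1/r - 1)*10000 from the Pearson correlation r, the conversion quoted elsewhere in this thread.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length marker lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rcc(x, y):
    """RCC as used in this thread: (1/r - 1) * 10000."""
    return (1.0 / pearson(x, y) - 1.0) * 10000.0


def genetic_distance(x, y):
    """Sum of absolute marker differences (assumed definition of GD)."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical haplotype, invented for illustration (not a real test result).
base = [13, 24, 14, 10, 12, 12, 29, 15, 39]

low_a = list(base)
low_a[4] = 13    # first marker with value 12 changed to 13
low_b = list(base)
low_b[5] = 13    # second marker with value 12 changed to 13
high = list(base)
high[6] = 30     # marker with value 29 changed to 30

for label, variant in [("12->13 (a)", low_a), ("12->13 (b)", low_b), ("29->30", high)]:
    print(f"{label}: GD = {genetic_distance(base, variant)}, "
          f"RCC = {rcc(base, variant):.2f}")
```

All three edits count as a genetic distance of 1. The two 12-to-13 edits give the same RCC regardless of which marker is changed, while the 29-to-30 edit gives a different RCC, because the correlation weighs a change by how far the marker value sits from the haplotype's mean.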

    07/10/2011 04:24:23
    1. Re: [R-M222] M222 Tree
    2. Marie Kerr
    3. Thanks, Bill. As a non-statistician but an ardent scientific type , I thoroughly agree with Bill's approach/comments: . Almost everyone uses Excel, so why use proprietary (aka home-grown) software? That just muddies the understanding of results. . Why use just two families? The 111 results must be pouring in, providing rich data for multiple lines of inquiry. . If you have a theory, please posit it in the scientific method and support/not support it with rich and voluminous data. . And most of all: Please try to explain all of the above in non-academic terms. One of Bryan Sykes' gifts has been to do that, and anyone who obfuscates (or does not try to illuminate for the masses) is not serving the greater good. My father's 111 results are almost ready, and I would love to have someone explain-for the sake of newcomers/non-scientists/non-academicians-exactly what the significance is, for example, a 111 -9 match compared to a 67 -3 match. As one who is interested in finding relatives/ancestral lines, these are some of the things that I'm most interested in. I'm not, however, interested in the minutiae of "ONE Conroy and ONE Ewing" and abstruse musings. Thanks, Marie Golden Kerr, on behalf of my father, James J. Golden, a 1st generation Irish-American from Rathlacken, County Mayo, Ireland -----Original Message----- From: dna-r1b1c7-bounces@rootsweb.com [mailto:dna-r1b1c7-bounces@rootsweb.com] On Behalf Of Bill Howard Sent: Sunday, July 10, 2011 3:02 PM To: dna-r1b1c7@rootsweb.com Subject: Re: [R-M222] M222 Tree Sandy, You have obviously not read my FAQs, and I certainly suggest that you do so before doing postings like this one. My criticism still stands. I don't know how you did it. Off line, you said that you would send me an Excel spreadsheet if I could not open yours. I have told you that I could not open the file you sent, and you have not done it. 
I realize that you don't work in Excel, but it should be easy to make a spreadsheet that can be converted to Excel. Most people can do this since Excel is part of a Microsoft family of applications that are perhaps the most widely used around the world. I completely agree that if you are comparing only one Ewing and one Conroy haplotype, you can easily get a difference like the one you cite. But you jump to an unwarranted conclusion about it. A careful reading of my FAQs would indicate that the errors on individual pairs of values can be quite large. What you point out is that the RCCs of two individual test results differ by 21 in RCC when their 37 and 67 markers are compared. That is certainly NOT the same as saying that the overall RCC time scale is different. To me it is not at all disturbing. It merely shows that: . You have based a conclusion on only one anecdotal piece of evidence. A statistician would never do that. . You are comparing only one 37 marker result with only one 67 marker result. A statistician would never do that. . When you correlate 37 markers separately from 67 markers, the results are bound to differ, since the input is different. But, my contention is that there is NO evidence that, over a large set of examples, they will differ significantly. . You are ignoring what I have already said about the 37 and 67 marker trees; they will differ in detail. But not in their general form. . And, of course one anecdotal change between ONE Conroy and ONE Ewing will be more dramatic. In short, a difference of 6 or 7 is in the noise. As I wrote before, you should compare all the haplotypes, not just the Ewings with the rest. You are not doing the analysis as you should be doing. I will make the same suggestions to you that I have done before: . do it right, . do it so that I can check it, . pay attention to statistics, . compute the SDs of your statements, and, . 
read my FAQs which are located at: <http://mysite.verizon.net/weh8/FAQ.pdf> It doesn't matter how many programming languages you know how to use; it does matter that it is done: . thoroughly, . correctly, and, . in a way that is reproducible, and, . it can be explained. You have not done that. Until that is done, and until your work is checked and is found to be reproducible, as all scientific studies must be, I cannot agree with what you have done or the way that you did it. I just want to add that, when we first began an off-line correspondence I suggested that you send me your data and explain your approach, and you backed off. I think you are continuing to do so. - Bye from Bill Howard On Jul 10, 2011, at 2:17 PM, Sandy Paterson wrote: > I have explained to bill (sic!) Bill I think 3 times, that I don't > work in Excel and seldom use spreadsheets. Mostly, I write my own > software in one of 3 programming languages, depending on the > application, and I use files, not spreadsheets, so there is no spreadsheet to send him. > > He doesn't seem to understand that. > > Still, I think what we are discussing is of sufficient importance to > warrant some kind of resolution, so what I'll do is to give a specific > example of what I'm getting at. The example I've chosen is a > comparison between Ewing > 26605 and Conroy 16646, over 37 markers and over 67 markers. > > Correlation coef. over 37 markers 0.990382002396377 > Correlation coef. Over 67 markers 0.992411264610795 > > RCC over 37 markers = (1/.990382002396377 - 1)*10000 = 97.11 RCC > over 67 markers = (1/.992411264610795 - 1)*10000 = 76.47 > > I did this comparing all 19 Ewings in my files to all non-Ewings in my > files and found what I believe is a disturbing result. The mean RCC > falls by 6.71 in changing from 37 to 67 markers. Not a single RCC > increases; they all decrease. > > Clearly the change in RCC between Conroy and Ewing is far more dramatic. 
> > > Anyone who wishes can check the above by doing the following : > > 1.Place the Ewing 26605 marker values in column A in an Excel spreadsheet. > 2.Place the Conroy 16646 marker values in column B in the same spreadsheet. > 3.Use the CORREL function in Excel, denoting the arrays as A1:A37 and > B1:B37 (changing the 37 to 67 if appropriate). > > The RCC's are calculated by taking the reciprocal of the correlation > coefficient, subtracting 1 then multiplying by 10000, as illustrated above. > > > Sandy > > > > > > -----Original Message----- > From: dna-r1b1c7-bounces@rootsweb.com > [mailto:dna-r1b1c7-bounces@rootsweb.com] On Behalf Of Bill Howard > Sent: 10 July 2011 15:54 > To: dna-r1b1c7@rootsweb.com > Subject: Re: [R-M222] M222 Tree > > Sandy, > > I have suggested a number of times that you should send me your > spreadsheet where you have the details of comparisons like this one > and you have not done so. > > > R1b1c7 Research and Links: > > <http://clanmaclochlainn.com/R1b1c7/> http://clanmaclochlainn.com/R1b1c7/ > ------------------------------- > To unsubscribe from the list, please send an email to > <mailto:DNA-R1B1C7-request@rootsweb.com> DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without > the quotes in the subject and the body of the message R1b1c7 Research and Links: <http://clanmaclochlainn.com/R1b1c7/> http://clanmaclochlainn.com/R1b1c7/ ------------------------------- To unsubscribe from the list, please send an email to <mailto:DNA-R1B1C7-request@rootsweb.com> DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message

    07/10/2011 02:00:44
    1. Re: [R-M222] M222 Tree
    2. Sandy Paterson
    3. I have explained to bill Bill I think 3 times, that I don't work in Excel and seldom use spreadsheets. Mostly, I write my own software in one of 3 programming languages, depending on the application, and I use files, not spreadsheets, so there is no spreadsheet to send him. He doesn't seem to understand that.
Still, I think what we are discussing is of sufficient importance to warrant some kind of resolution, so what I'll do is to give a specific example of what I'm getting at. The example I've chosen is a comparison between Ewing 26605 and Conroy 16646, over 37 markers and over 67 markers.
Correlation coef. over 37 markers: 0.990382002396377
Correlation coef. over 67 markers: 0.992411264610795
RCC over 37 markers = (1/.990382002396377 - 1)*10000 = 97.11
RCC over 67 markers = (1/.992411264610795 - 1)*10000 = 76.47
I did this comparing all 19 Ewings in my files to all non-Ewings in my files and found what I believe is a disturbing result. The mean RCC falls by 6.71 in changing from 37 to 67 markers. Not a single RCC increases; they all decrease. Clearly the change in RCC between Conroy and Ewing is far more dramatic.
Anyone who wishes can check the above by doing the following:
1. Place the Ewing 26605 marker values in column A in an Excel spreadsheet.
2. Place the Conroy 16646 marker values in column B in the same spreadsheet.
3. Use the CORREL function in Excel, denoting the arrays as A1:A37 and B1:B37 (changing the 37 to 67 if appropriate).
The RCC's are calculated by taking the reciprocal of the correlation coefficient, subtracting 1 then multiplying by 10000, as illustrated above.
Sandy -----Original Message----- From: dna-r1b1c7-bounces@rootsweb.com [mailto:dna-r1b1c7-bounces@rootsweb.com] On Behalf Of Bill Howard Sent: 10 July 2011 15:54 To: dna-r1b1c7@rootsweb.com Subject: Re: [R-M222] M222 Tree Sandy, I have suggested a number of times that you should send me your spreadsheet where you have the details of comparisons like this one and you have not done so.
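Sandy's conversion from correlation coefficient to RCC is simple enough to check directly. The short Python sketch below applies the stated rule (reciprocal of the coefficient, minus 1, times 10000) to the two coefficients quoted in the message; the function name rcc_from_correlation is just a label chosen for this illustration, and the guard against non-positive coefficients is an added assumption, not something stated in the post.

```python
def rcc_from_correlation(r):
    """RCC as described in the post: (1/r - 1) * 10000."""
    if r <= 0:
        # Assumption: the conversion is only applied to positive correlations.
        raise ValueError("correlation coefficient must be positive")
    return (1.0 / r - 1.0) * 10000.0


# Coefficients quoted in the message for Ewing 26605 vs Conroy 16646.
r37 = 0.990382002396377
r67 = 0.992411264610795

rcc37 = rcc_from_correlation(r37)
rcc67 = rcc_from_correlation(r67)

print(f"RCC over 37 markers: {rcc37:.2f}")
print(f"RCC over 67 markers: {rcc67:.2f}")
print(f"Change from 37 to 67 markers: {rcc67 - rcc37:.2f}")
```

Rounded to two decimals, the two results agree with the 97.11 and 76.47 given in the message. Because the coefficients are so close to 1, a shift in the third decimal place of the correlation moves the RCC by roughly 20.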

    07/10/2011 01:17:27
    1. Re: [R-M222] M222+ vs M222-
    2. Bill Howard
    3. Paul, If the two haplotype strings are statistically the same, I don't really care. They lead to the same dates. I agree, we are now beating on a dead horse. I am sorry you think I am tiresome but you don't appear to understand that the date of origin depends only on the haplotypes presented to the program, not whether or not it is a member of a particular SNP. (In the two postings immediately below, I find that only you used the word "unknown")….. - Bye from Bill Howard On Jul 10, 2011, at 6:11 PM, Paul Conroy wrote: > Bill, > > Once again M222- does NOT mean untested, it mean (sic) TESTED NEGATIVE. > > Unknown means untested. > > You're getting tiresome. > > > On 7/10/11, Bill Howard <weh8@verizon.net> wrote: >> Hi, David, >> >> I did see your posting and I apologize for being a bit tardy in my reply. >> >> I got into this when a friend suggested looking into the M222 SNP and to see >> if there is a connection between it and Niall and his descendants. My look >> at the situation indicates that, while Niall and the UiNeills may have >> carried the SNP, it cannot be proved that they did so. My date determination >> (see below) indicates that the SNP did not originate with them. >> >> In the process I became aware that one of the things that the DNA folks >> wanted to do was to try to date the origin of the M222 SNP. Since my RCC >> approach could do that estimate, I wanted to analyze haplotypes that were in >> the M222 family. >> To prepare for the analysis, I was given a large list of M222 folks, and >> later found that only some of them had been SNP tested. I found that only >> slightly in excess of 320 had actually been tested, so I collected them as a >> second database. >> >> Next, there was a list exchange that suggested that the M222 group should be >> separated into plus and minus groupings, with minus not being well-defined >> except that they had not been tested. 
Before that exchange I tried to see >> if I could separate the plusses and the minuses by their haplotypes alone, >> and I found that they were statistically the same. If there was a separation >> by SNP testing they certainly did not stand out as being separate from their >> haplotypes. That analysis has already been posted. >> >> Now, since they looked to be the same, I separated my analysis into the two >> databases, the ones that had been called M222, a mixture of those tested and >> untested, and only those that had been tested. I ran a TMRCA for both groups >> and found that the answers were the same within the estimated error of about >> 300 years SD. >> >> It is a bit premature at this stage to give the answer I got since it has >> not been fully discussed with my potential co-author, but it was >> considerably earlier than Niall and was more like the dates that John McEwan >> got in the BC era. More on this later. >> >> To address your question about how I can calculate a time for the mixture, I >> say that if I cannot distinguish the difference from the haplotypes and >> since Mathematica works only on those haplotypes (without any knowledge of >> which group it is being given to analyze), I should get the same answer if I >> use either the large or the small sample. And that's what I got, again >> within the uncertainty of the errors involved. The answer for the M222 plus >> sample is statistically the same as the answer from the larger database. >> That's because the haplotypes inputted to Mathematica in the two samples >> were statistically the same. So, if you want the answer to dating M222 plus >> alone, it is the same date. I think that my analysis has been professionally >> rigorous given the statistical equalities within the two databases. I hope >> this answers your questions, David. >> >> - Bye from Bill Howard >> >> >> >> On Jul 10, 2011, at 4:10 PM, David H. 
MacLennan wrote: >> >>> Dear Bill, >>> Yesterday I posted a note concerning the M222 SNP status of your data >>> (see below), but you have not responded. Can you please comment on what I >>> said. I am particularly concerned about your dating of the time of the >>> M222 >>> mutation. If you are looking at samples of M222+ that are mixed with >>> M222-, >>> how can you calculate a time of the mutation? >>> David >>> >>> Dear Bill, >>> As a biological scientist I find it distressing that you and others are >>> trying to convince us that it doesn't really matter if your SNP test does >>> or >>> does not show that you are M222+, you can still be included in the M222 >>> project on the basis of your STR haplotype. Data based on such an >>> assumption >>> would not be acceptable in a rigorous scientific journal. >>> It would seem to me that the benchmark of the M222 project should be >>> the >>> presence of M222+. At some stage in our background two brothers may have >>> had >>> an identical or nearly identical STR haplotype, but brother one had a de >>> novo mutation that created the M222 SNP and brother two did not. The >>> descendants of brother one would be M222+ and the descendants of brother >>> two >>> would be M222-. This de novo mutation occurred at a specific date and we >>> would all be very interested in that date. However, if the samples used to >>> measure that date are a mixture of = and - SNPs, then you can't measure >>> the >>> date of appearance of M222 accurately because common STR haplotypes would >>> predate the appearance of the M222 SNP. >>> Let's focus on the rigor of the analysis, not the cost of SNP testing. >>> David >>> >>> -- >>> Dr. David H. MacLennan, >>> Banting and Best Department of Medical Research, >>> University of Toronto, Charles H. 
Best Institute, >>> 112 College St., Toronto, Ontario, Canada M5G1L6 >>> Tel:1-416-978-5008 Fax:1-416-978-8528 >>> http://www.utoronto.ca/maclennan >>> >>> >> >> >> R1b1c7 Research and Links: >> >> http://clanmaclochlainn.com/R1b1c7/ >> ------------------------------- >> To unsubscribe from the list, please send an email to >> DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the >> quotes in the subject and the body of the message >> > > -- > Sent from my mobile device > R1b1c7 Research and Links: > > http://clanmaclochlainn.com/R1b1c7/ > ------------------------------- > To unsubscribe from the list, please send an email to DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message

    07/10/2011 12:25:50
    1. Re: [R-M222] M222+ vs M222-
    2. Paul Conroy
    3. Bill, Once again M222- does NOT mean untested, it mean TESTED NEGATIVE. Unknown means untested. You're getting tiresome. On 7/10/11, Bill Howard <weh8@verizon.net> wrote: > Hi, David, > > I did see your posting and I apologize for being a bit tardy in my reply. > > I got into this when a friend suggested looking into the M222 SNP and to see > if there is a connection between it and Niall and his descendants. My look > at the situation indicates that, while Niall and the UiNeills may have > carried the SNP, it cannot be proved that they did so. My date determination > (see below) indicates that the SNP did not originate with them. > > In the process I became aware that one of the things that the DNA folks > wanted to do was to try to date the origin of the M222 SNP. Since my RCC > approach could do that estimate, I wanted to analyze haplotypes that were in > the M222 family. > To prepare for the analysis, I was given a large list of M222 folks, and > later found that only some of them had been SNP tested. I found that only > slightly in excess of 320 had actually been tested, so I collected them as a > second database. > > Next, there was a list exchange that suggested that the M222 group should be > separated into plus and minus groupings, with minus not being well-defined > except that they had not been tested. Before that exchange I tried to see > if I could separate the plusses and the minuses by their haplotypes alone, > and I found that they were statistically the same. If there was a separation > by SNP testing they certainly did not stand out as being separate from their > haplotypes. That analysis has already been posted. > > Now, since they looked to be the same, I separated my analysis into the two > databases, the ones that had been called M222, a mixture of those tested and > untested, and only those that had been tested. I ran a TMRCA for both groups > and found that the answers were the same within the estimated error of about > 300 years SD. 
> > It is a bit premature at this stage to give the answer I got since it has > not been fully discussed with my potential co-author, but it was > considerably earlier than Niall and was more like the dates that John McEwan > got in the BC era. More on this later. > > To address your question about how I can calculate a time for the mixture, I > say that if I cannot distinguish the difference from the haplotypes and > since Mathematica works only on those haplotypes (without any knowledge of > which group it is being given to analyze), I should get the same answer if I > use either the large or the small sample. And that's what I got, again > within the uncertainty of the errors involved. The answer for the M222 plus > sample is statistically the same as the answer from the larger database. > That's because the haplotypes inputted to Mathematica in the two samples > were statistically the same. So, if you want the answer to dating M222 plus > alone, it is the same date. I think that my analysis has been professionally > rigorous given the statistical equalities within the two databases. I hope > this answers your questions, David. > > - Bye from Bill Howard > > > > On Jul 10, 2011, at 4:10 PM, David H. MacLennan wrote: > >> Dear Bill, >> Yesterday I posted a note concerning the M222 SNP status of your data >> (see below), but you have not responded. Can you please comment on what I >> said. I am particularly concerned about your dating of the time of the >> M222 >> mutation. If you are looking at samples of M222+ that are mixed with >> M222-, >> how can you calculate a time of the mutation? >> David >> >> Dear Bill, >> As a biological scientist I find it distressing that you and others are >> trying to convince us that it doesn't really matter if your SNP test does >> or >> does not show that you are M222+, you can still be included in the M222 >> project on the basis of your STR haplotype. 
Data based on such an >> assumption >> would not be acceptable in a rigorous scientific journal. >> It would seem to me that the benchmark of the M222 project should be >> the >> presence of M222+. At some stage in our background two brothers may have >> had >> an identical or nearly identical STR haplotype, but brother one had a de >> novo mutation that created the M222 SNP and brother two did not. The >> descendants of brother one would be M222+ and the descendants of brother >> two >> would be M222-. This de novo mutation occurred at a specific date and we >> would all be very interested in that date. However, if the samples used to >> measure that date are a mixture of = and - SNPs, then you can't measure >> the >> date of appearance of M222 accurately because common STR haplotypes would >> predate the appearance of the M222 SNP. >> Let's focus on the rigor of the analysis, not the cost of SNP testing. >> David >> >> -- >> Dr. David H. MacLennan, >> Banting and Best Department of Medical Research, >> University of Toronto, Charles H. Best Institute, >> 112 College St., Toronto, Ontario, Canada M5G1L6 >> Tel:1-416-978-5008 Fax:1-416-978-8528 >> http://www.utoronto.ca/maclennan >> >> > > > R1b1c7 Research and Links: > > http://clanmaclochlainn.com/R1b1c7/ > ------------------------------- > To unsubscribe from the list, please send an email to > DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the > quotes in the subject and the body of the message > -- Sent from my mobile device

    07/10/2011 12:11:15
    1. Re: [R-M222] M222+ vs M222-
    2. Bill Howard
    3. Hi, David, I did see your posting and I apologize for being a bit tardy in my reply. I got into this when a friend suggested looking into the M222 SNP and to see if there is a connection between it and Niall and his descendants. My look at the situation indicates that, while Niall and the UiNeills may have carried the SNP, it cannot be proved that they did so. My date determination (see below) indicates that the SNP did not originate with them. In the process I became aware that one of the things that the DNA folks wanted to do was to try to date the origin of the M222 SNP. Since my RCC approach could do that estimate, I wanted to analyze haplotypes that were in the M222 family. To prepare for the analysis, I was given a large list of M222 folks, and later found that only some of them had been SNP tested. I found that only slightly in excess of 320 had actually been tested, so I collected them as a second database. Next, there was a list exchange that suggested that the M222 group should be separated into plus and minus groupings, with minus not being well-defined except that they had not been tested. Before that exchange I tried to see if I could separate the plusses and the minuses by their haplotypes alone, and I found that they were statistically the same. If there was a separation by SNP testing they certainly did not stand out as being separate from their haplotypes. That analysis has already been posted. Now, since they looked to be the same, I separated my analysis into the two databases, the ones that had been called M222, a mixture of those tested and untested, and only those that had been tested. I ran a TMRCA for both groups and found that the answers were the same within the estimated error of about 300 years SD. It is a bit premature at this stage to give the answer I got since it has not been fully discussed with my potential co-author, but it was considerably earlier than Niall and was more like the dates that John McEwan got in the BC era. 
More on this later. To address your question about how I can calculate a time for the mixture, I say that if I cannot distinguish the difference from the haplotypes and since Mathematica works only on those haplotypes (without any knowledge of which group it is being given to analyze), I should get the same answer if I use either the large or the small sample. And that's what I got, again within the uncertainty of the errors involved. The answer for the M222 plus sample is statistically the same as the answer from the larger database. That's because the haplotypes inputted to Mathematica in the two samples were statistically the same. So, if you want the answer to dating M222 plus alone, it is the same date. I think that my analysis has been professionally rigorous given the statistical equalities within the two databases. I hope this answers your questions, David. - Bye from Bill Howard On Jul 10, 2011, at 4:10 PM, David H. MacLennan wrote: > Dear Bill, > Yesterday I posted a note concerning the M222 SNP status of your data > (see below), but you have not responded. Can you please comment on what I > said. I am particularly concerned about your dating of the time of the M222 > mutation. If you are looking at samples of M222+ that are mixed with M222-, > how can you calculate a time of the mutation? > David > > Dear Bill, > As a biological scientist I find it distressing that you and others are > trying to convince us that it doesn't really matter if your SNP test does or > does not show that you are M222+, you can still be included in the M222 > project on the basis of your STR haplotype. Data based on such an assumption > would not be acceptable in a rigorous scientific journal. > It would seem to me that the benchmark of the M222 project should be the > presence of M222+. 
At some stage in our background two brothers may have had > an identical or nearly identical STR haplotype, but brother one had a de > novo mutation that created the M222 SNP and brother two did not. The > descendants of brother one would be M222+ and the descendants of brother two > would be M222-. This de novo mutation occurred at a specific date and we > would all be very interested in that date. However, if the samples used to > measure that date are a mixture of = and - SNPs, then you can't measure the > date of appearance of M222 accurately because common STR haplotypes would > predate the appearance of the M222 SNP. > Let's focus on the rigor of the analysis, not the cost of SNP testing. > David > > -- > Dr. David H. MacLennan, > Banting and Best Department of Medical Research, > University of Toronto, Charles H. Best Institute, > 112 College St., Toronto, Ontario, Canada M5G1L6 > Tel:1-416-978-5008 Fax:1-416-978-8528 > http://www.utoronto.ca/maclennan > >

    07/10/2011 11:18:28
    1. Re: [R-M222] M222 Tree
    2. Bill Howard
    3. Sandy, You have obviously not read my FAQs, and I certainly suggest that you do so before doing postings like this one. My criticism still stands. I don't know how you did it. Off line, you said that you would send me an Excel spreadsheet if I could not open yours. I have told you that I could not open the file you sent, and you have not done it. I realize that you don't work in Excel, but it should be easy to make a spreadsheet that can be converted to Excel. Most people can do this since Excel is part of a Microsoft family of applications that are perhaps the most widely used around the world. I completely agree that if you are comparing only one Ewing and one Conroy haplotype, you can easily get a difference like the one you cite. But you jump to an unwarranted conclusion about it. A careful reading of my FAQs would indicate that the errors on individual pairs of values can be quite large. What you point out is that the RCCs of two individual test results differ by 21 in RCC when their 37 and 67 markers are compared. That is certainly NOT the same as saying that the overall RCC time scale is different. To me it is not at all disturbing. It merely shows that: • You have based a conclusion on only one anecdotal piece of evidence. A statistician would never do that. • You are comparing only one 37 marker result with only one 67 marker result. A statistician would never do that. • When you correlate 37 markers separately from 67 markers, the results are bound to differ, since the input is different. But, my contention is that there is NO evidence that, over a large set of examples, they will differ significantly. • You are ignoring what I have already said about the 37 and 67 marker trees; they will differ in detail. But not in their general form. • And, of course one anecdotal change between ONE Conroy and ONE Ewing will be more dramatic. In short, a difference of 6 or 7 is in the noise. 
As I wrote before, you should compare all the haplotypes, not just the Ewings with the rest. You are not doing the analysis as you should be doing. I will make the same suggestions to you that I have done before: • do it right, • do it so that I can check it, • pay attention to statistics, • compute the SDs of your statements, and, • read my FAQs which are located at: <http://mysite.verizon.net/weh8/FAQ.pdf> It doesn't matter how many programming languages you know how to use; it does matter that it is done: • thoroughly, • correctly, and, • in a way that is reproducible, and, • it can be explained. You have not done that. Until that is done, and until your work is checked and is found to be reproducible, as all scientific studies must be, I cannot agree with what you have done or the way that you did it. I just want to add that, when we first began an off-line correspondence I suggested that you send me your data and explain your approach, and you backed off. I think you are continuing to do so. - Bye from Bill Howard On Jul 10, 2011, at 2:17 PM, Sandy Paterson wrote: > I have explained to bill (sic!) Bill I think 3 times, that I don't work in Excel > and seldom use spreadsheets. Mostly, I write my own software in one of 3 > programming languages, depending on the application, and I use files, not > spreadsheets, so there is no spreadsheet to send him. > > He doesn't seem to understand that. > > Still, I think what we are discussing is of sufficient importance to warrant > some kind of resolution, so what I'll do is to give a specific example of > what I'm getting at. The example I've chosen is a comparison between Ewing > 26605 and Conroy 16646, over 37 markers and over 67 markers. > > Correlation coef. over 37 markers 0.990382002396377 > Correlation coef. 
Over 67 markers 0.992411264610795 > > RCC over 37 markers = (1/.990382002396377 - 1)*10000 = 97.11 > RCC over 67 markers = (1/.992411264610795 - 1)*10000 = 76.47 > > I did this comparing all 19 Ewings in my files to all non-Ewings in my files > and found what I believe is a disturbing result. The mean RCC falls by 6.71 > in changing from 37 to 67 markers. Not a single RCC increases; they all > decrease. > > Clearly the change in RCC between Conroy and Ewing is far more dramatic. > > > Anyone who wishes can check the above by doing the following : > > 1.Place the Ewing 26605 marker values in column A in an Excel spreadsheet. > 2.Place the Conroy 16646 marker values in column B in the same spreadsheet. > 3.Use the CORREL function in Excel, denoting the arrays as A1:A37 and B1:B37 > (changing the 37 to 67 if appropriate). > > The RCC's are calculated by taking the reciprocal of the correlation > coefficient, subtracting 1 then multiplying by 10000, as illustrated above. > > > Sandy > > > > > > -----Original Message----- > From: dna-r1b1c7-bounces@rootsweb.com > [mailto:dna-r1b1c7-bounces@rootsweb.com] On Behalf Of Bill Howard > Sent: 10 July 2011 15:54 > To: dna-r1b1c7@rootsweb.com > Subject: Re: [R-M222] M222 Tree > > Sandy, > > I have suggested a number of times that you should send me your spreadsheet > where you have the details of comparisons like this one and you have not > done so. > > > R1b1c7 Research and Links: > > http://clanmaclochlainn.com/R1b1c7/ > ------------------------------- > To unsubscribe from the list, please send an email to DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
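Bill's requests in this message, to compare every haplotype with every other one and to attach a standard deviation to the result, can be illustrated with a short Python sketch. The four 8-marker haplotypes and their labels are invented for the example, and the handling of zero (null) marker values is not addressed. The sample standard deviation from Python's statistics module stands in for whatever error estimate a full analysis would use; since pairwise values share haplotypes they are not independent, so it should be read as a rough spread only.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length marker lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rcc(x, y):
    """RCC as used in this thread: (1/r - 1) * 10000."""
    return (1.0 / pearson(x, y) - 1.0) * 10000.0


def pairwise_rcc(haplotypes):
    """Map each unordered pair of labels to the RCC between the two haplotypes."""
    return {
        (a, b): rcc(haplotypes[a], haplotypes[b])
        for a, b in combinations(sorted(haplotypes), 2)
    }


# Hypothetical haplotypes, invented for illustration (not real test results).
haplotypes = {
    "A": [13, 24, 14, 10, 12, 29, 15, 39],
    "B": [13, 25, 14, 10, 12, 29, 15, 38],
    "C": [13, 24, 14, 11, 12, 30, 15, 39],
    "D": [13, 24, 15, 10, 12, 29, 16, 39],
}

pairs = pairwise_rcc(haplotypes)
for (a, b), value in pairs.items():
    print(f"{a}-{b}: RCC = {value:.2f}")

values = list(pairs.values())
print(f"mean RCC = {mean(values):.2f}, sample SD = {stdev(values):.2f}")
```

With n haplotypes this produces n*(n-1)/2 comparisons, so a real data set of several hundred haplotypes yields tens of thousands of RCC values rather than the handful shown here.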

    07/10/2011 09:01:35
    1. Re: [R-M222] M222 Tree
    2. Bill Howard
    3. Sandy, I have suggested a number of times that you should send me your spreadsheet where you have the details of comparisons like this one and you have not done so. I have also suggested, off-line, that you approached the earlier haplotype analysis in the wrong way -- the one in which you compared a few haplotypes with only one Ewing. I showed even then that if you compare the differences between your 37, 67 and 111 RCCs, assigned defendable standard deviations to both your comparisons AND the single Ewing comparison haplotype, you get differences among your results that are within about one standard deviation of the average of the three runs. My calculation shows that, using your own data, your original conclusion, claiming that they were different, was incorrect. Then I suggested that you compare each haplotype string with each of ALL the other haplotype strings -- an easy job if you use a correlation app in any statistical application. You did this, apparently, in your note below but only with a few more Ewings. That was not what I suggested. You admitted that your original approach was wrong, and I proved, using your own figures, that if you assign reasonable SDs to the data, that the three results are statistically the same. Now, you did not take my suggestion to do a comparison over all the haplogroups, but compared them with a few more Ewings. That is still not the right thing to do. You should be comparing all the haplotypes with each other. 19 Ewings are not enough. You did not calculate any value for the standard deviations of the result you got. Few statisticians would say that the numbers are different without an assignment of SDs to the differences you found. Again, I challenge you, publicly now, to send me your data in a format I can use, and describe your method. The devil is often in the details of the way this is done -- your handling of the zero values, the way the Ewings were selected and their small number, etc. 
Also, your contention is wrong when you state that any method of TMRCA estimation that ignores the difference in impact between rare matches (off-modal matches) and common matches (on-modal matches) is seriously flawed. Here you are certainly comparing apples with oranges. A correlation coefficient merely estimates the degree of difference between pairs of haplotypes and the calibration considers ALL the marker values, including the ones you say are on- and off-modal. In addition, modal values are worthless in my RCC approach, anyway. They have very little value. The RCC time scale ignores them, and rightly so, since a modal is a mathematical construct that is virtual at best when it comes to the evolution of a progenitor's haplotype (which is NOT the modal in most instances), down the various lines to the present group of testees. I do not see that your website is relevant to this discussion. Finally, about the association of genetic distance (GD) with RCC -- I have run many strings of haplotypes and have changed various marker values by 1, 2, or 3, and compared many sets with each other. They show that a change of 1 in GD can cause a change in RCC of about 3, depending on which marker (low vs high) is changed. Table 1 in my published paper in the JoGG <http://mysite.verizon.net/weh8/Howard1.pdf> confirms those more extensive calculations. I stand by this association of GD with RCC and maintain that RCC contains more valuable information because it applies to every marker value, not just citing how many of them have changed. Send me your methodology and the details of your results, Sandy. Until then, my priorities will have to be elsewhere. With best regards, - Bye from Bill Howard On Jul 10, 2011, at 4:04 AM, Sandy Paterson wrote: > Yesterday I posted two tables showing what I got for RCC's between Ewing > 26605 and 61 other 111-marker testees, over 37, 67 and 111 markers. The > first table was wrong; I am happy that the 2nd table is correct.
The second > table suggests that the RCC's for Ewing 26605 are over-stated over 37 > markers compared to both 67 marker results and 111-marker results. > > The trouble is that the comparison involved only one Ewing. > > So what I then did was to calculate RCC's over 37 and 67 markers for all of > the M222 Ewings I have on file with 67-marker tests. There are 19. Each of > the 19 Ewings were compared to all of the M222 non-Ewings, a total of 19 x > 552 RCC calculations. The results are summarised as follows: > > Mean RCC over 37 markers 42.62 > Mean RCC over 67 markers 35.91 > > I haven't done a hypothesis test, but I'd suggest that the difference is > significant, and suggests strongly that RCC's between Ewing and non-Ewing > (and hence TMRCA's) are over-stated over 37 markers. I haven't yet had time > to check this for other surnames, but I may get round to it later this > morning. > > The other concern I have about the exercise, is that in my opinion, any > method of TMRCA estimation that ignores the difference in impact between > rare matches (off-modal matches) and common matches (on-modal matches), is > seriously flawed. > > My website is finally fully functional, so I can now refer to > > http://www.tmrca.com/?page_id=11 > > where this is discussed. Lookups of estimated TMRCA's can be done by > navigating to the section called 'Live Lookups'. > > At this stage, I consider only single-step mutations of one step up and one > step down, with probabilities of m/2 each, where m is the assumed > (marker-specific) mutation rate. I've started working on allowing for 2-step > mutations however, since empirical evidence is starting to appear suggesting > that about 3.5% of all Y-STR mutations are multi-step.
> > > Sandy

    07/10/2011 04:53:48
    1. [R-M222] M222 Tree
    2. Sandy Paterson
    3. Yesterday I posted two tables showing what I got for RCC's between Ewing 26605 and 61 other 111-marker testees, over 37, 67 and 111 markers. The first table was wrong; I am happy that the 2nd table is correct. The second table suggests that the RCC's for Ewing 26605 are over-stated over 37 markers compared to both 67 marker results and 111-marker results. The trouble is that the comparison involved only one Ewing. So what I then did was to calculate RCC's over 37 and 67 markers for all of the M222 Ewings I have on file with 67-marker tests. There are 19. Each of the 19 Ewings were compared to all of the M222 non-Ewings, a total of 19 x 552 RCC calculations. The results are summarised as follows: Mean RCC over 37 markers 42.62 Mean RCC over 67 markers 35.91 I haven't done a hypothesis test, but I'd suggest that the difference is significant, and suggests strongly that RCC's between Ewing and non-Ewing (and hence TMRCA's) are over-stated over 37 markers. I haven't yet had time to check this for other surnames, but I may get round to it later this morning. The other concern I have about the exercise, is that in my opinion, any method of TMRCA estimation that ignores the difference in impact between rare matches (off-modal matches) and common matches (on-modal matches), is seriously flawed. My website is finally fully functional, so I can now refer to http://www.tmrca.com/?page_id=11 where this is discussed. Lookups of estimated TMRCA's can be done by navigating to the section called 'Live Lookups'. At this stage, I consider only single-step mutations of one step up and one step down, with probabilities of m/2 each, where m is the assumed (marker-specific) mutation rate. I've started working on allowing for 2-step mutations however, since empirical evidence is starting to appear suggesting that about 3.5% of all Y-STR mutations are multi-step. Sandy
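The message above notes that no hypothesis test was done on the two means. One conventional way to attach a standard error to such a difference is a paired comparison, since each pair of testees has an RCC at 37 markers and another at 67. The sketch below is an assumption about how that could be set up, not the method either correspondent used. The function name `paired_summary` and the toy values are hypothetical. In the real data each non-Ewing enters 19 pairs, so the pairs are not independent and a t statistic computed this way would probably overstate the significance.

```python
import math
import statistics


def paired_summary(rcc_37, rcc_67):
    """Mean difference, its standard error, and the paired t statistic."""
    if len(rcc_37) != len(rcc_67) or len(rcc_37) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(rcc_37, rcc_67)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    std_err = sd_diff / math.sqrt(len(diffs))
    t_stat = mean_diff / std_err
    return mean_diff, std_err, t_stat


# Hypothetical RCC values for the same five pairs at 37 and 67 markers.
toy_37 = [40.0, 45.5, 38.0, 50.0, 42.5]
toy_67 = [35.0, 36.5, 33.0, 41.0, 36.5]
mean_diff, std_err, t_stat = paired_summary(toy_37, toy_67)
print(f"mean difference {mean_diff:.2f}, SE {std_err:.2f}, t {t_stat:.2f}")
```

Reporting the mean difference together with its standard error is also one way of meeting the request elsewhere in the thread for SDs on the quoted figures.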

    07/10/2011 03:04:10
    1. Re: [R-M222] Milligan
    2. Allene Goforth
    3. Hi Alan, Thanks for the clarification. No problem now. No, the five lines I have tested were living in the Arisaig area up in the Highlands (across from Skye and the Small Isles) at the time of emigration in 1790--Arieniskill and Glenuig, to be exact. Those places are about 9 miles apart, but Arieniskill now exists only on OS maps and hikers' brochures. I thought they had nothing to do with the Lowland McAdams until the pattern of matches with Lowland McAdam, Grierson, and Milligan began to emerge. It now looks very much like they came from the Lowlands at some unknown date, possibly via Ayr, then Glasgow, or Argyll to Arisaig. I have a huge amount of research to do. I do not know when they left the Lowlands, but I believe the McAdams were in the Arisaig area at the time of Culloden. I apologize for confusing you. Allene On 7/10/2011 3:11 AM, Alanmill10@aol.com wrote: > Allene > > All the surnames on the Map refer to those people who have at least one DNA > result with a paper trail that puts the history of the surname back beyond > 1600. I am trying to capture those known families who have an early > history in the southwest of Scotland and to see if there is a pattern that > indicates whether or not they arrived in this area under the Gall Gaedhill or > perhaps during an earlier period. > > The McCaddams certainly are known to have lived in the Carsphairns area > before 1600, but from your last email, I thought your family came from > Kilfinan in Argyleshire, which doesn't lie in the southwest of Scotland. There are > others on this list who are working in that area. > > I am more than happy to include this area on the Map, if it helps. > > Alan > > > > > > > In a message dated 09/07/2011 13:31:24 GMT Standard Time, > agoforth@moscow.com writes: > > Hi Alan, > > What happened to the Mac/McAdam/McCaddam that was on the earlier version > of this map you sent me a few days ago?
When I clicked on the earlier > version attached to your email it was no longer available. > > Thanks for putting the McCords on the map. I was wondering where they > came from. > > Allene

    07/10/2011 01:36:17
    1. Re: [R-M222] Milligan
    2. Allene All the surnames on the Map refer to those people who have at least one DNA result with a paper trail that puts the history of the surname back beyond 1600. I am trying to capture those known families who have an early history in the southwest of Scotland and to see if there is a pattern that indicates whether or not they arrived in this area under the Gall Gaedhill or perhaps during an earlier period. The McCaddams certainly are known to have lived in the Carsphairns area before 1600, but from your last email, I thought your family came from Kilfinan in Argyleshire, which doesn't lie in the southwest of Scotland. There are others on this list who are working in that area. I am more than happy to include this area on the Map, if it helps. Alan In a message dated 09/07/2011 13:31:24 GMT Standard Time, agoforth@moscow.com writes: Hi Alan, What happened to the Mac/McAdam/McCaddam that was on the earlier version of this map you sent me a few days ago? When I clicked on the earlier version attached to your email it was no longer available. Thanks for putting the McCords on the map. I was wondering where they came from. Allene

    07/10/2011 12:11:32
    1. [R-M222] M222+ and M222-
    2. Bernard Morgan
    3. Do we know what portion of the NWI modal is M222-, and who they are? I ask because there are five Morgans who fall within the NWI modal, plus another who is 2 for 3 on the M222 modal markers (i.e. DYS372, DYS385b and DYS390). He is also 18 for 25 (72%) on Bill's modal values. I am M222+ and the '72 percenter' is M222-. So by all accounts, though we are both Morgans, we are from different families yet from the same Gaelic kinship group? (There are a number of Irish O'Morgan families.) The geographical split of the Morgans is: two (including myself) whose ancestors came from Ireland. The '72 percenter' gives 18th century Dundee as his origin. The other three are from colonial Virginia/Carolina, and two of them believe themselves to be Scots-Irish descendants. So are any of the other four untested M222 members not M222+? And how should I consider an M222- with a similar haplotype?

    07/09/2011 01:50:53
    1. Re: [R-M222] RCC values
    2. Sandy Paterson
    3. As it happens, I got what I hadn't expected. There was indeed an error, which I've corrected, and the corrected values are at http://dl.dropbox.com/u/2733445/RCCFILE.csv If you look at the results now, you'll see what I had expected, namely that the RCC's for Ewing 26605 compared to the other 61 testees are significantly lower on average over 111 markers than over 37 markers. This is simply because over 37 markers, the proportion of Ewing markers on modal is far smaller than is the case over 111 markers. Thinking about it, you shouldn't need the file to check my figures, you should be able to do that by identifying someone by kit number and checking the 37 marker results against what you've got for the same person over 37 markers. For your own peace of mind though, I'll attach the .csv file for the 62 111-marker haplotypes to an off-list e-mail. Sandy -----Original Message----- From: dna-r1b1c7-bounces@rootsweb.com [mailto:dna-r1b1c7-bounces@rootsweb.com] On Behalf Of Bill Howard Sent: 09 July 2011 14:14 To: dna-r1b1c7@rootsweb.com Subject: Re: [R-M222] RCC values Sandy, When I made that comparison, I did it for a group of Hamiltons who had been tested at 37 and 67 markers. At the time I was a bit more naive than I am now about the differences you can get if you go away from the familiar 37 marker set, so you are probably right that I was too hasty in making that statement a couple of years ago. Yes, you may get a different result because the marker comparisons will be different. My approach correlates the whole string. Since new marker values are added, the results are expected to differ. That's why I have confined virtually all my analysis to 37 markers where I know what's going on. The RCC time scale is calibrated on 37 markers, too. It will surely be different at 67 and again at 111. But not enough pedigrees are available to do an independent RCC calibration at those higher DYS sites. 
That's also why I insist on studying only the SAME DYS sequence for all 37 haplotypes. I don't need to check your calculations. You got what I expected. However, send me an Excel file with the entries in separate columns and I will take a look at what you got. The URL you gave lists them in a csv format that is hard to work with. Finally, take a look at the following two trees. One was done on a set of 37 markers and the other on a set of 67 markers, BOTH FOR THE SAME SETS of testees. You will see that they are different in detail but they show many overall similarities. http://mysite.verizon.net/weh8/CrispinCousins37.pdf http://mysite.verizon.net/weh8/CrispinCousins67.pdf And no, I don't know in detail if you have done anything wrong, but I am not surprised at what you found. I think that a 67 and 111 marker set will more closely define modern clusters, but I am not convinced that it will be genetically useful in tracing mutations on genetic time scales back thousands of years, mainly because the new markers have been picked to give better insight into the genealogical time intervals, but that's just an intelligent hunch on my part. So send me the delineated file and I will take a crack at it. - Bye from Bill Howard On Jul 9, 2011, at 7:57 AM, Sandy Paterson wrote: > A question for Bill: > > In the section 'Methods--Part I : Forming the RCC Matrix' of your first > paper, you say > > 'Results from 67 markers can also be used; they are virtually identical to > the results using 37 markers.' > > I've taken the 62 111-marker M222 results I have on file, and calculated RCC > values between Ewing kit number 26605, and the remaining 61 testees. I > calculated RCC's separately over 37, 67 and 111 markers. The results can be > seen at > > http://dl.dropbox.com/u/2733445/RCCFILE.csv > > > At this stage I'm working on the assumption that I have done something > wrong, because the results I get vary dramatically depending on whether you > use 37, 67 or 111 markers. 
A case in point is the comparison between Ewing > 26605 and Paterson 118913. The results I get are > > 37 markers RCC = 12.7 > 67 markers RCC = 32.2 > 111 markers RCC = 59.2 > > Are you able to check my calculations? I can send you my file of the > 111-marker results if you have difficulty extracting them from the M222 > site. > > > Sandy
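A point raised more than once in this thread is that a one-step change does not move the RCC by the same amount at every marker. The sketch below illustrates that with a hypothetical 12-marker haplotype; the marker values are invented for illustration and are not any testee's results. It copies the haplotype, raises one marker by one step, and computes the RCC between the original and the copy. With a Pearson correlation the result depends on the value of the marker that was changed, not on its position in the string, so two markers with the same value give the same RCC. The function names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length marker lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rcc(haplotype_a, haplotype_b):
    """RCC as described in the thread: (1/r - 1) * 10000."""
    return (1.0 / pearson(haplotype_a, haplotype_b) - 1.0) * 10000.0


def rcc_after_bump(haplotype, index, step=1):
    """RCC between a haplotype and a copy with one marker changed by `step`."""
    changed = list(haplotype)
    changed[index] += step
    return rcc(haplotype, changed)


# Hypothetical marker values, invented for illustration only.
toy = [13, 25, 14, 11, 11, 13, 12, 12, 12, 13, 14, 29]

for index in (3, 6, 7, 11):
    print(f"marker value {toy[index]:2d} -> {toy[index] + 1:2d}: "
          f"RCC {rcc_after_bump(toy, index):.2f}")
```

In this toy string a one-step change at a low-valued marker produces a larger RCC than the same change at the highest-valued marker. The sizes depend entirely on the invented values, so only the pattern should be read from it, not the numbers.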

    07/09/2011 09:43:26
    1. [R-M222] Some details about the RCC process, the RCC Matrix, RCC clusters and Interclusters and the RCC-dated phylogenetic tree.
    2. Bill Howard
    3. Recently I have had some off-list correspondence with some people who were trying to explore some of the nuances and relationships between the RCC time scale, the RCC matrix and the phylogenetic tree that has an RCC scale associated with it. Let me try to explain what is going on and how to interpret the results. I have posted seven web sites that I will refer to in what follows. I am confining most of my remarks to only two surnames, each of which contains seven testees and appears close together on the tree -- the Howles and the McGoverns. The same approach and logic applies to other clusters although the ones I chose here are very simple in form. In fact we can analyze many hundreds of haplotypes at a time. Take a look at the following postings at the URLs noted:
(1) http://mysite.verizon.net/weh8/Howle-McGovern.pdf These are the haplotypes that I use
(2) http://mysite.verizon.net/weh8/HowleCluster.pdf Here are the RCC Matrix RCC values for the Howles
(3) http://mysite.verizon.net/weh8/McGovernCluster.pdf Here are the RCC Matrix RCC values of the McGoverns
(4) http://mysite.verizon.net/weh8/Howle-McGovernIntercluster.pdf Here are the intercluster RCC values.
(5) http://mysite.verizon.net/weh8/M222SurnamesA.pdf Look at the Howle and McGovern clusters on the tree.
(6) http://mysite.verizon.net/weh8/TimeSliceMatrix.pdf Time slice matrix for RCC values between 0 and 10
(7) http://mysite.verizon.net/weh8/OCathain.pdf The O'Cathain tree
The Howles and the McGoverns have very close connections in time that can be seen on both the RCC matrix and on the latest surname phylogenetic tree. Of course, I make that conclusion based on the approach that I have developed in carrying out the correlation approach on many haplotypes. That approach allows us to make an estimate of the TMRCAs of pairs of haplotypes.
When many haplotypes tend to cluster as both these two surnames individually do, then the error of the time estimate goes down by approximately the square root of the number (minus one) until we reach errors in the mutations themselves. Errors we expect are less than about 2-3 in RCC or 180 years in time (and these are standard deviations). Once a cluster has been identified in the RCC matrix (entries 2 and 3, above), a study of the intercluster region of the RCC matrix reveals the TMRCA of the two clusters. The intercluster region on the RCC matrix is found at the intersections of the Howle and McGovern entries, paired together, in the 4th entry, above. On the RCC matrix the intercluster region shows many numbers that represent the two by two RCC values of one member in cluster 1 paired with a member of cluster 2. All the values of all those pairs appear in the intercluster region of the RCC matrix. Now look at the tree (entry no. 5, above). What Mathematica does is to take the paired relationships in the clusters AND the values of the pairs in the intercluster region and it forms a tree. As it does it, it determines the optimum values of the entire haplotypes of pairs in the cluster and presents them on the tree. It also optimizes all the numbers in the intercluster region and puts the TMRCA connection between the two surname clusters at the junction point where those numbers have been optimized. The RCC values on the Tree have been calibrated by using over 100 pedigrees and the result is that 10 RCC ~ 433 years. The tree's time scale is given in RCC units that will not change. If a better pedigree-time calibration is done, then the RCC equivalent in years will change and that's why we plot it in RCC units. Going further…. We know now that every time there is a junction point on the phylogenetic tree, there must be a corresponding mutation because that is what the junctions represent. We can view the situation in reverse. 
Namely, where there is a junction point there must be a mutation. The real challenge is to find the markers that have mutated to cause the change and the junction point, and we are not there yet. But we can approach that by looking at the marker changes that differentiate one cluster from another - that is, their individual 'fingerprints'. We also know that SOMETHING has to differentiate one cluster from another. Since clusters are defined by the haplotypes of their members, the thing that separates one cluster from another are the cluster members' marker values. So, the difference in how clusters are defined on the tree can be explained by the differences in their marker values. Now, in our example, the thing to do is to look at the marker values of the Howles and the McGoverns and see, not how they are similar, and not how they differ from other clusters, but HOW THEY DIFFER FROM EACH OTHER. That is why the differences in the marker values between any two clusters are so important. It is because each cluster has a fingerprint defined by the marker values of its members. If we look closely at the marker differences between the Howles and the McGoverns (entry no.1, above), we find that out of the 7 examples in each cluster, some markers are entirely, and consistently different. I have shown that three of them at DYS 393, 460, and H4 are entirely different (look at the first URL, above and note those haplotype differences). And there are seven other markers that contain a mixture of marker values. Those marker differences separate these surnames from the rest and help to place them on the phylogenetic tree. Genetics dictates, I think, that when you have a surname cluster that contains different values at a particular DYS, you see that mutations have been occurring in that surname during the recent past. That part of the fingerprint has been mutating before our eyes! 
Following my papers in JoGG, those mutations will, in time, form subclusters, and as more lines descend, those subclusters will become separate clusters of their own, provided their lines do not die out. We are watching the process now, at one particular snapshot of their mutating evolution. We are observers looking at a snapshot of mutating markers at our particular time in the evolution of the markers on the Y-chromosome. DYS 391, at 10 for both Howle and McGovern clusters, differs from most of the other haplotypes in the sample. That difference sets these two surname clusters apart from other surname clusters, but not from themselves. You have to look for differences of marker values between two clusters when you compare them with each other. All the work I have done indicates without question in my mind that these two surname clusters are closely related for the reasons I give above. Similarities don't lead to important conclusions when two clusters are compared to each other, nor to other clusters. Differences do. More recently we have found that Mathematica can form a tree from the haplotypes directly, without going through the process of forming the RCC matrix, although it can be programmed to do so. While skipping the RCC matrix eliminates a step in forming the tree, I should point out that the RCC matrix can provide valuable insight into the connections. For example, from the RCC matrix you can form a time slice matrix (see my first paper in the JoGG). There you can present the RCC values that show the connections within a given interval of RCC (hence a given interval of time). That aspect of the analysis can have exceptional utility. As an example of a time-slice matrix, I refer to entry 6, above. Here we show a time slice matrix where the TMRCAs of pairs of testees whose names and Kit numbers are on the left. The slice is between RCC= 0 (1945 assumed as the average birth year of a testee) and RCC= 10 (about 433 years ago, or AD 1500). 
The errors are again estimated to be a few hundred years, so the results should not be over-interpreted. You can clearly see two main clusters along the diagonal. Two rather large clusters can be seen. They would be more filled if we had chosen RCC= 20 for the display, so they are showing closer relationships (genetically) among the TMRCAs of the pairs than usually occur in a typical surname cluster with members at and below RCC ~20. The upper left cluster is a mixture of Gallaghers, Cane/Kane, etc., and a few other surnames. The lower right cluster is more completely filled (it is younger), and is composed of Cane/Kane, etc., and the McHenrys. They are clearly different. All the entries are members of the O'Cathain group of testees sent to me by John McLaughlin. What you see is only a part of the O'Cathain RCC matrix, and only in that particular slice of time. However, entry 7 gives the positions of every testee in a phylogenetic tree of the O'Cathain group. When you compare the entries in the RCC time slice matrix with the positions of those same testees in the tree, the analysis gets quite complicated and questions arise that are still not solved. For example, look at the intercluster region on the time slice matrix colored in yellow. Those pairs of testees have one member from the upper cluster paired with one member from the lower cluster, yet their RCC values are of the same order as the ones in the clusters. This is very unusual and not yet understood. They all have surnames in the Soundex group of Cane/Kane, etc., but there are McHenrys there, too. The latter group have been known to be closely associated with the Cane/Kane group, however, and that might be giving us a clue. I hope our readers will now understand more about the process, how the tree depends on the haplotypes, the value of the intermediate RCC matrix and its time-slice matrix, and how they all tie together. - Bye from Bill Howard
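The time scale and the time-slice matrix described in the message above can be sketched in a few lines. This is an illustration under stated assumptions, not the Mathematica procedure the message refers to. The calibration (10 RCC ~ 433 years) and the reference year (1945 for RCC = 0) are taken from the message; the function names, labels and matrix values are hypothetical. A time slice is treated here simply as the set of pairs whose RCC falls inside a chosen interval.

```python
YEARS_PER_10_RCC = 433   # calibration quoted in the message: 10 RCC ~ 433 years
REFERENCE_YEAR = 1945    # assumed average birth year of a testee (RCC = 0)


def rcc_to_years(rcc):
    """Years before the reference year implied by an RCC value."""
    return rcc * YEARS_PER_10_RCC / 10


def rcc_to_calendar_year(rcc):
    """Approximate calendar year of the common ancestor for an RCC value."""
    return REFERENCE_YEAR - rcc_to_years(rcc)


def time_slice(labels, matrix, low, high):
    """Return (label_i, label_j, rcc) for pairs with low <= rcc <= high."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):   # upper triangle only
            if low <= matrix[i][j] <= high:
                pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs


# Hypothetical 4 x 4 symmetric RCC matrix; labels and values are invented.
labels = ["A1", "A2", "B1", "B2"]
matrix = [
    [0.0, 4.0, 22.0, 25.0],
    [4.0, 0.0, 19.0, 9.5],
    [22.0, 19.0, 0.0, 6.0],
    [25.0, 9.5, 6.0, 0.0],
]

for a, b, value in time_slice(labels, matrix, 0, 10):
    print(a, b, value, round(rcc_to_calendar_year(value)))
```

With these constants an RCC of 10 maps to 1945 - 433 = 1512, which the message rounds to about AD 1500. Keeping the matrix in RCC units and converting only for display follows the message's point that the RCC values stay fixed if the year calibration is later revised.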

    07/09/2011 09:22:02