Bill,

I'm sorry to buy into this discussion, but I think it is incredibly important that we all, including the tyros in this (amateur) business, understand what we mean.

In the ten years or so that we enthusiasts have been dealing with DNA genealogy, a certain jargon has grown up. The professionals in the business have developed more and more complicated ways of describing haplotypes and haplogroups, using alphanumeric strings. To simplify things, we amateurs use shorthand, such as M222 (which is the code for a certain clade in the greater R haplogroup, but is commonly referred to as a haplogroup itself), because it saves a lot of fiddly referencing when communicating.

Included in the jargon is M222+, which means that a testee has been shown positively to carry the M222 mutation. M222-, conversely, means that a testee has been shown positively NOT to carry the mutation, and is most commonly attached to a member of the ancestral clade (L21, as far as we know). Rarely, it might be attached to a member of a more remotely connected clade of the "R" haplogroup who has taken the test for whatever reason. We can further say that if the haplotype itself does not give us a degree of certainty (see below), and the bearer has not taken the SNP test, then technically the haplogroup (or the SNP) is "unknown". If the relationship were even more remote, i.e., if a member of an unconnected (except in primeval terms) haplogroup took the test, under current assumptions that would be a waste of effort, and the TMRCA would be in the tens of thousands of years.

Now, we DNA genealogists are predominantly interested in the most recent millennium, because that encompasses the time of the use of surnames and, generally speaking, the period of usable record keeping. However, that is not to say that we don't also have an interest in estimating the age of our haplogroup.
But we mostly don't have a particular interest in estimating the relative distance between unconnected haplogroups. Hence our insistence that knowledge of haplogroup is essential; indeed, I have trouble conceptualizing why you insist that you can derive useful information from unrelated haplotype strings.

Now you said "If the two haplotype strings are statistically the same, I don't really care. They lead to the same dates." There is a very good reason for this. The two sets of data you were given were, with a very low probability of error, all carriers of the M222 mutation (theoretically M222+), even though untested. That's why they are statistically the same, and lead to the same dates.

It is true that more than half had not been tested positively for M222. However, the characteristic haplotype for the bearers of this mutation is such that M222+ can be predicted quite reliably. This turns on the following particular DYS values: in the FTDNA markers 1-67, DYS385b=13, 392=14, 448=18, 449=30, YCAIIb=23, 607=16, 413a=21, 534=16, 481=25. Latterly I have further identified DYS710=35, 714=24, 549=12, and 513=13 in the 68-111 marker range. Because M222 is a "young" SNP mutation, there haven't been many random DYS mutations since then, so most M222 members carry the great majority of the above DYS values.

So, in our various ways, I think we are trying to say to you that, using this particular data, you shouldn't draw conclusions about unrelated data sets. These are related. I accept your statistical expertise, but ask: what knowledge do we gain by comparing unrelated data sets, say, members of haplogroups defined by M222, L21, I1 and G2a? I ask because they are the haplogroups identified in my surname study. We already know that they divided one from the other during the last 40,000 years, but unless we are trying to define an individual's place in all of this, what is to be gained?
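To make the distinction between "tested" and "predicted" concrete, here is a minimal sketch in Python of how the terms above might be applied to a single testee. The marker values are the ones listed above; everything else is an assumption for illustration only: the function names, the idea of scoring by the fraction of matching markers, and in particular the 0.75 cut-off, which is not a figure from this discussion or from any testing company.

```python
# Characteristic M222 marker values as listed in the message above
# (FTDNA markers 1-67, plus the four from the 68-111 range).
M222_MODAL = {
    "DYS385b": 13, "DYS392": 14, "DYS448": 18, "DYS449": 30, "YCAIIb": 23,
    "DYS607": 16, "DYS413a": 21, "DYS534": 16, "DYS481": 25,
    "DYS710": 35, "DYS714": 24, "DYS549": 12, "DYS513": 13,
}

# ASSUMPTION: an illustrative cut-off only, not taken from the discussion.
MATCH_THRESHOLD = 0.75


def modal_match_fraction(haplotype):
    """Fraction of the listed markers (among those tested) matching the modal."""
    compared = [m for m in M222_MODAL if m in haplotype]
    if not compared:
        return 0.0
    hits = sum(1 for m in compared if haplotype[m] == M222_MODAL[m])
    return hits / len(compared)


def m222_status(haplotype, snp_result=None):
    """Return the label for one testee.

    snp_result is True (tested positive), False (tested negative),
    or None (no SNP test taken). A haplotype-based prediction never
    overrides an actual SNP test result.
    """
    if snp_result is True:
        return "M222+"
    if snp_result is False:
        return "M222-"
    if modal_match_fraction(haplotype) >= MATCH_THRESHOLD:
        return "unknown (predicted M222+)"
    return "unknown"


# Hypothetical testee: one marker a step off the modal, no SNP test taken.
untested = dict(M222_MODAL, DYS449=31)
print(m222_status(untested))
print(m222_status(untested, snp_result=False))
```

The point of the sketch is only that an untested testee stays "unknown" however well the haplotype matches, and that a test result, positive or negative, always takes precedence over the prediction.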
Regards
David Grierson

On 11/07/2011 8:25 AM, Bill Howard wrote:

Paul,

If the two haplotype strings are statistically the same, I don't really care. They lead to the same dates. I agree, we are now beating on a dead horse. I am sorry you think I am tiresome, but you don't appear to understand that the date of origin depends only on the haplotypes presented to the program, not whether or not it is a member of a particular SNP.

(In the two postings immediately below, I find that only you used the word "unknown")…..

- Bye from Bill Howard

On Jul 10, 2011, at 6:11 PM, Paul Conroy wrote:

Bill,

Once again M222- does NOT mean untested, it mean (sic) TESTED NEGATIVE. Unknown means untested. You're getting tiresome.

On 7/10/11, Bill Howard [1]<weh8@verizon.net> wrote:

Hi, David,

I did see your posting and I apologize for being a bit tardy in my reply.

I got into this when a friend suggested looking into the M222 SNP to see if there is a connection between it and Niall and his descendants. My look at the situation indicates that, while Niall and the UiNeills may have carried the SNP, it cannot be proved that they did so. My date determination (see below) indicates that the SNP did not originate with them.

In the process I became aware that one of the things that the DNA folks wanted to do was to try to date the origin of the M222 SNP. Since my RCC approach could do that estimate, I wanted to analyze haplotypes that were in the M222 family. To prepare for the analysis, I was given a large list of M222 folks, and later found that only some of them had been SNP tested. I found that only slightly in excess of 320 had actually been tested, so I collected them as a second database.

Next, there was a list exchange that suggested that the M222 group should be separated into plus and minus groupings, with minus not being well-defined except that they had not been tested.
Before that exchange I tried to see if I could separate the plusses and the minuses by their haplotypes alone, and I found that they were statistically the same. If there was a separation by SNP testing, they certainly did not stand out as being separate from their haplotypes. That analysis has already been posted.

Now, since they looked to be the same, I separated my analysis into the two databases: the ones that had been called M222, a mixture of those tested and untested, and only those that had been tested. I ran a TMRCA for both groups and found that the answers were the same within the estimated error of about 300 years SD. It is a bit premature at this stage to give the answer I got since it has not been fully discussed with my potential co-author, but it was considerably earlier than Niall and was more like the dates that John McEwan got in the BC era. More on this later.

To address your question about how I can calculate a time for the mixture, I say that if I cannot distinguish the difference from the haplotypes, and since Mathematica works only on those haplotypes (without any knowledge of which group it is being given to analyze), I should get the same answer if I use either the large or the small sample. And that's what I got, again within the uncertainty of the errors involved. The answer for the M222 plus sample is statistically the same as the answer from the larger database. That's because the haplotypes inputted to Mathematica in the two samples were statistically the same. So, if you want the answer to dating M222 plus alone, it is the same date.

I think that my analysis has been professionally rigorous given the statistical equalities within the two databases. I hope this answers your questions, David.

- Bye from Bill Howard

On Jul 10, 2011, at 4:10 PM, David H. MacLennan wrote:

Dear Bill,

Yesterday I posted a note concerning the M222 SNP status of your data (see below), but you have not responded. Can you please comment on what I said.
I am particularly concerned about your dating of the time of the M222 mutation. If you are looking at samples of M222+ that are mixed with M222-, how can you calculate a time of the mutation?

David

Dear Bill,

As a biological scientist I find it distressing that you and others are trying to convince us that it doesn't really matter if your SNP test does or does not show that you are M222+, you can still be included in the M222 project on the basis of your STR haplotype. Data based on such an assumption would not be acceptable in a rigorous scientific journal.

It would seem to me that the benchmark of the M222 project should be the presence of M222+. At some stage in our background two brothers may have had an identical or nearly identical STR haplotype, but brother one had a de novo mutation that created the M222 SNP and brother two did not. The descendants of brother one would be M222+ and the descendants of brother two would be M222-. This de novo mutation occurred at a specific date and we would all be very interested in that date. However, if the samples used to measure that date are a mixture of + and - SNPs, then you can't measure the date of appearance of M222 accurately, because common STR haplotypes would predate the appearance of the M222 SNP.

Let's focus on the rigor of the analysis, not the cost of SNP testing.

David

--
Dr. David H. MacLennan, Banting and Best Department of Medical Research, University of Toronto, Charles H.
Best Institute, 112 College St., Toronto, Ontario, Canada M5G1L6
Tel: 1-416-978-5008 Fax: 1-416-978-8528
[2]http://www.utoronto.ca/maclennan

R1b1c7 Research and Links: [3]http://clanmaclochlainn.com/R1b1c7/
-------------------------------
To unsubscribe from the list, please send an email to [4]DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message

References

1. mailto:weh8@verizon.net
2. http://www.utoronto.ca/maclennan
3. http://clanmaclochlainn.com/R1b1c7/
4. mailto:DNA-R1B1C7-request@rootsweb.com