RootsWeb.com Mailing Lists
Re: [R-M222] How is M222 defined?
Iain Kennedy
In the interests of giving credit where it is due, the M222 SNP test was first commercially offered by Jim Wilson's Ethnoancestry company in Scotland. I did my M222 test there soon after it was launched and before FTDNA went to market. Here is the original announcement:
http://archiver.rootsweb.ancestry.com/th/read/genealogy-dna/2006-03/1141526628

Ethnoancestry still sell the test (although Faux is no longer involved with them).

Iain
http://www.kennedydna.com

> From: weh8@verizon.net
> Date: Thu, 7 Jul 2011 08:00:21 -0400
> To: dna-r1b1c7@rootsweb.com
> CC: davidewing93@gmail.com; JJLNV@comcast.net; wathey@hprg.com
> Subject: [R-M222] How is M222 defined?
>
> There has been considerable discussion, both on- and off-line, about how the M222 SNP is defined.
> First, I understand that its early definition depended on the first 12 markers.
> Next, we have the deep clade test of FTDNA, with a proprietary approach we know little about.
> Next, there are discussions of how the markers agree or disagree with the modal values of the deep clade test, but only with respect to the first 12 markers of the FTDNA string.
>
> And now, here's my "take" on the situation.
> I received from John McLaughlin a large set of haplotypes that he noted were in the M222 group. Some had been SNP tested and some had not.
> I did a study of ALL 37 markers (not just the first 12) and determined the modal value of each DYS site.
> I then went back and determined, for EACH TESTEE, how many of his own markers matched the modal value of the same marker in the M222 sample John sent me.
> I then made a graph of the percentage of each testee's markers that matched the modal set.
> I found that virtually ALL testees in the set that John sent had 73% or more of their markers agreeing with the set of M222 modals: not just the first 12, but all 37 of them.
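The procedure described above (find the most common value at each DYS position, then count how many of each testee's markers equal it) can be sketched in Python. This is a minimal illustration, not Bill's own program: it assumes each haplotype is a list of integers in one shared marker order, the function names are hypothetical, and the toy data are made up.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common value at each marker position.

    Assumes every haplotype is a list of integers of the same length and
    order (e.g. 37 values in FTDNA order). Counter.most_common lists equal
    counts in first-encountered order, so ties go to the value seen first.
    """
    n_markers = len(haplotypes[0])
    return [
        Counter(h[i] for h in haplotypes).most_common(1)[0][0]
        for i in range(n_markers)
    ]


def modal_matches(haplotype, modal):
    """Number of positions where the testee's value equals the modal."""
    return sum(1 for value, m in zip(haplotype, modal) if value == m)


def match_percentage(haplotype, modal):
    """Share of the testee's markers that equal the modal, in percent."""
    return 100.0 * modal_matches(haplotype, modal) / len(modal)


# Toy 5-marker haplotypes for illustration only (not real testees).
testees = [
    [13, 25, 14, 11, 11],
    [13, 24, 14, 11, 11],
    [13, 25, 15, 11, 12],
]
modal = modal_haplotype(testees)  # [13, 25, 14, 11, 11]
for h in testees:
    print(match_percentage(h, modal))  # 100.0, then 80.0, then 60.0
```

Plotting the list of `match_percentage` values for a real sample gives the kind of graph described in the post.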
>
> The modal values I found for all 37 markers, in the sequence given by FTDNA postings, are:
> 13 25 14 11 11 13 12 12 12 13 14 29 17 9 10 11 11 25 15 18 30 15 16 16 17 11 11 19 23 17 16 18 17 38 39 12 12
>
> Of the 683 M222s in the group, all matched at least 73% of that sequence (at least 27 of the 37 markers). The average was 85.2%, and both the median and the mode were 83.8% (31 of 37 markers). One testee, 26917 (MacKenzie), matched 100% of the modals.
>
> I also found that when I plotted, for all 683 testees, the number of markers matching the M222 modal against its frequency of occurrence, the plot between 26 and 37 markers was bimodal, with one peak at 31 markers and the other at 33 markers. A statistician might say that the departures from a Gaussian are not significant and that there are NOT two peaks, but I think it is arguable. When I do the same plot using 320 testees from a set with a larger number of SNP-tested testees, the bimodality is more pronounced but still statistically inconclusive: the two peaks are sharper and appear at the same places on the histogram.
>
> So, what do I conclude from all this?
> First, that we cannot go by just the first 12 markers. We have more at our disposal to study.
> Second, while we refer to the M222 SNP test of FTDNA, we realize that we take their results on faith about their criterion for who should be included in the M222 group.
> Third, my analysis shows that you can safely (?) put a testee into the M222 group IF 73% or more of his 37 markers agree with the modal values of all 37 (not 12) markers. That is a practical working criterion for M222 inclusion in the group. I have given the modals above, so now anyone can compare a haplotype with them and draw their own conclusion. That criterion correlates well with FTDNA's M222 SNP-tested group.
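To apply the working criterion to a new haplotype, the posted modal string can be compared position by position. The sketch below assumes the haplotype is given as 37 integers in the same FTDNA order as the posted modals; the names `M222_MODAL`, `passes_m222_screen` and `match_histogram` are hypothetical. The threshold is written as 27 matches, as in the post's parenthesis, because 27/37 is 72.97% and a literal 73% cutoff would demand 28.

```python
from collections import Counter

# M222 modal values as posted, in FTDNA's 37-marker order.
M222_MODAL = [int(v) for v in (
    "13 25 14 11 11 13 12 12 12 13 14 29 17 9 10 11 11 25 15 18 30 "
    "15 16 16 17 11 11 19 23 17 16 18 17 38 39 12 12"
).split()]

# The post equates 73% with "at least 27 of the 37 markers"
# (27/37 = 72.97%), so the threshold is kept as a count.
MIN_MATCHES = 27


def modal_match_count(haplotype, modal=M222_MODAL):
    """Number of markers whose value equals the posted modal."""
    if len(haplotype) != len(modal):
        raise ValueError("expected %d marker values" % len(modal))
    return sum(a == b for a, b in zip(haplotype, modal))


def passes_m222_screen(haplotype):
    """Hypothetical helper for the post's working criterion (>= 27 of 37)."""
    return modal_match_count(haplotype) >= MIN_MATCHES


def match_histogram(haplotypes):
    """Match count -> number of testees, sorted by match count.

    Plotting this is one way to look for the peaks near 31 and 33 matches.
    """
    counts = Counter(modal_match_count(h) for h in haplotypes)
    return dict(sorted(counts.items()))
```

For example, the modal itself scores 37 matches, as testee 26917 did, so `passes_m222_screen(M222_MODAL)` is `True`.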
>
> Now, we must realize that there are extreme variations in the mutation rates of the markers, and that's why most testees match fewer than 100% of the M222 modals. The mutation rates vary by a factor of almost 400 between the fastest- and slowest-mutating DYS sites. Why does 26917 MacKenzie have a 100% match? Well, statistically, out of 683 testees whose markers have been mutating over the time from the M222 progenitor to the present, you would expect one line not to vary at all, and that line has led to 26917 MacKenzie. In fact, his haplotype may provide a clue, or a means, to tease out some of the mutations that have taken place over time. That's an exercise still to be done.
> Now, when you have a set of fast- to slow-mutating DYS sites, you should be comparing the DIFFERENCES in marker values along the mutating lines. I include now a table showing, for each DYS site, the percentage of M222 testees whose value equals the modal value. For example, every testee had the modal value for DYS454, while fewer than 50% of the testees had the modal value for either of the two CDY markers.
>
> Marker      % of testees with the modal value
> DYS454      100%
> DYS426       99%
> DYS388       99%
> DYS459a      99%
> YCAIIa       98%
> DYS438       98%
> DYS393       98%
> DYS455       98%
> DYS448       96%
> DYS392       95%
> DYS385a      95%
> DYS459b      93%
> DYS19        93%
> DYS437       92%
> DYS464a      90%
> DYS442       90%
> Y-GATA-H4    89%
> DYS385b      88%
> YCAIIb       88%
> DYS389i      88%
> DYS447       87%
> DYS464b      87%
> DYS464c      86%
> DYS464d      85%
> DYS390       85%
> DYS607       85%
> DYS391       83%
> DYS389ii     80%
> DYS439       79%
> DYS570       77%
> DYS458       74%
> DYS449       71%
> DYS460       70%
> DYS456       68%
> DYS576       58%
> CDYb         46%
> CDYa         42%
>
> Now, with the modal values and the table above, you could analyze the slow-moving markers among the haplotypes and see what happens. The fast-moving markers are useful only for small values of RCC, whereas the slow-moving markers will give insight into what was happening to the marker strings nearer the time of the progenitor, that is, at the higher values of RCC.
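The table can be used directly to separate slow- from fast-moving markers and to count repeat differences on only the slow ones. A sketch, assuming the standard FTDNA 37-marker order (worth checking against your own results page); the 95% cutoff and the function names are illustrative choices, not values from the post. Differences are summed as absolute repeat counts; some genetic-distance conventions treat multi-copy markers such as DYS464 and CDY differently, and this sketch does not.

```python
# Assumed standard FTDNA 37-marker order (same order as the posted modals).
FTDNA_37_ORDER = [
    "DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b",
    "DYS426", "DYS388", "DYS439", "DYS389i", "DYS392", "DYS389ii",
    "DYS458", "DYS459a", "DYS459b", "DYS455", "DYS454", "DYS447",
    "DYS437", "DYS448", "DYS449", "DYS464a", "DYS464b", "DYS464c",
    "DYS464d", "DYS460", "Y-GATA-H4", "YCAIIa", "YCAIIb", "DYS456",
    "DYS607", "DYS576", "DYS570", "CDYa", "CDYb", "DYS442", "DYS438",
]

# Percentage of M222 testees at the modal value, as posted in the table.
POSTED_PERCENT = {
    "DYS454": 100, "DYS426": 99, "DYS388": 99, "DYS459a": 99,
    "YCAIIa": 98, "DYS438": 98, "DYS393": 98, "DYS455": 98,
    "DYS448": 96, "DYS392": 95, "DYS385a": 95, "DYS459b": 93,
    "DYS19": 93, "DYS437": 92, "DYS464a": 90, "DYS442": 90,
    "Y-GATA-H4": 89, "DYS385b": 88, "YCAIIb": 88, "DYS389i": 88,
    "DYS447": 87, "DYS464b": 87, "DYS464c": 86, "DYS464d": 85,
    "DYS390": 85, "DYS607": 85, "DYS391": 83, "DYS389ii": 80,
    "DYS439": 79, "DYS570": 77, "DYS458": 74, "DYS449": 71,
    "DYS460": 70, "DYS456": 68, "DYS576": 58, "CDYb": 46, "CDYa": 42,
}


def slow_markers(cutoff=95):
    """Markers (in FTDNA order) where >= cutoff% of testees are at the modal.

    The cutoff is an illustrative choice, not a value from the post.
    """
    return [m for m in FTDNA_37_ORDER if POSTED_PERCENT[m] >= cutoff]


def step_differences(h1, h2, markers):
    """Sum of absolute repeat-count differences over the named markers.

    h1 and h2 are 37-value haplotypes in FTDNA_37_ORDER.
    """
    positions = [FTDNA_37_ORDER.index(m) for m in markers]
    return sum(abs(h1[i] - h2[i]) for i in positions)
```

With the default cutoff, `slow_markers()` returns the 11 markers at 95% or above, listed in panel order from DYS393 to DYS438.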
>
> So, my fourth conclusion is that the sequence of junctions on the phylogenetic tree, calibrated in terms of RCC values, will probably give valuable information not only on how the DNA clusters (which later evolve into surname groups) actually evolved over time, but also valuable fingerprints that differentiate one cluster from another (and, at RCC values less than 20, the TMRCAs of the progenitors at the junction points that lead to different surnames). A clever programmer might help here! The data are available (!).
>
> - Bye from Bill Howard
> R1b1c7 Research and Links:
>
> http://clanmaclochlainn.com/R1b1c7/
> -------------------------------
> To unsubscribe from the list, please send an email to DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
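As a starting point for the "clever programmer", pairwise marker differences can be turned into a sequence of junctions with average-linkage (UPGMA-style) clustering. This is a generic technique swapped in for illustration: it does not compute RCC values, which the post does not define, and a raw step distance is not a TMRCA estimate. The function names are hypothetical, and all haplotypes are assumed to share one marker order.

```python
from itertools import combinations


def step_distance(h1, h2):
    """Sum of absolute repeat-count differences across all markers."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def average_linkage(haplotypes):
    """Greedy average-linkage (UPGMA-style) clustering.

    Returns the merge sequence as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a sorted tuple of testee indices. The distance
    between two clusters is the mean step distance over all cross pairs.
    """
    n = len(haplotypes)
    d = [[step_distance(haplotypes[i], haplotypes[j]) for j in range(n)]
         for i in range(n)]
    clusters = [(i,) for i in range(n)]
    merges = []

    def cluster_distance(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Join the closest pair; on ties, the first pair generated wins.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        dist = cluster_distance(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, dist))
    return merges
```

Each merge is one junction of the tree, and its distance is the average number of repeat steps separating the two branches. Recomputing cluster averages from the leaf distances is slow, but easy to check; for hundreds of haplotypes, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same kind of clustering much faster.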

    07/07/2011 01:01:51