RootsWeb.com Mailing Lists
Total: 2/2
    1. [yDNAhgI] I1 STR Decision Tree Method versus the I1, I2 STR Branch Code Method
    2. Terry
    3. I have had some questions about the difference between the "I1 STR Decision Tree Method" at http://www.goggo.com/terry/HaplogroupI1/ and the "I1,I2 STR Branch Code Method" at http://www.goggo.com/cgi-bin/branchFind.cgi

The Decision Tree Method is really just a quicker way of working out where you fit into the structure of Haplogroup I1 and how parts of I1 are geographically distributed. Useful for getting a sense of what other people are reporting as their place of origin.

The Decision Tree Method (which anyone can apply for themselves) shows the geographic frequency distribution of the various clusters that are defined by the Decision Tree for haplogroup I1. In many cases, the STR clusters/clans are associated with specific SNP alleles. It does make mistakes though. For instance, the first decision point of DYS390>22 versus DYS390<=22 can predict L22 status with the following accuracy:
L22+ with 99% accuracy (250 out of 252 samples with 12 or more markers)
L22- with 83% accuracy (462 out of 557 samples with 12 or more markers).
So it would get L22- status wrong 17% of the time. Such errors are ultimately unavoidable when using STR values to predict SNP allele status.

The Branch Code Method (which is available via the on-line tool) would improve the accuracy of such predictions in most cases. For instance, it improves the L22- prediction to 94% accuracy (412 out of 436 samples with 67 or more STR markers).

Other SNP markers, such as L338, L258, Z63 etc can be predicted with varying degrees of accuracy as well from just knowing your position in the computed STR tree.

The main use of either the Decision Tree method or the Branch Code method is not for SNP prediction though, but instead for showing maps of the geographic frequency distribution of the various clusters/clans or branches. So SNP status is used as part of an independent validation test of the methods.
Given that some STR markers may have mutated several times over the timeframe of I1 and I2, one cannot expect perfect SNP predictions. STR values can independently mutate and reach the same value as someone else's without that value being inherited from a common ancestor. That convergence of STR values is the main limitation of the two methods, or of any other STR-based method.

For SNP prediction, prior knowledge is important. So if you know the status of some SNP allele for even a distant relative (with a common male-line ancestor), then such prior knowledge is very valuable. The I1,I2 STR Branch Code tool may help you with quickly identifying such distant relatives, and seeing their SNP status.

Finally, the STR tree structure of haplogroups I1 and I2, as well as the TMRCA of the branches, is computed as a side effect of the Branch Code Method calculation. As always, there will be inherent and unavoidable errors though.

Despite that, I hope either of the two methods will be of use in either confirming what you already know, or in giving you some ideas for further investigation.

Terry
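
To illustrate how a single decision point such as DYS390>22 versus DYS390<=22 might be checked against tested SNP results, here is a minimal Python sketch. It is not the code behind either tool: the function names, the `threshold` parameter, and the toy samples are all hypothetical, and the toy values say nothing about real frequencies or about which side of the split actually goes with L22+.

```python
from collections import Counter


def split_on_dys390(samples, threshold=22):
    """Tally tested L22 results on each side of the DYS390 > threshold split."""
    tallies = {"above": Counter(), "at_or_below": Counter()}
    for dys390, l22_status in samples:
        side = "above" if dys390 > threshold else "at_or_below"
        tallies[side][l22_status] += 1
    return tallies


def branch_accuracy(tally):
    """Return (majority status, fraction of samples sharing it) for one side."""
    total = sum(tally.values())
    if total == 0:
        return (None, None)
    status, count = tally.most_common(1)[0]
    return (status, count / total)


# Hypothetical (DYS390 value, tested L22 status) pairs, invented for illustration.
toy_samples = [
    (23, "L22+"), (23, "L22+"), (23, "L22-"),
    (22, "L22-"), (22, "L22-"), (21, "L22-"), (22, "L22+"),
]

for side, tally in split_on_dys390(toy_samples).items():
    status, fraction = branch_accuracy(tally)
    print(f"{side}: majority {status}, {fraction:.0%} of {sum(tally.values())} samples")
```

The samples on each side that do not share the majority status are the ones an STR-only prediction would get wrong, for example because of convergent STR values.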

    03/21/2012 10:55:28
    1. Re: [yDNAhgI] I1 STR Decision Tree Method versus the I1, I2 STR Branch Code Method
    2. Terry
    3. In the example below, I seem to have gotten my numbers slightly garbled and also flipped the alleles L22+ and L22- by mistake. Corrections as follows:

For: "L22+ with 99% accuracy (250 out of 252 samples ..."
Read "L22- with 99% accuracy (462 out of 464 samples ..."

For: "L22- with 83% accuracy (462 out of 557 samples ..."
Read "L22+ with 72% accuracy (250 out of 345 samples ..."

For: "So it would get L22- status wrong 17% of the time."
Read "So it would get L22+ status wrong 28% of the time."

For: "improves the L22- prediction to 94% accuracy (412 out of 436 ..."
Read "improves the L22+ prediction to 89% accuracy (196 out of 220 ..."

Thought I should correct that because the L22+ and L22- were flipped. Otherwise it was just an example.

If one used more than three levels of branching to make predictions (I1.110* is three levels), then SNP predictions would get even better. But the L22+ figure in the example is just the accuracy based on only looking at three levels of the branch code.

Terry

On Wed, Mar 21, 2012 at 4:55 PM, Terry <tdrobb@gmail.com> wrote:
> I have had some questions about the difference between the "I1 STR
> Decision Tree Method" at
> http://www.goggo.com/terry/HaplogroupI1/
> and the "I1,I2 STR Branch Code Method" at
> http://www.goggo.com/cgi-bin/branchFind.cgi
>
> The Decision Tree Method is really just a quicker way of working out where
> you fit into the structure of Haplogroup I1 and how parts of I1 are
> geographically distributed. Useful for getting a sense of what other people
> are reporting as their place of origin.
>
> The Decision Tree Method (which anyone can apply for themselves) shows the
> geographic frequency distribution of the various clusters that are defined
> by the Decision Tree for haplogroup I1. In many cases, the STR
> clusters/clans are associated with specific SNP alleles. It does make
> mistakes though. For instance, the first decision point of DYS390>22 versus
> DYS390<=22 can predict L22 status with the following accuracy:
> L22+ with 99% accuracy (250 out of 252 samples with 12 or more markers)
> L22- with 83% accuracy (462 out of 557 samples with 12 or more markers).
> So it would get L22- status wrong 17% of the time. Such errors are
> ultimately unavoidable when using STR values to predict SNP allele status.
>
> The Branch Code Method (which is available via the on-line tool) would
> improve the accuracy of such predictions in most cases. For instance, it
> improves the L22- prediction to 94% accuracy (412 out of 436 samples with
> 67 or more STR markers).
>
> ...
>
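
The corrected counts in the reply above can be checked with a little arithmetic. The short Python sketch below only recomputes percentages from the sample counts quoted in the correction; the function name and the dictionary labels are my own and are not part of either tool.

```python
def accuracy_percent(correct, total):
    """Percentage of samples whose SNP status was predicted correctly."""
    return 100.0 * correct / total


# (correct, total) counts as given in the corrected figures.
corrected_counts = {
    "Decision Tree, L22-": (462, 464),
    "Decision Tree, L22+": (250, 345),
    "Branch Code, L22+": (196, 220),
}

for label, (correct, total) in corrected_counts.items():
    pct = accuracy_percent(correct, total)
    print(f"{label}: {correct}/{total} = {pct:.1f}% correct, {100 - pct:.1f}% wrong")
```

The computed values agree with the quoted 99%, 72% and 89% to within a percentage point, and the L22+ error rate for the Decision Tree split comes out at about the 28% stated in the correction.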

    03/22/2012 05:23:45