RootsWeb.com Mailing Lists
Total: 4/4
    1. [yDNAhgI] New Online Tool for Haplotype Analysis (only for y-Haplogroups I1 and I2 at present)
    2. Terry
    3. There are common questions that people often have about their 67-marker STR test results, such as place of geographic origin, likely SNP mutations, or close haplotype matches.
       To answer such questions, it helps to know where someone's 67-marker STR result fits in with everyone else's results. A rational way of organising results is to compute a hierarchical cluster tree, and then systematically label each person in that tree with an STR "Branch Code". This Branch Code labelling system is very similar to the "Henry System" used in genealogy for numbering the known descendants of an ancestor.
       It is easiest to see what I mean by checking out the tool at: http://www.goggo.com/cgi-bin/branchFind.cgi
       You can enter your FTDNA Kit Number or Ysearch ID (currently only works for haplogroups I1 and I2), and if the entry is valid, you will get your Branch Code, followed by this output:
       1) a short list of close matches, and the estimated time-frame for the common ancestor of the very closest match;
       2) a map showing the frequency of occurrence of your Branch Code in all countries/regions across Europe;
       3) a list of SNP mutation pathways, with suggestions for your likely path based on your Branch Code.
       Finally, there is a link that discusses the simple methodology I used; that link also gives additional details, such as the computed tree showing the "big picture" view of how people in y-Haplogroups I1 and I2 are connected.
       Eventually, I may add y-Haplogroups R1b and R1a. In the meantime, for I1 and I2 people, let me know how you go.
       Terry

    03/02/2012 08:46:04
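    The idea in the post above (build a hierarchical cluster tree, then label each person by the path to their leaf) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method behind the tool: the thread does not say which distance measure or linkage rule is used, so the sum of absolute repeat-count differences and average linkage are assumptions here, as are the binary 0/1 branch digits (suggested by codes such as I1.000111* quoted elsewhere in this thread) and the toy five-marker haplotypes.

```python
from itertools import combinations

def distance(a, b):
    """Sum of absolute repeat-count differences (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cluster(haplotypes):
    """Naive average-linkage agglomerative clustering.

    haplotypes maps a kit identifier to a list of STR repeat counts.
    Returns a nested tuple tree whose leaves are kit identifiers.
    """
    def avg_dist(pair):
        (_, members1), (_, members2) = pair
        total = sum(distance(haplotypes[p], haplotypes[q])
                    for p in members1 for q in members2)
        return total / (len(members1) * len(members2))

    # Each cluster is (subtree, list of member kits).
    clusters = [(kit, [kit]) for kit in sorted(haplotypes)]
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        left, right = min(combinations(clusters, 2), key=avg_dist)
        clusters = [c for c in clusters if c is not left and c is not right]
        clusters.append(((left[0], right[0]), left[1] + right[1]))
    return clusters[0][0]

def branch_codes(tree, prefix=""):
    """Label every leaf with the 0/1 path from the root (Henry-style)."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    codes = {}
    for digit, child in enumerate(tree):
        codes.update(branch_codes(child, prefix + str(digit)))
    return codes

# Hypothetical toy data: four kits, five markers each.
toy_haplotypes = {
    "kitA": [13, 24, 14, 10, 11],
    "kitB": [13, 24, 14, 10, 12],
    "kitC": [14, 22, 15, 11, 13],
    "kitD": [14, 22, 15, 11, 14],
}
toy_tree = cluster(toy_haplotypes)
toy_codes = branch_codes(toy_tree)
print(toy_codes)
```

    With this toy data, kitA and kitB end up on one branch and kitC and kitD on the other, so people who share a longer leading run of digits sit closer together in the tree.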
    1. Re: [yDNAhgI] New Online Tool for Haplotype Analysis (only for y-Haplogroups I1 and I2 at present)
    2. Haakon Styri
    3. As far as I can judge, this is a useful tool, and the output is user-friendly. Great work, Terry. It tells me the TMRCA of the closest match, but I can't tell which of the listed close matches that is. However, that's a minor issue that I guess is easy to fix. My immediate question is: will you run an updated analysis if this tool motivates more people to test 67 STR markers? :-) H.Styri
       > From: Terry [tdrobb@gmail.com]
       > Sent: 2012-03-02 05:46:04 MET
       > To: Y-DNA-HAPLOGROUP-I@rootsweb.com, genealogy-dna@rootsweb.com
       > Subject: [yDNAhgI] New Online Tool for Haplotype Analysis (only for y-Haplogroups I1 and I2 at present)

    03/02/2012 05:53:28
    1. Re: [yDNAhgI] New Online Tool for Haplotype Analysis (only for y-Haplogroups I1 and I2 at present)
    2. Terry
    3. Thanks all for the mostly positive feedback and comments. Rather than reply to individual posts, I'd like to respond in this single post.
       1) Data coverage
       At present, only 67-marker results from the major I1 and I2 projects at FTDNA are used. So 37-marker-only results are not included, nor are any results that have a null value recorded for any of the 67 markers. The data is what I captured as of mid-February 2012; I will update it periodically. I have also missed many smaller surname-only projects, but may include them when I can.
       2) Close Matches / TMRCA
       The close matches are determined by closeness (at the right end) of the branch codes. A maximum of 20 matches is returned. The TMRCA is for the common ancestor of the kit number that was entered and its very closest match. That very closest match will be next to (above or below) the bold kit number in the list. If it is an exact 67-marker match, then the TMRCA is reported as "within 200 years". You may have other close matches that are not in the database I have captured.
       3) Geographic Frequency Maps
       These maps are just a factual representation of how often each self-reported place of ancestral origin, as recorded in the various FTDNA projects, occurs among members of the given "branch". Typically, the most-distant male-line ancestor that people report lived anywhere from, say, 100 to 300 or more years ago. So the geographic maps can be read as representing roughly 200 years ago on average. Modern country boundaries are used to make plotting easier for me.
       4) SNP Suggestions
       Raw counts of the SNP alleles (either positive or negative) are given in the link to the PDF file. On the output page, though, only some likely allele values for the given branch are shown in red. They are not meant to be 100% certain. Each line and arrow in the pathway would in principle have a different probability. For simplicity, only the few with the highest probability are shown in red.
       5) Decision Tree Method
       The Decision Tree method is an approximation to the full Branch Code tree method. In particular, I1-BBA is roughly the same as I1.110*, and I1-AAA is like I1.000*. Some of the labels are at slightly different levels, though; for example, I1-AABB is like I1.011*.
       6) Examples to try
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=H1154 -> I2.100* L369+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=133132 -> I2.0110* L39+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=27753 -> I2.01000* L161.1+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=URMN8 -> I1.111* L22+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=6392 -> I1.110* L22+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=41180 -> I1.101* Z63+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=70336 -> I1.101* Z63+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=23944 -> I1.100* L258+ (Finland)
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=29243 -> I1.011* L338+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=90967 -> I1.011* L338+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=106327 -> I1.010* Z139+,Z138+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=67129 -> I1.001* Z803+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=ZUYPF -> I1.000111* Z140+
       http://www.goggo.com/cgi-bin/branchFind.cgi?Kit=N58041 -> I1.00001* Z58+
       Finally, don't expect total accuracy. The STR 67-marker Tree that is constructed can only ever be an approximation to the true genealogical tree. Mistakes necessarily happen due to chance convergence of STR results. All we get to see are the leaves (people) at the end of the branches, and if we are lucky a bunch ("cluster") of leaves will all come from the same branch. But sometimes two distinct branches will have their leaves intertwined ("convergence"), which is the main complication to be aware of when reading the results. Some spurious results are to be expected.
       Terry
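       The close-match rule described in this post (branch codes that differ only at the right end, with at most 20 matches returned) might be sketched like this. It is a guess at one reasonable reading, not the tool's actual code: ranking by the length of the shared leading part of the code is an assumption, and the function names and toy branch codes below are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading characters two branch codes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def close_matches(query_kit, codes, limit=20):
    """Return up to `limit` kits, closest first.

    "Closest" is assumed to mean the longest shared leading part of the
    branch code, i.e. the codes differ only near the right end.
    """
    query = codes[query_kit]
    others = [kit for kit in codes if kit != query_kit]
    # Longest shared prefix first; ties broken by code, then kit id.
    others.sort(key=lambda kit: (-shared_prefix_len(query, codes[kit]),
                                 codes[kit], kit))
    return others[:limit]

# Hypothetical kits and branch codes, for illustration only.
toy_codes = {
    "kit1": "I1.0110",
    "kit2": "I1.0111",
    "kit3": "I1.0100",
    "kit4": "I1.1000",
}
toy_matches = close_matches("kit1", toy_codes)
print(toy_matches)
```

       Here kit2 comes first because its code differs from kit1's only in the last digit, which matches the description of the very closest match sitting next to the entered kit.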

    03/03/2012 06:06:09
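    The geographic frequency maps described in the post above amount to counting self-reported origins among the members of a branch. A minimal sketch, assuming that a label such as I1.011* means "every branch code beginning with I1.011" and using made-up kits and countries (the function name is hypothetical too):

```python
from collections import Counter

def origin_frequencies(branch_prefix, codes, origins):
    """Share of each self-reported origin among kits in a branch.

    branch_prefix is the label without its trailing '*'; kits with no
    recorded origin are skipped.
    """
    members = [kit for kit, code in codes.items()
               if code.startswith(branch_prefix)]
    counts = Counter(origins[kit] for kit in members if origins.get(kit))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in counts.items()}

# Hypothetical data, for illustration only.
toy_codes = {
    "kit1": "I1.0110",
    "kit2": "I1.0111",
    "kit3": "I1.0100",
    "kit4": "I1.1000",
}
toy_origins = {
    "kit1": "Norway",
    "kit2": "Norway",
    "kit3": "Sweden",
    "kit4": "Finland",
}
toy_freqs = origin_frequencies("I1.01", toy_codes, toy_origins)
print(toy_freqs)
```

    Because the origins are self-reported and refer to ancestors of different eras, the resulting shares are only as reliable as the project records behind them.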
    1. Re: [yDNAhgI] New Online Tool for Haplotype Analysis (only for y-Haplogroups I1 and I2 at present)
    2. Bernie Cullen
    3. Terry, Your clustering method may work well, but as you say some branches may converge. So what's the advantage of presenting your tree when we know its basic structure doesn't represent history? What useful information do your different branches give us? Why not keep your tree and branch codes in the background (let someone click to it if he wants) and just present the results (matches in someone's cluster, maps, recommended SNPs)? Or am I missing something useful about the tree? Bernie
       Terry tdrobb@gmail.com wrote:
       > Finally, don't expect total accuracy. The STR 67-marker Tree that is constructed can only ever be an approximation to the true genealogical tree. Mistakes necessarily happen due to chance convergence of STR results. All we get to see are the leaves (people) at the end of the branches, and if we are lucky a bunch ("cluster") of leaves will all come from the same branch. But sometimes two distinct branches will have their leaves intertwined ("convergence"), which is the main complication to be aware of when reading the results. Some spurious results are to be expected.

    03/03/2012 02:15:55