Dear Jim,

The major issue with using unphased data for comparisons (in Relative Finder, Family Finder, or other programs such as Jim McMillan's that compare unphased data) is that a significant percentage of shorter matching segments will be identical by state (IBS) rather than identical by descent (IBD). Relatively few segments over 12-15 cMs in length are likely to be identical by state, but as we get down into analysis of segments that are in the 7-8 cM range a significant percentage of such segments will be identical by state. John Walden did a recent analysis that suggested that about 50% of matching segments 8 cMs in length are identical by state. Here is a small table of results that he recently compiled:

cM    %IBD    %IBS
10    99      1
9     80      20
8     50      50
7     30      70
6     20      80
5     5       95

Doing comparisons using phased data gets around the issue of identical by state matches. Another simple method is what I have previously referred to as "poor man's phasing". This is where you compare a parent and a child to another person who has a matching segment with the parent. If the child matches this other person on the same segment that the parent matches on, then you can be assured that the segment is highly likely to be IBD and not IBS. However, if the child doesn't match the other person and the matching segment is under 10 cMs or so, then there is still a reasonable possibility that the segment is IBS. This is one reason why it is a good idea to have as many children as feasible do the 23andMe or FF test, since you will be better able to determine which segments are IBD and which segments could be IBS.

Sincerely,
Tim Janzen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jim Bartlett
Sent: Friday, January 06, 2012 8:19 PM
To: [email protected]; [email protected]
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?

Sam

What is the "big issue" about unphased data? As always, my reference is to genealogy.

I can see where it would help, some, to know if a match was on my Dad or Mom's side, but SURNAMES and geography can help there, too. And even if I knew which parent my match was on, I still wouldn't know which grandparent.

Jim
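The "poor man's phasing" check Tim describes can be sketched in a few lines of Python. This is only an illustration, not any vendor's or list member's actual program: the segment tuples, the function names, and the 80% overlap threshold are all assumptions made for the example, and the percentages are simply Walden's figures as quoted in the message.

```python
# Walden's approximate figures as quoted in the thread: segment length (cM) -> % IBD.
WALDEN_PCT_IBD = {10: 99, 9: 80, 8: 50, 7: 30, 6: 20, 5: 5}


def prior_pct_ibd(cm):
    """Quoted %IBD for a segment length, or None if below the table's range.

    Assumption: lengths are rounded down to a whole cM, and anything
    above 10 cM reuses the 10 cM figure.
    """
    return WALDEN_PCT_IBD.get(min(int(cm), 10))


def overlap_fraction(parent_seg, child_seg):
    """Fraction of the parent's segment that the child's segment covers.

    Segments are hypothetical (chromosome, start_bp, end_bp) tuples.
    """
    p_chrom, p_start, p_end = parent_seg
    c_chrom, c_start, c_end = child_seg
    if p_chrom != c_chrom:
        return 0.0
    shared = min(p_end, c_end) - max(p_start, c_start)
    return max(shared, 0) / (p_end - p_start)


def poor_mans_phasing(parent_seg, child_segs, cm, min_overlap=0.8):
    """Classify a parent's match using the child's matches with the same person.

    min_overlap is an assumed threshold, not a value from the thread.
    """
    if any(overlap_fraction(parent_seg, s) >= min_overlap for s in child_segs):
        return "likely IBD"
    # No confirming child match: the child may simply not have inherited the
    # segment, so all we can do is fall back on the quoted table.
    pct = prior_pct_ibd(cm)
    if pct is None:
        return "undetermined (below table range)"
    return f"undetermined (about {pct}% IBD per Walden's table)"
```

Note that a missing child match does not prove the segment is IBS, since each child inherits only about half of a parent's DNA. That is why testing several children helps.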
Tim,

Thanks for the explanation. I'm beginning to catch on. So if a genealogist just takes the Family Finder test, and knows nothing about phasing, they should keep Walden's table handy and expect that half of their matches around 8 cM won't really be cousins (at least not with a shared IBD segment).

Walden's table looks very similar to the percentages that FTDNA uses to indicate what percentage of your cousins will show up as a match with Family Finder... Are these the same? This would mean most of your cousins would be listed, but a percent would be IBS, and so not atDNA cousins. I don't think this is the case, because I have a number of known cousins, with atDNA tests, who do not match me (FF or 23andMe).

So the other option is that FTDNA's percent eliminates some of your cousins from appearing on the match list, and Walden's table eliminates a percentage of those who remain from being atDNA cousins. This is really whittling down the true matches. And worse yet, those of us with unphased data don't know which are which.

If I do understand this, we can't rely on the list from, say, FTDNA. We need to determine, manually or with some software, whether the A in AG is from Mom or Dad, and do this for our whole raw data (how many ACTGs?). And then how do we compare this with the long segments of our matches? Is this part of the FTDNA and 23andMe programs that I haven't found yet? Or do I need 3rd party help?

Now I'm just trying to see how I get this info, and then how I use it with my list of matches to capture the ones who are IBD. I guess one way is to cull out everyone below 10 cM. Now I'm catching on. But this will eliminate most of my matches... It also covers a number of folks with whom I've already found common ancestry - the point here being that we might be cousins, but we probably don't share a large IBD segment...

This is worse than finals week - I need to sleep.

Jim
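Jim's question about deciding whether "the A in AG is from Mom or Dad" is the core of trio phasing. A minimal sketch of the per-SNP logic follows, assuming genotypes are two-letter strings and that "--" marks a no-call. The function name and data layout are hypothetical and are not taken from Tim Janzen's or David Pike's programs.

```python
def phase_snp(child, mother, father):
    """Return (allele_from_mother, allele_from_father) for one SNP, or None.

    Genotypes are unordered two-letter strings such as "AG".
    None means the SNP cannot be phased from this trio alone: a no-call,
    an inconsistent genotype, or all three people heterozygous.
    """
    if "-" in child + mother + father:
        return None  # assumed no-call marker
    a, b = child
    if a == b:
        # Homozygous child: both parents must carry that allele.
        return (a, b) if a in mother and a in father else None
    # Heterozygous child: try both assignments and keep the one that fits.
    options = [(m, f) for m, f in ((a, b), (b, a)) if m in mother and f in father]
    return options[0] if len(options) == 1 else None


# Child AG with an AA mother and GG father: the A must be maternal.
print(phase_snp("AG", "AA", "GG"))
# All three heterozygous: the trio alone cannot settle it.
print(phase_snp("AG", "AG", "AG"))
```

Running this over every SNP in the raw data file gives two haplotypes for the child, with gaps wherever the function returns None.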
On Jan 7, 2012, at 12:29 AM, "Tim Janzen" <[email protected]> wrote:
> Dear Jim,
> The major issue with using unphased data for comparisons (in Relative Finder, Family Finder, or other programs such as Jim McMillan's that compare unphased data) is that a significant percentage of shorter matching segments will be identical by state rather than identical by descent... <snip>
>
> ______________________________
> For answers to Frequently Asked Questions about mailing lists, please see:
> http://dgmweb.net/MailingListFAQs.html
>
> -------------------------------
> To unsubscribe from the list, please send an email to [email protected] with the word 'unsubscribe' without the quotes in the subject and the body of the message
Dear Jim,

I agree that it pays to concentrate on your matches at either FF or 23andMe that are over 10 cMs. If you choose to contact people who are matching you at less than 8 cM or so, you need to keep in mind that a significant percentage of these matches will be IBS unless one of your children also matches them on the same segment, in which case the segment is very likely to be IBD.

Ideally, you would like to phase your data and then compare your phased data to that of your matches. The only way you can do that is to test two parent/one child trios and then use a phasing program like the one I wrote or David Pike's to compare your phased data with that of your matches. In order to do this, your matches would need to share their raw data file with you. I suspect that many people won't be willing to do that. Ideally everyone would be downloading their raw data files and uploading them to GEDmatch, where you could do comparisons using phased data. So far neither 23andMe nor FTDNA has made a move to phase their data. I talked to Bennett Greenspan about doing this for Family Finder in November, but he was reluctant to spend the money on the programming needed to phase the data for the two parent/one child trios that he currently has in FF.

As I have mentioned before, we all need to be mapping our chromosomes so we know as best as possible which ancestral line each DNA segment came from. In your particular case, you have quite a few matches who could be distant cousins (6th-10th cousins). What you need to do is test as many 2nd and 3rd cousins as feasible from the lines you think you match your other distant cousins on, to see if the DNA segments you share with those distant cousins came through the ancestral lines you think they could have come from. Using phased data for doing the comparisons would also be of big help to you if you could get your matches to share their raw data file with you. You could then quickly tell which of your matches are simply IBS.
Sincerely,
Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jim Bartlett
Sent: Friday, January 06, 2012 10:20 PM
To: [email protected]
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?

Tim,

Walden's table looks very similar to the percentages that FTDNA uses to indicate what percentage of your cousins will show up as a match with Family Finder... Are these the same? <snip>

Jim
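To see why comparing phased data lets you "quickly tell which of your matches are simply IBS", the toy sketch below contrasts an unphased comparison with a phased one. The five-SNP data and all function names are invented for illustration. Real programs work over hundreds or thousands of SNPs with cM thresholds, and this is not a description of how GEDmatch or any other tool is implemented.

```python
def half_identical(geno_a, geno_b):
    """Unphased test: the two genotypes share at least one allele at this SNP."""
    return any(allele in geno_b for allele in geno_a)


def haplotype_compatible(hap_allele, geno):
    """Phased test: my single phased allele appears in the match's genotype."""
    return hap_allele in geno


def longest_run(flags):
    """Length of the longest unbroken run of True values."""
    best = run = 0
    for ok in flags:
        run = run + 1 if ok else 0
        best = max(best, run)
    return best


# Hypothetical data: my unphased genotypes, my two phased haplotypes,
# and a match's unphased genotypes at the same five SNPs.
mine = ["AG", "CT", "AG", "CC", "GT"]
maternal = ["A", "C", "A", "C", "G"]
paternal = ["G", "T", "G", "C", "T"]
match = ["AA", "TT", "GG", "CC", "TT"]

unphased = longest_run(half_identical(m, x) for m, x in zip(mine, match))
via_mom = longest_run(haplotype_compatible(h, x) for h, x in zip(maternal, match))
via_dad = longest_run(haplotype_compatible(h, x) for h, x in zip(paternal, match))
print(unphased, via_mom, via_dad)  # prints: 5 1 4
```

The unphased comparison reports an unbroken five-SNP match, but neither haplotype matches all the way through: the apparent match zigzags between the maternal and paternal copies, which is the signature of an IBS segment.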
Dear Tim,

Thank you for outlining this process. I have very mixed emotions at this point. Mainly I feel betrayed and upset.

I have been pushing (as in selling) Family Finder for the last 18 months - I make presentations at the FHCs in DC and Baltimore at annual workshops, as well as at retirement communities and genealogy clubs in the region. I'm asked to speak because DNA is the most technical word I use - I proclaim that no biology is needed to use the new DNA tools for genealogy, and I keep the presentation and discussions at that level. I have a Master's in engineering and my wife has a PhD in biology - but I try to keep the talks at a level everyone can understand and use. Many have taken my advice: "every serious genealogist should take the Family Finder DNA test." Many of them can barely afford this test, much less fund 3 or more in order to use it.

It now appears this test is only for a very small group of folks who truly understand it, and it's not ready for the vast majority of genealogists. We've had good success with Y-DNA surname projects: "if two men have matching Y-DNA, they have a common ancestor; if not, they don't" - an easy rule that we can understand. The rules for atDNA are like Twister, very expensive, and involve many more steps than just comparing with matches. I'm sad that I've sent so many unsuspecting genealogists down this path.

If I understand correctly, the simple rule for genealogy hobbyists is: "discard all matches below 10 cM, and focus on the few remaining". Later today I'll see what that does in my case. In my 1024 23andMe matches, what should be the equivalent (to 10 cM) cutoff - in percent and/or number of segments?

Is ANYONE finding any new cousins with FF or 23andMe? By this I mean strangers, not the close kin you already know and/or whose tests you have paid for. What percent of your hitherto unknown matches have worked out?

Jim
On Jan 7, 2012, at 2:09 AM, "Tim Janzen" <[email protected]> wrote:
> Dear Jim,
> I agree that it pays to concentrate on your matches at either FF or 23andMe that are over 10 cMs... <snip>
Thank you for such a clear explanation, Tim.

Diana

> From: Tim Janzen
> Sent: Saturday, January 07, 2012 12:30 AM
>
> Dear Jim,
> The major issue with using unphased data for comparisons (in Relative Finder, Family Finder, or other programs such as Jim McMillan's that compare unphased data) is that a significant percentage of shorter matching segments will be identical by state rather than identical by descent... <snip>
> Doing comparisons using phased gets around the issue of identical by state matches... <snip>