Ann, I understand that a small error rate is tolerated; that's common in almost every industry. What I'm trying to understand is how a 7.7cM string (maybe 8 million ACTGs) can be identical (99.99%) to another 7.7cM string and still be IBS (or, to a genealogist like me, random). And if that's the way biology works, why does FTDNA make 7.7cM the cutoff when so many of the matches at that level are IBS? I need to be careful here, because I don't want FTDNA to raise the cutoff: I've found too many cousins in the 7-10cM range (it remains to be determined whether their large segments are IBD or IBS). Actually, I'd like a selection feature at FTDNA that would let me lower the cutoff to, say, 5cM and see the additional matches at that level. I have cousins who have taken FF and don't show up as a match to me; I'd really like to compare them using the Chromosome Browser to see what atDNA we have in common. I guess I need to talk them into joining me at GEDmatch...

Jim Bartlett

On 01/07/12, Ann Turner <[email protected]> wrote:
> If you got 2 FFI tests, the raw data would "only" be 99.99% identical. There will be a few genotyping errors, but FTDNA (and 23andMe) will tolerate a single mismatch that's embedded in a long consecutive run of matching SNPs. The problem with the Affy chip was that the error rate was somewhat higher, and FTDNA had to tolerate too many mismatches, giving rise to more false positives.
>
> Ann
>
> On Fri, Jan 6, 2012 at 3:25 PM, Jim Bartlett <[email protected]> wrote:
>> Well, I think I understand what Ann, Dwight and Larry are trying to pour into my head. I guess an analogy of two superimposed pictures (X-rays) works for me, and then a match would come up with any combination. But I thought that was the whole point of insisting on long segments. What I'm understanding is that the matching algorithm can scoot along and match with either one of two at each point along the whole string. I can see where this would be kind of murky in the short haul, but over a long segment it still seems pretty amazing to get matches with folks who then don't match each other. I guess I could chalk it up to statistics, like flipping heads 10 times in a row, or something.
>>
>> I had thought the explanation would be more like an anomaly, or lack of precision, in the read of the data... In other words, if I got 2 FFI tests, would they be identical? I compared my FFA segments to the FFI segments and many weren't even close. So I came to the conclusion that the "picture" the FFI test takes is a little fuzzy, and not a really precise observation.
>>
>> But it appears to be roughly right. As a genealogist, I'm still working with my matches and finding Common Ancestors. I'm not sure if we do in fact share the large atDNA segment with each other, but I am sure that we've come to the same conclusions on the paper trails. I'm hopeful that as more and more folks take this test, and maybe as the 23&me results are added in, we will begin to build a body of info that will help us sort it all out.
>>
>> Thanks again for 'splaining this to an engineer ;>j
>> Jim Bartlett
>
> ______________________________
> For answers to Frequently Asked Questions about mailing lists, please see: http://dgmweb.net/MailingListFAQs.html
> -------------------------------
> To unsubscribe from the list, please send an email to [email protected] with the word 'unsubscribe' without the quotes in the subject and the body of the message
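Ann's point about tolerating a single genotyping error inside a long run can be sketched in Python. This is a simplified, hypothetical rule for illustration only; neither vendor's actual algorithm or thresholds are described in this thread. The input is a list of per-SNP match/mismatch flags; a mismatch is forgiven only when at least `flank` matching SNPs sit on each side of it, so two consecutive mismatches always end the segment. The names `find_segments`, `min_snps` and `flank` are made up for this sketch.

```python
from typing import List, Optional, Tuple


def _all_match(matches: List[bool], lo: int, hi: int) -> bool:
    """True if matches[lo:hi] lies inside the list and every SNP in it matches."""
    return lo >= 0 and hi <= len(matches) and all(matches[lo:hi])


def find_segments(matches: List[bool], min_snps: int, flank: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of matching runs.

    Hypothetical rule: a single mismatch with at least `flank` matching SNPs
    on both sides is treated as a probable genotyping error and skipped.
    Runs shorter than `min_snps` SNPs are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current run began
    for i, is_match in enumerate(matches):
        if is_match:
            if start is None:
                start = i
            continue
        # Tolerate an isolated mismatch embedded in a run of matches.
        if start is not None and _all_match(matches, i - flank, i) and _all_match(matches, i + 1, i + 1 + flank):
            continue
        # Otherwise the current run (if any) ends just before this SNP.
        if start is not None and i - start >= min_snps:
            segments.append((start, i))
        start = None
    if start is not None and len(matches) - start >= min_snps:
        segments.append((start, len(matches)))
    return segments


flags = [True] * 5 + [False] + [True] * 5 + [False, False] + [True] * 3
print(find_segments(flags, min_snps=5, flank=2))  # [(0, 11)]
```

Lowering `flank` makes the rule more forgiving; at `flank=0` every mismatch inside a run is skipped, which is the kind of over-tolerance Ann says produced false positives with the noisier Affy data.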
Jim, perhaps of interest to you: I have found several (5) previously unknown relatives who descend from ONE of my sets of 3rd great-grandparents (GGG-grandparents), and another who descends from that GGG-grandfather's father. These matches are at both FTDNA and 23andMe, and range from 4th cousin to 6th cousin. It amazes me that such a group of distant relatives, unknown to each other, have taken atDNA tests that reveal our source in this one line. Mary Alice
Larry,

Thanks for the link. My "Cliff Notes" version is that 1cM is about 1 million ACTGs, give or take; in other words, a lot more than I'm going to do by hand. Looking at an AG and comparing it to AA, AG, GG, etc. would drive me crazy before I got past the first position...;>j Clearly the one-position example is something you could do by hand, but in practice it's done by software. So both matches should submit their raw data to the same program, and I just don't see many genealogists doing this...

Jim Bartlett

On 01/07/12, Larry Vick <[email protected]> wrote:
> Jim,
>
> Perhaps the link below to Wikipedia will help with your question.
> http://en.wikipedia.org/wiki/Centimorgan
>
> I wasn't able to quickly find two matches I have at 7.7 cM, but I found two matches at 8.03 cM. Hopefully, these matches will make the point. One is on chr 20 and the other is on chr 2. The start and end positions are shown, followed by the number of cM and the number of SNPs in the segment. Finally, the number of bases is shown. You can see that the number of bases on chr 2 is twice the number on chr 20 even though the cM is the same. The number of SNPs per million bases drives the cM calculation. So the chr 20 segment has less diversity than the chr 2 segment (which is what the cM captures).
>
> Chr  Start     End       cM    SNPs  Bases
> 20   38171398  43337995  8.03  1500   5,166,597
> 2    56186100  66550469  8.03  2400  10,364,369
>
> Regards,
> Larry
>
> From: Jim Bartlett <[email protected]>
> To: Larry Vick <[email protected]>; "[email protected]" <[email protected]>
> Sent: Friday, January 6, 2012 10:00 PM
> Subject: Re: [AUTOSOMAL-DNA] Subject: How do you work a 5 way FF match with few clues ?
>
> Larry, your explanation is excellent, and the light bulb came on; thanks for your brevity and clarity.
>
> Now, that explains one location. How many ACTGs are there in 7.7cM? I thought the point was that with a long segment we could rely on the fact that it was an exact match. I guess the point is, in a way, it is an exact match. So what's the probability of another segment of exactly the same long length being an exact match, just with a scrambled-up/different arrangement of ACTGs? If in fact groups of 3, 4 or 5 matches turn out not to be related, then the odds must be high. Although I understand how the matching is done, and how it can create many different combinations from the same segment, it still seems ... unusual? ... that these will match several others from the relatively small community of folks who have taken this test.
>
> Jim
> - Sent from my iPhone - FaceTime!
>
> On Jan 6, 2012, at 5:14 PM, Larry Vick <[email protected]> wrote:
>> Jim,
>>
>> If you have AG at a location and you have two matches, one of whom has AA and the other GG, you will match both, but they will not match each other. FF has no way to know which of your parents you inherited the A from and which you inherited the G from.
>>
>> Regards,
>>
>> Larry
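Larry's AG/AA/GG example is the heart of unphased ("half-identical") matching, and a few lines of Python make it concrete. This is a minimal sketch, assuming genotypes are two-letter strings reported on the same strand and in the same SNP order for everyone; no-calls and strand flips are not handled, and `half_identical` and `compare` are illustrative names, not any vendor's code.

```python
from typing import List


def half_identical(g1: str, g2: str) -> bool:
    """True if two unphased genotypes (e.g. "AG" and "GG") share at least one allele."""
    return bool(set(g1) & set(g2))


def compare(a: List[str], b: List[str]) -> List[bool]:
    """Per-SNP half-identical comparison of two people's genotypes."""
    return [half_identical(x, y) for x, y in zip(a, b)]


you = ["AG", "CT", "AG"]
match_1 = ["AA", "CC", "AA"]
match_2 = ["GG", "TT", "GG"]

print(compare(you, match_1))      # [True, True, True]
print(compare(you, match_2))      # [True, True, True]
print(compare(match_1, match_2))  # [False, False, False]
```

Because a heterozygous SNP matches either allele, a long run of heterozygous SNPs lets both matches line up with you even though they don't match each other, which is exactly the situation Larry describes.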
Dear Linda, I think that Ann Turner already responded to most of your questions earlier this morning, so I won't address all of them now, but let me know if there are still questions in your message below after you have read Ann's and my messages. It is unfortunate that your child isn't willing to do a 23andMe or Family Finder test, as those results would be very helpful for you. My suggestion is that you continue to gently encourage them to be tested and offer to pay for the test yourself. Assuming that your child never does any autosomal DNA testing, your next best approach would be to do a 23andMe and/or Family Finder test on any siblings that you have, as well as at least one child from each of those siblings. If feasible, I would also test the spouses of your siblings if those siblings have children who are being DNA tested. This would allow you to phase the data for a two parent/one child trio. If you have no siblings, then the next best thing to do would be to test any first cousins you have and at least one child from each of those first cousins. I think that it would be wise for you and anyone else who hasn't done any chromosome mapping up to this point to download my spreadsheet showing the mapping I have done for my mom's DNA data so far: http://dl.dropbox.com/u/21841126/phased%20genome%20of%20Robert%20and%20Betty%20Janzen.zip. The SNP data in the spreadsheet was phased using my program at http://dl.dropbox.com/u/21841126/phasing%20program%20%28small%20version%29.xls. (See instructions at http://dl.dropbox.com/u/21841126/phasing%20program%20instructions.rtf.) Some additional explanatory information is at the bottom of the spreadsheet. I haven't had time yet to map my dad's DNA from his cousins who have been tested, but I plan to do that in the near future.
As you review this spreadsheet, it would probably help to refer to my mom's pedigree chart at http://wc.rootsweb.ancestry.com/cgi-bin/igm.cgi?op=PED&db=janzen&id=I8 for clarification. In my opinion, everyone should be doing chromosome mapping similar to what I am doing. My spreadsheet currently only has the SNPs from 23andMe's version 2 chip in it, but I am planning to update it to include all of the SNPs from 23andMe's version 3 chip in the near future as well. For those of you who haven't created a spreadsheet such as this yet, I would suggest that you include the SNPs from 23andMe's version 3 chip if you have been tested with that chip, or the SNPs from FTDNA's Family Finder test if you have done that test. You need Excel 2007 or a later version to create a spreadsheet like this. I would also like to make a few other comments about the spreadsheet. If you review it carefully, you will note that there are some segments that I haven't yet mapped to a specific ancestor; for these I have simply added a comment such as "contiguous segment per James Cole" on row 512455 and adjacent rows. This means that James Cole has a long matching segment (over 10 cMs or so) with my mom, so I know that this segment came from a common ancestor, but I haven't yet been able to determine which ancestor. In any case, knowing the boundaries of contiguous segments such as these is helpful because it means that there hasn't been a crossover in that particular DNA segment going back at least as far as the common ancestor my mom shares with James Cole. Sincerely, Tim Janzen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Linda
Sent: Saturday, January 07, 2012 5:34 AM
To: [email protected]
Subject: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?

As Jim said, I believe I am beginning to understand how this works. Thanks, Tim, for your very comprehensible explanations, and to everyone for the input.
For me, just now, I take away from this thread:
1. My GEDmatch.com results are results that are true IBD and would be the best choice for working on in the immediate and foreseeable future. Am I understanding this correctly?
2. So my initial query would be resolved by asking those other four matches to upload their results to GEDmatch.com. Is this correct?
3. If there was still a match with any of the four, it would remain to determine all the other info, such as whether the segment came from my mother's or my father's DNA, and to look for surnames in common.

Neither my husband nor I have living parents, and we have only one living child, who is not willing to do any DNA testing at all at this time. So, unless new technology is found for working with atDNA, am I correct in thinking my best results would always come from a program such as GEDmatch? I am in constant awe of the folks who post here and freely share so much knowledge. Thanks to all. Linda
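Tim's suggestion to test a two parent/one child trio, and Jim's later question about whether "the A in AG is from Mom or Dad," both come down to per-SNP phasing by Mendelian inheritance. The sketch below shows that logic for a single SNP. It is not Tim's spreadsheet program or David Pike's tool; it assumes two-letter genotypes on the same strand, ignores no-calls, and `phase_child` is a hypothetical name.

```python
from typing import Optional, Tuple


def phase_child(child: str, mother: str, father: str) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for one SNP, or None if ambiguous.

    Raises ValueError when no assignment fits (a Mendelian inconsistency,
    which usually points to a genotyping error).
    """
    a, b = child[0], child[1]
    options = set()
    # Try both ways of splitting the child's two alleles between the parents.
    for from_mom, from_dad in ((a, b), (b, a)):
        if from_mom in mother and from_dad in father:
            options.add((from_mom, from_dad))
    if not options:
        raise ValueError(f"inconsistent trio: child {child}, mother {mother}, father {father}")
    if len(options) == 1:
        return options.pop()
    return None  # e.g. all three are AG: either parent could have given either allele


print(phase_child("AG", "AA", "GG"))  # ('A', 'G')
print(phase_child("AG", "AG", "AG"))  # None
```

SNPs where the child and both parents are heterozygous stay unresolved under this simple rule; phasing programs typically fill those in from the surrounding resolved SNPs, which is beyond this sketch.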
All, sorry about my earlier pre-coffee, down-in-the-dumps email. I just checked my 172 FF matches, and 92 of them have a segment over 10cM, so that's still a good ROI (Return On Investment) for me to work on: many genealogists would pay for a "guaranteed" list of 100 new cousins with DNA links to themselves. Of the 92, 13 have confirmed their links with me on our FTDNA page: one Close, one 3C, one 4C, five 6C, and five Distant Cousins. So if these are all IBD, as they probably are, my IBD ROI is about 22 bucks, and this will drop as more folks take this test and more cousins are found! As a genealogist, would I trade a sawbuck for a guaranteed DNA link to a new cousin? You betcha! And since I can be pretty confident of an IBD segment at the 10cM level, I can see where several hundred such matches (in the not-too-distant future) would allow us genealogists to do some phasing with those segments. In general, each such IBD segment could be tagged as being from a particular line. Right?

I have actually found over 20 other Common Ancestors among my FF matches. So even if they are IBS, finding others who are willing to compare ancestries has been worthwhile. Perhaps as my "inventory" of IBD segments grows (and is tagged to particular lines), I'll find that some of these smaller segments match the IBD segments. If the genealogy works out, they can then be moved to the IBD column.

Jim Bartlett

On 01/07/12, Jim Bartlett <[email protected]> wrote:
> Dear Tim,
>
> Thank you for outlining this process. I have very mixed emotions at this point. The main ones are betrayed and upset. I have been pushing (as in selling) Family Finder for the last 18 months. I make presentations at the FHCs in DC and Baltimore at annual workshops, as well as at retirement communities and genealogy clubs in the region. I'm asked to speak because DNA is the most technical word I use: I proclaim that no biology is needed to use the new DNA tools for genealogy, and I keep the presentations and discussions at that level. I have a Master's in engineering and my wife has a PhD in biology, but I try to keep the talks at a level everyone can understand and use. Many have taken my advice: "every serious genealogist should take the Family Finder DNA test."
>
> Many of them can barely afford this test, much less be required to fund 3 or more in order to use it. It now appears this test is only for a very small group of folks who truly understand it, and it's not ready for the vast majority of genealogists.
>
> We've had good success with Y-DNA surname projects: "if two men have matching Y-DNA, they have a common ancestor; if not, they don't" is an easy rule that we can understand.
>
> The rules for atDNA are like Twister, very expensive, and involve many more steps than just comparing with matches. I'm sad that I've sent so many unsuspecting genealogists down this path.
>
> If I understand correctly, the simple rule for genealogy hobbyists is: "discard all matches below 10cM, and focus on the few remaining". Later today I'll see what that does in my case. In my 1024 23&me matches, what should be the equivalent (to 10cM) cutoff, in percent and/or number of segments?
>
> Is ANYONE finding any new cousins with FF or 23&me? By this I mean strangers, not the close kin you already know and/or have paid for their tests. What percent of your hitherto unknown matches have worked out?
>
> Jim
> - Sent from my iPhone - FaceTime!
>
> On Jan 7, 2012, at 2:09 AM, "Tim Janzen" <[email protected]> wrote:
>> Dear Jim,
>> I agree that it pays to concentrate on your matches at either FF or 23andMe that are over 10 cMs. If you choose to contact people who are matching you at less than 8 cM or so, you need to keep in mind that a significant percentage of these matches will be IBS unless one of your children also matches them on the same segment, in which case they will be IBD.
>> Ideally, you would like to phase your data and then compare your phased data to that of your matches. The only way you can do that is to test two parent/one child trios and then use a phasing program like the one I wrote or David Pike's to compare your phased data with that of your matches. In order to do this, your matches would need to share their raw data file with you. I suspect that many people won't be willing to do that. Ideally everyone would be downloading their raw data files and uploading them to GEDmatch, where you could do comparisons using phased data. So far neither 23andMe nor FTDNA has made a move to phase their data. I talked to Bennett Greenspan about doing this for Family Finder in November, but he was reluctant to spend the money on the programming needed to phase the data for the two parent/one child trios that he currently has in FF.
>> As I have mentioned before, we all need to be mapping our chromosomes so we know as best as possible which ancestral line each DNA segment came from. In your particular case, you have quite a few matches who could be distant cousins (6th-10th cousins). What you need to do is test as many 2nd and 3rd cousins as feasible from the lines you think you match your other distant cousins on, to see if the DNA segments you share with those distant cousins came through the ancestral lines you think they could have come from. Using phased data for the comparisons would also be a big help to you if you could get your matches to share their raw data files with you. You could then quickly tell which of your matches are simply IBS.
>> Sincerely,
>> Tim
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Jim Bartlett
>> Sent: Friday, January 06, 2012 10:20 PM
>> To: [email protected]
>> Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?
>>
>> Tim,
>>
>> Walden's table looks very similar to the percentages that FTDNA uses to indicate what percentage of your cousins will show up as a match with Family Finder... Are these the same? This would mean most of your cousins would be listed, but a percentage would be IBS, and so not atDNA cousins. I don't think this is the case, because I have a number of known cousins, with atDNA tests, who do not match me (FF or 23).
>>
>> So the other option is that FTDNA's percentage eliminates some of your cousins from appearing on the match list, and Walden's table eliminates a percentage of these from being atDNA cousins. This is really whittling down the true matches. And worse yet, the unphased don't know which are which.
>>
>> If I do understand this, we can't rely on the list from, say, FTDNA. We need to determine, manually(?) or with some software, whether the A in AG is from Mom or Dad, and do this for our whole raw data (how many ACTGs?). And then how do we compare this with the long segments of our matches? Is this part of the FTDNA and 23&me programs I haven't found yet? Or do I need 3rd-party help?
>>
>> Now I'm just trying to see how I get this info, and then how I use it with my list of matches to capture the ones who are IBD. I guess one way is to cull out everyone below 10cM. Now I'm catching on. But this will eliminate most of my matches... It would also cut a number of folks with whom I've already found common ancestry; the point here being that we might be cousins, but we probably don't share a large IBD segment...
>>
>> Jim
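The rules of thumb in this exchange (focus on matches over 10 cM; treat matches under about 8 cM as possibly IBS unless one of your children also matches on the same segment) can be written down as a small triage function. This is only a sketch of the thread's heuristics, not a vendor rule: the 10 and 8 cM thresholds come from Tim's message, while the "uncertain" band between them, the labels, and the `Match`/`triage` names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    name: str
    longest_cm: float             # longest shared segment, in cM
    child_confirms: bool = False  # a tested child also matches this person on that segment


def triage(m: Match) -> str:
    """Sort a match into a rough work queue using the thread's rules of thumb."""
    if m.longest_cm >= 10:
        return "focus"
    if m.child_confirms:
        return "likely IBD (child also matches)"
    if m.longest_cm < 8:
        return "possibly IBS"
    return "uncertain"  # assumed label for the 8-10 cM band


matches: List[Match] = [
    Match("A", 23.2),
    Match("B", 7.7),
    Match("C", 7.7, child_confirms=True),
    Match("D", 9.0),
]
for m in matches:
    print(m.name, triage(m))
```

Ordering the checks this way means a child's confirmation rescues a short match from the "possibly IBS" pile, which is the point Tim makes about children matching on the same segment.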
Dear Ann, Can you give us some data about the lengths of the matching segments that you referred to for the 5% of your son's matches at 23andMe who don't match your husband or you and the 10-20% of your son's matches in FTDNA's Family Finder who don't match your husband or you? Sincerely, Tim Janzen -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Ann Turner Sent: Saturday, January 07, 2012 8:26 AM To: Larry Vick; [email protected] Subject: Re: [AUTOSOMAL-DNA] Subject: How do you work a 5 way FF match with few clues ? On the flip side, a match in the child should also be found in one of the parents. If not, it may be IBS. However, I just recently learned something that might affect this with FTDNA's algorithm. At 23andMe, I've found that about 5% of my son's matches don't show up for my husband and me. For father/mother/child trios at FTDNA, the percentage seems to run higher, 10-20%. FTDNA seems to be incorporating some additional parameters beyond the length of the longest segment. Ann
Tim's stats were an interesting approach (and I need to think about them more), but I don't want you to think that a negative finding is necessarily significant. There is only a 50% chance that a given segment in the parent will show up in the child, so Tim is looking at how often his percentages deviate from that number.

On the flip side, a match in the child should also be found in one of the parents. If not, it may be IBS. However, I just recently learned something that might affect this with FTDNA's algorithm. At 23andMe, I've found that about 5% of my son's matches don't show up for my husband and me. For father/mother/child trios at FTDNA, the percentage seems to run higher, 10-20%. FTDNA seems to be incorporating some additional parameters beyond the length of the longest segment. In one case, a man had a match in FF with segments of 2.1, 1.2, 2.0, 1.1 and 23.2 cM (GEDmatch with a 1 cM threshold). The man's mother was not declared a match in FF, but GEDmatch showed matching segments of 1.7 and 23.2 cM. That seems odd to me, and might account for some of the cases where a match in the child is not found in either parent.

Ann

On Sat, Jan 7, 2012 at 7:41 AM, Larry Vick <[email protected]> wrote:
> Ann,
>
> Thanks for the correction. I have tried to learn from your postings and those of others like Tim JANZEN, but I still have more to learn. I am still thinking about Tim's revelation (at least for me) that there is more significance in having a child share a match with me than I had previously thought.
>
> If only I had appreciated my high school biology class then the way I do today, I might have done better than the "D" I got. I only got a "D" because I was allowed to do extra credit.
>
> Regards,
>
> Larry
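Ann's check, that a child's match should also show up in one of the parents, can be done on segment coordinates rather than on the vendors' match lists, which her 23.2 cM example suggests can miss a shared segment. Below is a minimal sketch, assuming you already have (chromosome, start, end) segments for the child and for each parent against the same match, for example from a chromosome browser or GEDmatch. It tests for overlap rather than identical boundaries, since reported segment ends can differ slightly between people and programs. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start_bp, end_bp), end exclusive
Segment = Tuple[str, int, int]


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def found_in_parent(child_seg: Segment, mother_segs: List[Segment], father_segs: List[Segment]) -> bool:
    """True if the child's segment with a match overlaps either parent's segments with that same match."""
    return any(overlaps(child_seg, s) for s in mother_segs + father_segs)


# Made-up coordinates for illustration.
child = ("12", 30_000_000, 52_000_000)
mother = [("12", 31_000_000, 52_500_000)]
father: List[Segment] = []
print(found_in_parent(child, mother, father))  # True
```

A `False` here flags the child's segment as a candidate IBS match, subject to Ann's caveat that a negative finding is not necessarily significant.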
Jim, The morning coffee jump start does help. I think that your ROI point is excellent as were your other points. The big issue is that autosomal DNA is more difficult to understand and work with than yDNA. Even worse, except for GEDmatch, Dr Pike's site and a few others there is no really useful software yet. Half million to million line spreadsheets really stretch what a spreadsheet can do. I expect that SQL is the ultimate answer. Still, for me, this journey started in NOV 2009 with the 23andMe beta. Looking back over that timeframe, the progress is astonishing. In another two to three years . . . Sam
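Sam's suggestion that SQL may be the answer to half-million-row spreadsheets can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, assuming one row per SNP with an identifier, chromosome, position and genotype, roughly the shape of a raw data download; the table layout, the `build_db` and `snps_in_segment` names, and the sample rows are all made up for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, str]  # (snp_id, chromosome, position, genotype)


def build_db(rows: Iterable[Row]) -> sqlite3.Connection:
    """Load one person's SNP rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE snps (
               snp_id     TEXT PRIMARY KEY,
               chromosome TEXT NOT NULL,
               position   INTEGER NOT NULL,
               genotype   TEXT NOT NULL
           )"""
    )
    # An index on (chromosome, position) keeps segment lookups fast on large tables.
    conn.execute("CREATE INDEX idx_chr_pos ON snps (chromosome, position)")
    conn.executemany("INSERT INTO snps VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def snps_in_segment(conn: sqlite3.Connection, chromosome: str, start: int, end: int) -> List[Tuple[str, int, str]]:
    """Return (snp_id, position, genotype) for SNPs with start <= position <= end."""
    cur = conn.execute(
        "SELECT snp_id, position, genotype FROM snps "
        "WHERE chromosome = ? AND position BETWEEN ? AND ? ORDER BY position",
        (chromosome, start, end),
    )
    return cur.fetchall()


# Made-up rows for illustration only.
rows = [
    ("snp1", "20", 38_171_398, "AG"),
    ("snp2", "20", 40_000_000, "CC"),
    ("snp3", "20", 50_000_000, "TT"),
    ("snp4", "2", 40_000_000, "AA"),
]
conn = build_db(rows)
print(snps_in_segment(conn, "20", 38_171_398, 43_337_995))
# [('snp1', 38171398, 'AG'), ('snp2', 40000000, 'CC')]
```

Storing the chromosome as text leaves room for "X"; pointing `sqlite3.connect` at a file path instead of `":memory:"` would keep the data between sessions.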
I'm not sure how John Walden derived his stats, but they seem overly pessimistic to me. My view is based on how 23andMe tested its algorithm, using simulated pedigrees built from real genotype data. They created artificial descendancies, mating random genotypes and creating descendants according to recombination probabilities, for 10 generations. Then they looked at matching segments to see if they could be traced back to the founders. Perhaps 5-10% of matches at the 7 cM level failed this test, so it happens, but it's not nearly as gloomy as John Walden depicted. The 10 cM level at 23andMe would be about 0.13% and classified as a 3rd to distant cousin. What many people fail to consider is that most cousins will be on the distant end of the range, simply because you have many more of them. Ann
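Ann's remark that most cousins fall at the distant end of the range because there are so many more of them can be shown with an idealized count. This sketch assumes every couple has exactly `k` children who all have families of their own, no pedigree collapse, and full cousins only; it says nothing about how many of those cousins share detectable DNA with you. Under those assumptions you have 2^n ancestral couples at the nth-cousin level, each with k - 1 children besides your own ancestor, and each of those children has k^n descendants in your generation. The function name is hypothetical.

```python
def expected_nth_cousins(n: int, k: int) -> int:
    """Full nth cousins in an idealized pedigree where every couple has k children."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    ancestral_couples = 2 ** n      # your ancestral couples n + 1 generations back
    other_children = k - 1          # their children other than your own ancestor
    descendants_per_child = k ** n  # each such child's descendants in your generation
    return ancestral_couples * other_children * descendants_per_child


for n in range(1, 7):
    print(f"{n}th cousins with 3 children per couple: {expected_nth_cousins(n, 3)}")
```

With 2 children per couple the count is 4^n, so the number of cousins roughly quadruples with each step outward even in this conservative case.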
Ann,

Thanks for the correction. I have tried to learn from your postings and those of others like Tim JANZEN, but I still have more to learn. I am still thinking about Tim's revelation (at least for me) that there is more significance to having a child share a match with me than I had previously thought. If only I had appreciated my high school biology class then the way I do today, I might have done better than the "D" I got, and I only got a "D" because I was allowed to do extra credit.

Regards,

Larry

________________________________
From: Ann Turner
Sent: Saturday, January 7, 2012 10:16 AM
Subject: Re: [AUTOSOMAL-DNA] Subject: How do you work a 5 way FF match with few clues ?

One correction -- the number of SNPs is not driving the cM calculation. The cM unit is empirically derived by looking at crossover points in a large dataset of father/mother/child/extended families. It's not a fixed quantity -- it varies somewhat depending on the dataset (e.g. Iceland vs CEU), and there is a big difference between males and females (so sex-averaged values are used). It also varies over the genome, as your example of a 5 and 10 Mb segment shows.

Ann Turner
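Since the cM is defined through crossovers (1 cM is roughly a 1% chance of a crossover per meiosis), it can help to see how map distance translates into crossover probability. The functions below use the Haldane map function, which is an assumption here: it is a textbook model that ignores crossover interference, whereas the genetic maps Ann describes are empirical and sex-averaged.

```python
import math


def haldane_recombination_fraction(d_cm):
    """Chance of an odd number of crossovers between two points d_cm apart.

    Haldane model (assumption): crossovers form a Poisson process with a
    mean of d_cm / 100 per meiosis and no interference.
    """
    d_morgans = d_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_map_distance_cm(r):
    """Inverse of the function above: recombination fraction -> cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def prob_no_crossover(d_cm):
    """Chance that a segment of d_cm passes through one meiosis unbroken."""
    return math.exp(-d_cm / 100.0)


for d in (1, 10, 50):
    print(d, round(haldane_recombination_fraction(d), 4), round(prob_no_crossover(d), 4))
```

At 1 cM the recombination fraction comes out at about 0.0099, essentially the 1% in the definition; the two measures diverge as segments get longer.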
Jim,

Yes, I have found previously unknown cousins at 23andMe (and then a couple of them tested at FTDNA, and we also matched there). The number I have found is very small. Thanks to CNN I even got to meet one (Bob STUBBS) in person.

If you just count those where my match and I have identified a common ancestor or ancestral couple, it has been just seven in my father's line (he is deceased, so I can't test him) and 12 in my mother's line (fortunately she has been tested). I only counted one match per family (e.g. where I might match multiple members of the same family, say a parent, child, and sibling). I have been able to identify five cases where my wife and one of her matches have a shared ancestor or ancestral couple. Unfortunately, my wife isn't interested in genealogy, and she has some early brick walls I need to work on. My wife's parents are deceased, so I can't test them, and she has no siblings.

I am starting to find matches with more than one person on the same segment where we all can trace ourselves to a shared ancestor or ancestral couple. I need more matches to increase my confidence that the shared ancestor or ancestral couple we have identified is the one we inherited the segment from. I also have several cases where I have identified a shared ancestor or ancestral couple with one match, but other matches we (the first match, and my mother or I) have on the same segment don't have a deep enough pedigree to know whether they are also descendants of that ancestor or ancestral couple. In some cases we have matching surnames in our pedigrees but just can't identify the shared ancestor. Hopefully, they will take the information I share with them, do some digging, and come back to me later with what they find.

Perhaps the most valuable information for me was finding an African segment in both my mother's and my wife's 23andMe Ancestry Paintings (different segments, so they don't match on them). Finding my mother's African segment helped me to understand a story my maternal grandmother's 1st cousin had told me about where one of my maternal grandmother's lines had come from (Newman's Ridge in Hancock Co., TN) and the fact that the area was home to many Melungeon families. At the time the cousin told the story I could tell he expected me to understand the significance of Newman's Ridge, but I had no idea what he meant.

While my mother and my wife don't match, I have noticed a pattern where my wife matches a person on a segment on one chromosome and my mother matches the same person on a different segment on another chromosome. In about half of these cases the matches have Ancestry Paintings with at least one African segment. I think this will lead me to finding that my wife and my mother have a distant shared Melungeon ancestor.

Well, I could go on and on, but I hope I have answered your question.

Regards,

Larry

________________________________
From: Jim Bartlett
Sent: Saturday, January 7, 2012 9:29 AM
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?

Is ANYONE finding any new cousins with FF or 23&me? By this I mean strangers, not the close kin you already know and/or have paid for their tests. What percent of your hitherto unknown matches have worked out?

Jim
As Jim said, I believe I am beginning to understand how this works.

Thanks, Tim, for your very comprehensible explanations, and to everyone for the input.

For me, just now, I take away from this thread:

1. My GEDMATCH.com results are true IBD results and would be the best choice for working on in the immediate and foreseeable future. Am I understanding this correctly?

2. So, my initial query would be resolved by requesting that those other four matches upload their results to GEDMATCH.com. Is this correct?

3. If there was still a match with any of the four, it would remain to determine all other information, such as whether it came through the mother's or father's DNA, and to look for surnames in common.

Neither my husband nor I have living parents, and we have only one living child, who is not willing to do any DNA testing at this time. So, unless new technology is found for working with atDNA, am I correct in thinking my best results would always come from a program such as GEDMATCH?

I am in constant awe of the folks who post here and freely share so much knowledge. Thanks to all.

Linda
If you got 2 FFI tests, the raw data would "only" be 99.99% identical. There will be a few genotyping errors, but FTDNA (and 23andMe) will tolerate a single mismatch that's embedded in a long consecutive run of matching SNPs. The problem with the Affy chip was that the error rate was somewhat higher, and FTDNA had to tolerate too many mismatches, giving rise to more false positives.

Ann

On Fri, Jan 6, 2012 at 3:25 PM, Jim Bartlett wrote:
> Well, I think I understand what Ann, Dwight and Larry are trying to pour into my head. I guess an analogy of two superimposed pictures (X-rays) works for me, and then a match would come up with any combination. But I thought that was the whole point of insisting on long segments. What I'm understanding is that the matching algorithm can scoot along and match with either one of two at each point along the whole string. I can see where this would be kind of murky in the short haul, but over a long segment, it still seems pretty amazing to get matches with folks who then don't match each other. I guess I could chalk it up to statistics, like flipping heads 10 times in a row, or something.
>
> I had thought the explanation would be more like an anomaly, or lack of precision, in the read of the data... In other words, if I got 2 FFI tests, would they be identical? I compared my FFA segments to the FFI segments and many weren't even close. So I came to the conclusion that the "picture" the FFI test takes is a little fuzzy, and not a really precise observation.
>
> But it appears to be roughly right. As a genealogist, I'm still working with my matches and finding Common Ancestors. I'm not sure if we do in fact share the large atDNA segment with each other, but I am sure that we've come to the same conclusions on the paper trails. I'm hopeful that as more and more folks take this test, and maybe as the 23&me results are added in, we will begin to build a body of info that will help us sort it all out.
>
> Thanks again for 'splaining this to an engineer ;>j
> Jim Bartlett
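Ann's point about tolerating an embedded mismatch can be sketched as a run-finding pass over per-SNP comparison results. This is not FTDNA's or 23andMe's algorithm: it reads "a single mismatch" as a single isolated SNP whose neighbours both match, and `min_snps` is a hypothetical threshold, since the thread does not give the vendors' actual parameters.

```python
def find_runs(matches, min_snps):
    """Return (start, end) index ranges of matching runs, end exclusive.

    `matches` holds one boolean per SNP (True = the two people are compatible
    at that SNP). A mismatch is tolerated when both neighbouring SNPs match,
    treating it as a probable genotyping error; two adjacent mismatches end
    the run. `min_snps` is a hypothetical minimum run length.
    """
    runs = []
    start = None
    n = len(matches)
    for i, ok in enumerate(matches):
        isolated_error = (not ok and 0 < i < n - 1
                          and matches[i - 1] and matches[i + 1])
        if ok or isolated_error:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    if start is not None and n - start >= min_snps:
        runs.append((start, n))
    return runs


print(find_runs([True] * 5 + [False] + [True] * 5, 10))          # [(0, 11)]
print(find_runs([True] * 5 + [False, False] + [True] * 5, 4))    # [(0, 5), (7, 12)]
```

Making the tolerance looser (for example, bridging two adjacent mismatches) lets more unrelated runs join into long segments, which is the false-positive problem Ann describes for the higher-error Affy data.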
Not quite. I'd phrase it differently. For #1, you need to demonstrate that the segments are true IBD from a single ancestor, and one way to filter out IBS segments is to do #2, then compare every possible pair out of the set of five:

A matches B, C, D, E
B matches C, D, E
C matches D, E
D matches E

Ann

On Sat, Jan 7, 2012 at 5:33 AM, Linda wrote:
> As Jim said, I believe I am beginning to understand how this works.
>
> Thanks, Tim, for your very comprehensible explanations, and to everyone for the input.
>
> For me, just now, I take away from this thread:
>
> 1. My GEDMATCH.com results are true IBD results and would be the best choice for working on in the immediate and foreseeable future. Am I understanding this correctly?
>
> 2. So, my initial query would be resolved by requesting that those other four matches upload their results to GEDMATCH.com. Is this correct?
>
> 3. If there was still a match with any of the four, it would remain to determine all other information, such as whether it came through the mother's or father's DNA, and to look for surnames in common.
>
> Neither my husband nor I have living parents, and we have only one living child, who is not willing to do any DNA testing at this time. So, unless new technology is found for working with atDNA, am I correct in thinking my best results would always come from a program such as GEDMATCH?
>
> I am in constant awe of the folks who post here and freely share so much knowledge. Thanks to all. Linda
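Ann's list of comparisons is every 2-person combination of the five people, which is 10 pairs. A small sketch for keeping track of which pairs have been checked, using hypothetical names and results: it simply reports the pairs that were not seen matching on the segment.

```python
from itertools import combinations


def missing_pairs(people, observed_matches):
    """List the pairs from `people` that were NOT seen matching on the segment.

    `observed_matches` is an iterable of 2-tuples such as ("A", "B");
    order within a pair does not matter.
    """
    seen = {frozenset(pair) for pair in observed_matches}
    return [pair for pair in combinations(people, 2) if frozenset(pair) not in seen]


people = ["A", "B", "C", "D", "E"]
print(len(list(combinations(people, 2))))  # 10 pairs to check

# Hypothetical results: every pair matches except B and C.
observed = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
            ("B", "D"), ("B", "E"), ("C", "D"), ("C", "E"), ("D", "E")]
print(missing_pairs(people, observed))  # [('B', 'C')]
```

A missing pair is a flag rather than a verdict: as Larry's AG example in this thread shows, two people can each match A on a segment without matching each other, because they may be matching different copies of A's DNA.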
One correction -- the number of SNPs is not driving the cM calculation. The cM unit is empirically derived by looking at crossover points in a large dataset of father/mother/child/extended families. It's not a fixed quantity -- it varies somewhat depending on the dataset (e.g. Iceland vs CEU), and there is a big difference between males and females (so sex-averaged values are used). It also varies over the genome, as your example of a 5 and 10 Mb segment shows.

Ann Turner

On Sat, Jan 7, 2012 at 5:49 AM, Larry Vick wrote:
> Jim,
>
> Perhaps the link below to Wikipedia will help with your question.
>
> http://en.wikipedia.org/wiki/Centimorgan
>
> I wasn't able to quickly find two matches I have at 7.7 cM, but I found two matches at 8.03 cM. Hopefully, these matches will make the point. One is on chr 20 and the other is on chr 2. The start and end positions are shown, followed by the number of cM and the number of SNPs in the segment. Finally, the number of bases is shown. You can see that the number of bases on chr 2 is twice the number on chr 20 even though the cM is the same. The number of SNPs per million bases drives the cM calculation. So the chr 20 segment has less diversity than the chr 2 segment (which is what the cM captures).
>
> Chr  Start     End       cM    SNPs  Bases
> 20   38171398  43337995  8.03  1500   5,166,597
> 2    56186100  66550469  8.03  2400  10,364,369
>
> Regards,
>
> Larry
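Larry's two segments can be checked with a little arithmetic. The sketch below only restates the numbers quoted above; it does not model how a genetic map is built. It shows the cM-per-megabase rate differing about twofold between the two segments, which is Ann's point that the cM varies over the genome, while the SNP density differs by much less.

```python
def segment_stats(start_bp, end_bp, cm, snps):
    """Physical length and per-megabase densities for one matching segment."""
    length_bp = end_bp - start_bp
    mb = length_bp / 1e6
    return length_bp, cm / mb, snps / mb


# (chromosome, start, end, cM, SNPs) as quoted in Larry's message
segments = [
    (20, 38171398, 43337995, 8.03, 1500),
    (2, 56186100, 66550469, 8.03, 2400),
]

for chrom, start, end, cm, snps in segments:
    length_bp, cm_per_mb, snps_per_mb = segment_stats(start, end, cm, snps)
    print(f"chr{chrom}: {length_bp:,} bp, {cm_per_mb:.2f} cM/Mb, {snps_per_mb:.0f} SNPs/Mb")
# chr20: 5,166,597 bp, 1.55 cM/Mb, 290 SNPs/Mb
# chr2: 10,364,369 bp, 0.77 cM/Mb, 232 SNPs/Mb
```

At these two local rates, a 7.7 cM segment would span roughly 5 million and 10 million base pairs respectively, which gives a rough answer to Jim's "how many ACTGs" question for these particular regions.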
Jim,

Perhaps the link below to Wikipedia will help with your question.

http://en.wikipedia.org/wiki/Centimorgan

I wasn't able to quickly find two matches I have at 7.7 cM, but I found two matches at 8.03 cM. Hopefully, these matches will make the point. One is on chr 20 and the other is on chr 2. The start and end positions are shown, followed by the number of cM and the number of SNPs in the segment. Finally, the number of bases is shown. You can see that the number of bases on chr 2 is twice the number on chr 20 even though the cM is the same. The number of SNPs per million bases drives the cM calculation. So the chr 20 segment has less diversity than the chr 2 segment (which is what the cM captures).

Chr  Start     End       cM    SNPs  Bases
20   38171398  43337995  8.03  1500   5,166,597
2    56186100  66550469  8.03  2400  10,364,369

Regards,

Larry

________________________________
From: Jim Bartlett
To: Larry Vick
Sent: Friday, January 6, 2012 10:00 PM
Subject: Re: [AUTOSOMAL-DNA] Subject: How do you work a 5 way FF match with few clues ?

Larry,

Your explanation is excellent, and the light bulb came on - thanks for your brevity and clarity.

Now - that explains one location. How many ACTGs are there in 7.7cM? I thought the point was that with a long segment we could rely on the fact that it was an exact match. I guess the point is, in a way, it is an exact match. So what's the probability of another segment of exactly the same long length being an exact match, just with a scrambled-up/different arrangement of ACTGs? If in fact groups of 3, 4 or 5 matches turn out not to be related, then the odds must be high. Although I understand how the matching is done, and how it can create many different combinations from the same segment, it still seems ... unusual? ... that these will match several others from the relatively small community of folks who have taken this test.

Jim

On Jan 6, 2012, at 5:14 PM, Larry Vick wrote:
> Jim,
>
> If you have AG at a location and you have two matches, one of whom has AA and the other GG, you will match both, but they will not match each other. FF has no way to know which of your parents you inherited the A from and which you inherited the G from.
>
> Regards,
>
> Larry
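Larry's example is the heart of matching on unphased data: at each SNP, two people count as compatible if their genotypes share at least one allele. A minimal sketch, assuming genotypes are two-letter strings such as "AG"; the vendors' full matching rules are not described in this thread. With hypothetical genotypes, it shows how two people can both match a heterozygous person along a whole run while matching each other nowhere.

```python
def half_identical(g1, g2):
    """True if two unphased genotypes such as "AG" and "AA" share an allele."""
    return bool(set(g1) & set(g2))


def compare(genotypes_1, genotypes_2):
    """Per-SNP compatibility between two people's unphased genotype lists."""
    return [half_identical(a, b) for a, b in zip(genotypes_1, genotypes_2)]


# Hypothetical genotypes at four SNPs, echoing Larry's AG / AA / GG example.
me = ["AG", "CT", "AG", "CT"]
match_1 = ["AA", "CC", "AA", "TT"]
match_2 = ["GG", "TT", "GG", "CC"]

print(compare(me, match_1))       # [True, True, True, True]
print(compare(me, match_2))       # [True, True, True, True]
print(compare(match_1, match_2))  # [False, False, False, False]
```

Because the comparison can switch between "my" two copies at every heterozygous SNP, a run of compatible SNPs does not by itself show that either match shares one inherited stretch of DNA with me; that is the gap phasing is meant to close.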
Tim, if I were to test my son and his mother, we would be able to phase our data. Would that then allow me to better evaluate my own matches as well as my son's? I've always basically skipped over discussions of phasing because, since my parents are both deceased, I just figured this was something that didn't apply to me. But perhaps it would allow for a partial understanding of my own atDNA by doing as I suggested above? Any segments my son has that came from me and pass the phasing test would therefore also be proven IBD segments for me as well. So I guess roughly 1/2 of my atDNA would be assessable by this method?

On Sat, Jan 7, 2012 at 2:09 AM, Tim Janzen wrote:
> Dear Jim,
> I agree that it pays to concentrate on your matches at either FF or 23andMe that are over 10 cMs. If you choose to contact people who are matching you at less than 8 cM or so, you need to keep in mind that a significant percentage of these matches will be IBS unless one of your children also matches them on the same segment, in which case they will be IBD.
> Ideally, you would like to phase your data and then compare your phased data to that of your matches. The only way you can do that is to test two parent/one child trios and then use a phasing program like the one I wrote or David Pike's to compare your phased data with that of your matches. In order to do this, your matches would need to share their raw data file with you. I suspect that many people won't be willing to do that.
> Ideally, everyone would be downloading their raw data files and uploading them to GEDmatch, where you could do comparisons using phased data. So far neither 23andMe nor FTDNA has made a move to phase their data. I talked to Bennett Greenspan about doing this for Family Finder in November, but he was reluctant to spend the money on the programming needed to phase the data for the two parent/one child trios that he currently has in FF.
> As I have mentioned before, we all need to be mapping our chromosomes so we know as well as possible which ancestral line each DNA segment came from. In your particular case, you have quite a few matches who could be distant cousins (6th-10th cousins). What you need to do is test as many 2nd and 3rd cousins as feasible from the lines you think you match your other distant cousins on, to see if the DNA segments you share with those distant cousins came through the ancestral lines you think they could have come from. Using phased data for the comparisons would also be a big help to you if you could get your matches to share their raw data file with you. You could then quickly tell which of your matches are simply IBS.
> Sincerely,
> Tim
>
> -----Original Message-----
> From: Jim Bartlett
> Sent: Friday, January 06, 2012 10:20 PM
> Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?
>
> Tim,
>
> Walden's table looks very similar to the percentages that FTDNA uses to indicate what percentage of your cousins will show up as a match with Family Finder... Are these the same? This would mean most of your cousins would be listed, but a percent would be IBS, and so not atDNA cousins. I don't think this is the case because I have a number of known cousins, with atDNA tests, who do not match me (FF or 23).
>
> So the other option is that FTDNA's percent eliminates some of your cousins from appearing on the match list, and Walden's table eliminates a percentage of these from being atDNA cousins. This is really whittling down the true matches. And worse yet, the unphased don't know which are which.
>
> If I do understand this, we can't rely on the list from, say, FTDNA. We need to determine, manually(?) or with some software, whether the A in AG is from Mom or Dad, and do this for our whole raw data (how many ACTGs?). And then how do we compare this with the long segments of our matches? Is this part of the FTDNA and 23&me programs I haven't found yet? Or do I need 3rd-party help?
>
> Now I'm just trying to see how I get this info, and then how I use it with my list of matches to capture the ones who are IBD. I guess one way is to cull out everyone below 10cM. Now I'm catching on. But this will eliminate most of my matches... It also covers a number of folks with whom I've already found common ancestry - the point here being that we might be cousins, but we probably don't share a large IBD segment...
>
> Jim
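Jim's question of whether the A in AG came from Mom or Dad, and Dwight's idea of testing his son and the son's mother, both come down to applying Mendelian inheritance SNP by SNP. The sketch below is not Tim's or David Pike's phasing program; it assumes genotypes are two-letter strings and only resolves a SNP when the rules allow exactly one assignment. Real tools also have to deal with no-calls and genotyping errors more carefully than this does.

```python
def phase_child(child, mother, father=None):
    """Split a child's genotype into (maternal_allele, paternal_allele).

    Genotypes are two-letter strings such as "AG". Returns None when the
    SNP cannot be resolved (e.g. child and mother both "AG" with no father
    data) and raises ValueError when no Mendelian assignment fits, which
    usually signals a genotyping error.
    """
    options = {
        (m, p)
        for m, p in ((child[0], child[1]), (child[1], child[0]))
        if m in mother and (father is None or p in father)
    }
    if not options:
        raise ValueError(f"inconsistent trio: child={child} mother={mother} father={father}")
    if len(options) == 1:
        return options.pop()
    return None


def paternal_haplotype(child_snps, mother_snps):
    """Alleles the child got from the father, SNP by SNP (None = unresolved)."""
    haplotype = []
    for child, mother in zip(child_snps, mother_snps):
        try:
            phased = phase_child(child, mother)
        except ValueError:
            phased = None  # treat likely genotyping errors as unknown
        haplotype.append(phased[1] if phased else None)
    return haplotype


# Hypothetical son and mother genotypes at three SNPs.
son = ["AG", "CT", "AG"]
mother = ["AA", "CT", "GG"]
print(paternal_haplotype(son, mother))  # ['G', None, 'A']
```

In Dwight's scenario the paternal alleles are the ones his son received from him, so they cover only the part of his DNA that he passed on; SNPs where the son and his mother are both heterozygous stay unresolved unless the father's genotype is also supplied to `phase_child`.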
Tim,

Thanks for the explanation. I'm beginning to catch on. So if a genealogist just takes the Family Finder test, and knows nothing about phasing, they should keep Walden's table handy and expect that half of their matches around 8cM won't really be cousins (at least not with a shared IBD segment).

Walden's table looks very similar to the percentages that FTDNA uses to indicate what percentage of your cousins will show up as a match with Family Finder... Are these the same? This would mean most of your cousins would be listed, but a percent would be IBS, and so not atDNA cousins. I don't think this is the case because I have a number of known cousins, with atDNA tests, who do not match me (FF or 23).

So the other option is that FTDNA's percent eliminates some of your cousins from appearing on the match list, and Walden's table eliminates a percentage of these from being atDNA cousins. This is really whittling down the true matches. And worse yet, the unphased don't know which are which.

If I do understand this, we can't rely on the list from, say, FTDNA. We need to determine, manually(?) or with some software, whether the A in AG is from Mom or Dad, and do this for our whole raw data (how many ACTGs?). And then how do we compare this with the long segments of our matches? Is this part of the FTDNA and 23&me programs I haven't found yet? Or do I need 3rd-party help?

Now I'm just trying to see how I get this info, and then how I use it with my list of matches to capture the ones who are IBD. I guess one way is to cull out everyone below 10cM. Now I'm catching on. But this will eliminate most of my matches... It also covers a number of folks with whom I've already found common ancestry - the point here being that we might be cousins, but we probably don't share a large IBD segment...

This is worse than finals week - I need to sleep.

Jim

On Jan 7, 2012, at 12:29 AM, "Tim Janzen" wrote:
> Dear Jim,
> The major issue with using unphased data for comparisons (in Relative Finder, Family Finder, or other programs such as Jim McMillan's that compare unphased data) is that a significant percentage of shorter matching segments will be identical by state rather than identical by descent. Relatively few segments over 12-15 cMs in length are likely to be identical by state, but as we get down into analysis of segments that are in the 7-8 cM range, a significant percentage of such segments will be identical by state. John Walden did a recent analysis that suggested that about 50% of matching segments 8 cMs in length are identical by state. Here is a small table of results that he recently compiled:
>
> cM   %IBD  %IBS
> 10    99     1
>  9    80    20
>  8    50    50
>  7    30    70
>  6    20    80
>  5     5    95
>
> Doing comparisons using phased data gets around the issue of identical by state matches. Another simple method is what I have previously referred to as "poor man's phasing". This is where you compare a parent and a child to another person who has a matching segment with the parent. If the child matches this other person on the same segment that the parent matches on, then you can be assured that the segment is highly likely to be IBD and not IBS. However, if the child doesn't match the other person, and if the matching segment is under 10 cMs or so, then there is still a reasonable possibility that the segment is IBS. This is one reason why it is a good idea to have as many children as feasible do the 23andMe or FF test, since you will be better able to determine which segments are IBD and which segments could be IBS.
> Sincerely,
> Tim Janzen
>
> -----Original Message-----
> From: Jim Bartlett
> Sent: Friday, January 06, 2012 8:19 PM
> Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?
>
> Sam
>
> What is the "big issue" about unphased data? As always, my reference is to genealogy.
>
> I can see where it would help, some, to know if a match was on my Dad's or Mom's side, but SURNAMES and geography can help there, too. And even if I knew which parent my match was on, I still wouldn't know which grandparent.
>
> Jim
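Tim's "poor man's phasing" and Walden's table can be combined into a rough triage of a parent's matches. This is a hypothetical sketch, not a tool anyone in the thread wrote: it treats any overlap between the parent's and child's segments with the same person as "the same segment" (a loose proxy), and when the child does not match it falls back on Walden's size-based percentages. Ann considers those percentages pessimistic, so treat them as a rough prior that can be swapped out.

```python
# Rough %IBD by segment size, from the John Walden table Tim quoted.
WALDEN_PCT_IBD = {10: 99, 9: 80, 8: 50, 7: 30, 6: 20, 5: 5}


def rough_ibd_percent(cm):
    """Walden's %IBD for the table row at or below `cm`; None below 5 cM.

    Segments above 10 cM use the 10 cM row, an assumption (the table stops there).
    """
    if cm < 5:
        return None
    return WALDEN_PCT_IBD[min(int(cm), 10)]


def overlaps(seg_a, seg_b):
    """True if two (chrom, start, end) segments overlap on the same chromosome."""
    return seg_a[0] == seg_b[0] and seg_a[1] < seg_b[2] and seg_b[1] < seg_a[2]


def triage(parent_seg, cm, child_segs):
    """Classify a match to the parent: poor man's phasing first, then segment size.

    parent_seg: (chrom, start, end) where the parent matches the person.
    child_segs: (chrom, start, end) segments where the child matches that person.
    """
    if any(overlaps(parent_seg, c) for c in child_segs):
        return "likely IBD"
    # The child inherited only about half of the parent's DNA, so no match
    # from the child does not prove IBS; fall back on the size-based prior.
    pct = rough_ibd_percent(cm)
    if pct is None:
        return "unknown (below table)"
    return f"unconfirmed (~{pct}% IBD by size)"


def expected_ibd_count(cms):
    """Expected number of IBD matches among segments of the given sizes.

    Segments below 5 cM contribute 0 here, an assumption.
    """
    return sum((rough_ibd_percent(cm) or 0) / 100 for cm in cms)


print(triage((7, 1000, 5000), 8.2, [(7, 4000, 9000)]))  # likely IBD
print(triage((7, 1000, 5000), 8.2, []))                 # unconfirmed (~50% IBD by size)
```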
Dear Dwight,

Keep in mind that one of your long-term goals needs to be to figure out which portions of your DNA came from your mom and which portions came from your dad. The more of your DNA you can categorize in this way, the easier time you will have eliminating portions of your pedigree chart from consideration when you are evaluating your matches. Simply having your son tested would help you sort out a significant percentage of your matches that are IBD. Testing his mother as well would help you phase your data so that you could run comparisons of your phased data against the raw data files of any of your matches who are willing to share their data file with you.

If you test your son, you would expect about 50% of your matches to also share the same segment with your son, assuming that all of your matches were IBD. Unfortunately, we know that not all of your matches are IBD. Let me provide further data that complements the statistics that John Walden generated. This summer I downloaded all of my mom's and my Family Finder matches. I then categorized all of my mom's matches by whether or not I also shared the same segment with her matches. Here are my statistics:

1. For my mom's matches that were over 10 cMs, I was a match to 22 of her 47 matches, or about 47%. This is slightly under the expected percentage of 50%, but in general this helps confirm John's statistics that about 99% of matches over 10 cMs are IBD.
2. For my mom's matches that were between 9 and 10 cMs, I was a match to 7 of her 18 matches, or about 39%. This is somewhat under the expected percentage of 50%.
3. For my mom's matches that were between 8 and 9 cMs, I was a match to 9 of her 34 matches, or about 26%. This is significantly under the expected percentage of 50%.
4. For my mom's matches that were between 5 and 8 cMs, I was a match to 12 of her 21 matches, or about 57%. This is significantly higher than the expected percentage of 50%.
5. For my mom's matches that were between 3.5 and 5 cMs, I was a match to 29 of her 73 matches, or about 40%. This is significantly lower than the expected percentage of 50%.

My data doesn't look quite as striking as John's does for the percentage of matches under 8 cMs that are IBS, but it does indicate that in general a significant percentage of matches between 3.5 and 10 cMs will be IBS. When I have more time I will try to generate more data from both FF and 23andMe on this topic.

Sincerely,
Tim

-----Original Message-----
From: Dwight Holmes
Sent: Friday, January 06, 2012 11:18 PM
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?

Tim, if I were to test my son and his mother, we would be able to phase our data. Would that then allow me to better evaluate my own matches as well as my son's? I've always basically skipped over discussions of phasing because, since my parents are both deceased, I just figured this was something that didn't apply to me. But perhaps it would allow for a partial understanding of my own atDNA by doing as I suggested above? Any segments my son has that came from me and pass the phasing test would therefore also be proven IBD segments for me as well. So I guess roughly 1/2 of my atDNA would be assessable by this method?
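Tim's percentages can be recomputed from his counts. The sketch adds one thing that is not in his message: a simple binomial standard error around the 50% expectation, which assumes each IBD match is shared with the child independently with probability 0.5. Several of the bins hold only 18 to 34 matches, so this gives a rough sense of how far a percentage can drift by chance before it says much about IBS.

```python
import math

# (size bin, matches the child also shares, parent's total matches) from Tim's data
TIM_COUNTS = [
    (">10 cM", 22, 47),
    ("9-10 cM", 7, 18),
    ("8-9 cM", 9, 34),
    ("5-8 cM", 12, 21),
    ("3.5-5 cM", 29, 73),
]
EXPECTED_SHARE = 0.5  # a child inherits about half of each parent's DNA


def summarize(counts, p=EXPECTED_SHARE):
    """Observed share rate and a binomial standard error around p, per bin.

    Assumes (hypothetically) that every match is IBD and is shared with the
    child independently with probability p.
    """
    rows = []
    for label, shared, total in counts:
        observed = shared / total
        std_err = math.sqrt(p * (1 - p) / total)
        rows.append((label, observed, std_err))
    return rows


for label, observed, std_err in summarize(TIM_COUNTS):
    print(f"{label:>9}: {observed:6.1%} shared (50% +/- {std_err:.1%} if all IBD)")
```

Comparing each observed rate with its standard error is a quick way to see which bins deviate from 50% by more than small-sample noise would suggest; more data, as Tim plans, would tighten all of them.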