Does anyone know how smart the matching algorithms are for Family Finder? My understanding is that each of the 700,000+ SNPs has been selected because it has a useful allele frequency in the general population. I don't know what the ranges are, but presumably they run from 50/50 to 90/10 or beyond? If there are runs of consecutive SNPs that have a 90%/10% frequency in the European population, then a higher number of consecutive matching SNPs would be required to ensure an IBD match. Do the high numbers of IBS matches for 10-11 cM occur in particular locations, e.g. chromosome 6?

Duncan

Date: Mon, 9 Jan 2012 00:28:27 -0800
From: "Tim Janzen"
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues ?

Dear All,

I decided to download the latest Family Finder data for my wife's parents, my wife, my parents, and me yesterday. I then analyzed the data to see how many people appear in the FF match lists for my wife and me who don't appear in the FF match lists of our parents. I then used that data to create my own statistics on the percentage of matches at various segment lengths in cMs, to see how my data compares to the statistics that John Walden generated. Here are my results:

cMs     %IBD (IBD/total)   %IBS
>11     100 (52/52)        0
10-11   80 (12/15)         20
9-10    93 (25/27)         7
8-9     81 (34/42)         19
7-8     46 (11/24)         54
6-7     67 (4/6)           33
5-6     40 (6/15)          60
4-5     20 (10/51)         80
3.5-4   17 (11/66)         83

Below are John Walden's results from his analysis, which I posted in another message several days ago:

cM    %IBD   %IBS
10    99     1
9     80     20
8     50     50
7     30     70
6     20     80
5     5      95

My results would suggest that a higher percentage of matches under 9 cMs in length are IBD than John's analysis would suggest. In any case, it would appear that a significant percentage of matches in the 6-9 cM range are IBS.
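Duncan's point about 90/10 SNPs can be put into rough numbers. The sketch below is a back-of-envelope model, not Family Finder's algorithm: it assumes Hardy-Weinberg genotype proportions, treats neighbouring SNPs as independent (real SNPs are in linkage disequilibrium, which makes chance runs longer than this predicts), and counts a SNP as "matching" whenever the two people are not opposite homozygotes. The function names and the one-in-a-million target rate are illustrative choices.

```python
"""Back-of-envelope odds of a chance run of "matching" SNPs.

Assumptions (illustrative, not Family Finder's method): Hardy-Weinberg
genotype proportions, independent SNPs (no linkage disequilibrium), and a
SNP counts as matching unless the two people are opposite homozygotes.
"""


def p_half_identical(p: float) -> float:
    """Chance two unrelated people are NOT opposite homozygotes at a SNP
    whose two alleles have frequencies p and 1 - p."""
    q = 1.0 - p
    # Opposite homozygotes: AA vs BB or BB vs AA, each with probability p^2 q^2.
    return 1.0 - 2.0 * (p * q) ** 2


def p_chance_run(p: float, n_snps: int) -> float:
    """Chance that n_snps independent SNPs in a row all look half-identical."""
    return p_half_identical(p) ** n_snps


def snps_needed(p: float, max_chance: float) -> int:
    """Shortest run whose chance probability is at or below max_chance."""
    per_snp = p_half_identical(p)
    n, prob = 1, per_snp
    while prob > max_chance:
        n += 1
        prob *= per_snp
    return n


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(f"{p:.0%}/{1 - p:.0%} SNPs: per-SNP {p_half_identical(p):.4f}, "
              f"run length for 1e-6 chance: {snps_needed(p, 1e-6)}")
```

Under these assumptions a run of 50/50 SNPs is broken quickly, while 90/10 SNPs rarely produce opposite homozygotes, so about eight times as many of them are needed for the same chance rate. A real array mixes frequencies, so the per-SNP figure would be an average over the SNPs in the window.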
If any of the rest of you have trio data (two parents and one child) in Family Finder, it would be interesting to see whether your results are similar to mine.

Sincerely,
Tim Janzen
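Tim's trio procedure can be sketched as a simple tally: a child's match who also appears in either parent's list is counted as IBD, and one who appears in neither is counted as IBS. This is a heuristic (sharing a match with a parent does not prove that the child's segment is the inherited one), and the data layout (match name mapped to longest segment in cM), the function names and the bin convention `low < cM <= high` are assumptions for illustration, not FTDNA's export format.

```python
"""Tally a child's Family Finder matches as IBD or IBS using parents' lists.

Heuristic: a match who also appears in either parent's list counts as IBD;
a match in neither list counts as IBS. Data layout and bin edges are
assumptions for illustration.
"""
from typing import Dict, Iterable, List, Optional, Tuple

# (low, high) in cM; high=None means "greater than low".
BINS: List[Tuple[float, Optional[float]]] = [
    (11, None), (10, 11), (9, 10), (8, 9), (7, 8),
    (6, 7), (5, 6), (4, 5), (3.5, 4),
]


def bin_label(cm: float) -> Optional[str]:
    """Return the table label for a segment length, or None if out of range."""
    for low, high in BINS:
        if high is None:
            if cm > low:
                return f">{low}"
        elif low < cm <= high:
            return f"{low}-{high}"
    return None


def tally_trio(child: Dict[str, float],
               mother: Iterable[str],
               father: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each bin label to (IBD count, total count) for one child."""
    in_parent = set(mother) | set(father)
    counts: Dict[str, Tuple[int, int]] = {}
    for name, cm in child.items():
        label = bin_label(cm)
        if label is None:
            continue  # shorter than the smallest bin
        ibd, total = counts.get(label, (0, 0))
        counts[label] = (ibd + int(name in in_parent), total + 1)
    return counts


def percent_ibd(ibd: int, total: int) -> int:
    """Rounded %IBD as shown in the table, e.g. 12 of 15 -> 80."""
    return round(100 * ibd / total)


if __name__ == "__main__":
    # Hypothetical match names and segment lengths.
    child = {"Match A": 12.3, "Match B": 10.4, "Match C": 10.9}
    print(tally_trio(child, mother=["Match A", "Match B"], father=[]))
```

To reproduce a table like Tim's, run the tally once per child (he did it for both himself and his wife) and add the counts before computing the percentages.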
Dear Duncan,

I don't know how "smart" the matching algorithm is for FF, but based on my FF data, it seems to do a fairly good job of finding all or almost all of the appropriate matches, assuming that the matching segment is over 9 cMs or so.

You can find the allele frequencies for individual SNPs by going to http://www.ncbi.nlm.nih.gov/snp and entering the SNP of interest. If you have quite a few people on your 23andMe account, you can get a rough idea of the allele frequencies for individual SNPs by going to the "browse raw data" section, entering the SNP of interest, and then reviewing the genotypes for the people on your account.

The 3 matches between 10 and 11 cMs that FF reported for my wife and me but that turned out to be IBS were on chromosome 20 (1900 SNPs), chromosome 7 (2400 SNPs), and chromosome 15 (2261 SNPs).

Sincerely,
Tim Janzen

-----Original Message-----
From: r0berts0n
Sent: Tuesday, January 10, 2012 2:35 AM
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues
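Tabulating the genotypes you review by hand is just allele counting. A minimal sketch, assuming two-letter calls such as "AG" with "--" for a no-call (as in 23andMe raw data views); the function name is illustrative. Relatives share DNA, so they are not independent draws from the population, which is one more reason this gives only the rough idea Tim describes.

```python
"""Rough allele frequencies from genotype calls gathered by hand.

Assumes two-letter calls such as "AG" with "--" for a no-call. A handful of
relatives is a small, non-random sample, so treat the result as rough.
"""
from collections import Counter
from typing import Dict, Iterable


def allele_frequencies(genotypes: Iterable[str]) -> Dict[str, float]:
    """Return each allele's share of all called alleles."""
    counts: Counter = Counter()
    for call in genotypes:
        call = call.strip().upper()
        if len(call) != 2 or "-" in call:
            continue  # skip no-calls and anything that is not a diploid call
        counts.update(call)  # counts both letters, e.g. "AG" -> A:1, G:1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical calls for one SNP across the people on an account.
    calls = ["AG", "GG", "GG", "AA", "AG", "--"]
    for allele, freq in sorted(allele_frequencies(calls).items()):
        print(f"{allele}: {freq:.2f}")  # prints "A: 0.40" then "G: 0.60"
```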
The allele frequencies would certainly have a bearing on how easy it is to obtain a run of matching SNPs. However, there's no easy, consolidated way to get that information. I suspect that it's not a major factor, since it's quite rare to get a long run of half-identical SNPs when I use David Pike's utility on unrelated people, and opposite homozygotes crop up quite frequently. You can push the required number of SNPs down to one (and allow no mismatches) to get a feel for this, but do it only for a fragment of the download, because the output will be long.

http://www.math.mun.ca/~dapike/FF23utils/pair-comp.php

The exception, where long runs of SNPs do occur, is the HLA region on chromosome 6, which has a lot of known variants; but the recombination rate there is low, so the runs tend to be in the vicinity of 1-3 cM. 23andMe has a list of about a dozen regions where they require more SNPs (up to 1200).

The FTDNA FAQ states: "The Family Finder software clusters the SNPs into sets that are about 50 to 100 SNPs long. The Bioinformatics team has predefined the SNP sets based on contributing SNPs' reliability, variability, average centiMorgans (cM), density, and other statistical considerations." (It also states that there is a hard stop at the centromere, due to the low number of SNPs there, but I think that may be obsolete.)

http://www.familytreedna.com/faq/answers.aspx?id=17#798

I'm not altogether convinced that FTDNA's SNP sets improve the matching algorithm, but that's proprietary information.

Ann
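The kind of comparison Ann describes, runs of half-identical SNPs broken by opposite homozygotes, can be sketched in a few lines. This is not David Pike's code: it assumes both lists hold two-letter calls for the same SNPs in chromosome order, lets no-calls ("--") pass without breaking a run, tolerates no mismatches, and measures length in SNPs rather than cM (converting to cM would need a genetic map). Setting `min_snps=1` mirrors Ann's suggestion of pushing the threshold down to one.

```python
"""Find runs of half-identical SNPs between two people (illustrative sketch)."""
from typing import List, Tuple


def opposite_homozygotes(g1: str, g2: str) -> bool:
    """True for calls like "AA" vs "GG"; False if either call is missing."""
    if len(g1) != 2 or len(g2) != 2 or "-" in g1 + g2:
        return False
    return g1[0] == g1[1] and g2[0] == g2[1] and g1[0] != g2[0]


def half_identical_runs(calls1: List[str], calls2: List[str],
                        min_snps: int) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) of each run of at least min_snps SNPs
    that contains no opposite homozygotes."""
    if min_snps < 1:
        raise ValueError("min_snps must be at least 1")
    runs: List[Tuple[int, int]] = []
    start = 0
    n = min(len(calls1), len(calls2))
    for i in range(n):
        if opposite_homozygotes(calls1[i], calls2[i]):
            if i - start >= min_snps:  # run covers indices start..i-1
                runs.append((start, i - 1))
            start = i + 1  # a new run can begin after the break
    if n - start >= min_snps:  # run still open at the end of the data
        runs.append((start, n - 1))
    return runs


if __name__ == "__main__":
    person1 = ["AA", "AG", "GG", "CC", "AT", "--"]
    person2 = ["AG", "GG", "GG", "TT", "AA", "CC"]
    print(half_identical_runs(person1, person2, min_snps=2))  # [(0, 2), (4, 5)]
    print(half_identical_runs(person1, person2, min_snps=3))  # [(0, 2)]
```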
Dear Ann,

Is this list of one dozen regions publicly available? If so, can you share with us a list of these regions?

Sincerely,
Tim Janzen

-----Original Message-----
From: Ann Turner
Sent: Wednesday, January 11, 2012 11:59 AM
Subject: Re: [AUTOSOMAL-DNA] SUBJECT: How do you work a 5 way FF match with few clues
No, it's proprietary information. However, I was given permission to provide yes/no answers about whether a particular segment required more than the typical number of SNPs. If you write to me off-list, I can look up the segments in question. Several of the regions are near centromeres or telomeres.

Ann Turner

On Thu, Jan 12, 2012 at 8:34 PM, Tim Janzen wrote:
> Dear Ann,
> Is this list of one dozen regions publicly available? If so, can
> you share with us a list of these regions?
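Ann's description implies a rule along these lines: a segment needs the usual number of SNPs unless it overlaps one of the special regions, in which case it needs more. A sketch under loud assumptions: the real list is proprietary, so the region coordinates, the 700-SNP default and the region's threshold in the usage example are hypothetical placeholders (the 1200 only echoes the "up to 1200" upper bound Ann mentioned), and all names are illustrative.

```python
"""Region-dependent SNP thresholds for accepting a matching segment.

Illustrative only: the region and every threshold used below are
hypothetical placeholders, not 23andMe's proprietary list.
"""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int     # first base-pair position (inclusive)
    end: int       # last base-pair position (inclusive)
    min_snps: int  # SNPs required for a segment touching this region


def overlaps(chrom: str, start: int, end: int, region: Region) -> bool:
    """True if the segment shares at least one position with the region."""
    return chrom == region.chrom and start <= region.end and region.start <= end


def required_snps(chrom: str, start: int, end: int,
                  regions: List[Region], default_min_snps: int) -> int:
    """Strictest threshold among the regions the segment overlaps."""
    needed = default_min_snps
    for region in regions:
        if overlaps(chrom, start, end, region):
            needed = max(needed, region.min_snps)
    return needed


def segment_passes(chrom: str, start: int, end: int, n_snps: int,
                   regions: List[Region], default_min_snps: int) -> bool:
    """True if the segment has enough SNPs for wherever it lies."""
    return n_snps >= required_snps(chrom, start, end, regions, default_min_snps)


if __name__ == "__main__":
    # HYPOTHETICAL placeholder region and thresholds, not 23andMe's values.
    regions = [Region("6", 29_000_000, 34_000_000, 1200)]
    print(segment_passes("6", 30_000_000, 32_000_000, 900, regions, 700))  # False
    print(segment_passes("7", 30_000_000, 32_000_000, 900, regions, 700))  # True
```

A yes/no lookup like the one Ann offers corresponds to asking whether `required_snps` returns more than the default for a given segment.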