RootsWeb.com Mailing Lists
    1. [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
    2. David Schroeder via
    3. This is a continuation of my observations comparing raw data files from 23andme and AncestryDNA.

Background: I tested at both 23andme (V3) and AncestryDNA and downloaded the raw data file from each one. I will call these two raw data files 'Original'. Using SQL, I loaded both files into a database and converted the AncestryDNA file to 23andme format. I then used SQL to fix the no-calls in both: wherever one vendor had a no-call at an RSID common to both files and the other vendor had made a call, I filled in the other vendor's value. Several thousand no-calls were fixed this way, reducing the error rate by about a third. I extracted these database tables into new raw data files that I call 'Fixed'.

I uploaded the 4 raw data files to GEDmatch as 4 different kits: 2 original and 2 fixed. I then used GEDmatch's "Matching Segment Search" utility to extract matches for each of the 4 kits individually, using a segment length of 5 cM, and loaded each set of matches into a database table so I could compare them.

All 4 kits had matches with (M) 23andme, (A) Ancestry, and (F) FamilyTreeDNA kits in these amounts (in the table headers, "23" means 23andme and "ANC" means AncestryDNA):

+-----------------+--------+--------+
| 23 Orig Matches | Vendor | Avg cM |
+-----------------+--------+--------+
|             696 | A      |    7.6 |
|             329 | F      |    7.3 |
|             933 | M      |    7.8 |
+-----------------+--------+--------+
+---------------+----------------------+
| Total Matches | Original 23andme kit |
+---------------+----------------------+
|          1958 | M080859              |
+---------------+----------------------+

+------------------+--------+--------+
| 23 Fixed Matches | Vendor | Avg cM |
+------------------+--------+--------+
|              779 | A      |    7.6 |
|              257 | F      |    7.6 |
|              715 | M      |    8.1 |
+------------------+--------+--------+
+---------------+-------------------+
| Total Matches | Fixed 23andme kit |
+---------------+-------------------+
|          1751 | M306764           |
+---------------+-------------------+

There are 207 fewer matches on the fixed 23andme kit than on the original (1751 vs. 1958).
#######################################################
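The no-call fix described at the start of this message (convert the AncestryDNA file to 23andme format, then fill each file's no-calls from the other vendor's call at the same RSID) can be sketched in Python. This is a minimal illustration under stated assumptions, not the SQL actually used: it assumes 23andme-style genotypes with "--" as the no-call, AncestryDNA rows of (rsid, chromosome, position, allele1, allele2) with "0" as the no-call allele, and that both vendors report alleles on the same strand. The function names and sample rows are hypothetical.

```python
# Sketch only: file layouts and no-call markers are assumptions, not vendor specs.
NO_CALL_23 = "--"   # assumed 23andme no-call genotype
NO_CALL_ANC = "0"   # assumed AncestryDNA no-call allele


def ancestry_to_23andme(rows):
    """Map (rsid, chrom, pos, allele1, allele2) rows to {rsid: (chrom, pos, genotype)}.

    Chromosome codes are passed through unchanged; AncestryDNA may number
    X, Y and MT differently from 23andme, which this sketch does not handle.
    """
    converted = {}
    for rsid, chrom, pos, a1, a2 in rows:
        if NO_CALL_ANC in (a1, a2):
            genotype = NO_CALL_23
        else:
            genotype = a1 + a2
        converted[rsid] = (chrom, pos, genotype)
    return converted


def fill_no_calls(target, donor):
    """Copy target, replacing each no-call with the donor's call at the same rsid.

    Returns (fixed_copy, number_of_no_calls_filled). RSIDs missing from the
    donor, or no-calls in both files, are left as no-calls.
    """
    fixed = dict(target)
    filled = 0
    for rsid, (chrom, pos, genotype) in target.items():
        if genotype != NO_CALL_23:
            continue
        other = donor.get(rsid)
        if other is not None and other[2] != NO_CALL_23:
            fixed[rsid] = (chrom, pos, other[2])
            filled += 1
    return fixed, filled


# Hypothetical example data (not from either vendor).
anc = ancestry_to_23andme([
    ("rs1", "1", 100, "A", "G"),
    ("rs2", "1", 200, "0", "0"),   # AncestryDNA no-call
    ("rs3", "1", 300, "C", "C"),
])
me23 = {
    "rs1": ("1", 100, "--"),       # 23andme no-call
    "rs2": ("1", 200, "TT"),
    "rs3": ("1", 300, "CC"),
}
anc_fixed, n_anc = fill_no_calls(anc, me23)    # rs2 becomes "TT"
me23_fixed, n_23 = fill_no_calls(me23, anc)    # rs1 becomes "AG"
print(n_anc, n_23)  # 1 1
```

An RSID that is a no-call in both files stays a no-call, so shared no-calls survive the fix.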
+------------------+--------+--------+
| ANC Orig Matches | Vendor | Avg cM |
+------------------+--------+--------+
|              946 | A      |    7.3 |
|              335 | F      |    7.2 |
|              316 | M      |    8.8 |
+------------------+--------+--------+
+---------------+------------------+
| Total Matches | Original ANC kit |
+---------------+------------------+
|          1597 | A934219          |
+---------------+------------------+

+-------------------+--------+--------+
| ANC Fixed Matches | Vendor | Avg cM |
+-------------------+--------+--------+
|               799 | A      |    7.6 |
|               238 | F      |    7.7 |
|               223 | M      |    9.8 |
+-------------------+--------+--------+
+---------------+---------------+
| Total Matches | Fixed ANC kit |
+---------------+---------------+
|          1260 | A146269       |
+---------------+---------------+

There are 337 fewer matches on the fixed AncestryDNA kit than on the original (1260 vs. 1597), a reduction somewhat similar in magnitude to the one for the 23andme fixed vs. original kits. Comparing the two fixed kits, the fixed AncestryDNA kit has 491 fewer matches than the fixed 23andme kit (1260 vs. 1751).

My assumption was that the fixed kits had fewer matches because false positives were being eliminated: many no-calls got fixed, especially ones that turned out to be opposite homozygous alleles. If so, the original and fixed kits would share basically the same matches, other than the ones eliminated in the fixed kit. But looking at each kit individually, it is much more than that.

I compared the 23andme kits, fixed vs. original and vice versa, looking for matching kits present in one list but not the other.
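Each "kits in X but not in Y" comparison is a set difference on kit IDs, grouped by the vendor letter (M, A or F) at the start of each kit ID. A minimal sketch of that kind of query using Python's built-in sqlite3 module; it assumes a hypothetical table per match list with columns kit and cm and one row per match, and the table names and kit IDs are made up.

```python
import sqlite3


def not_in_by_vendor(conn, left, right):
    """Kits in table `left` that are absent from table `right`, per vendor.

    Vendor is taken from the first character of the kit ID (M, A or F).
    Table names are interpolated directly, so they must be trusted strings.
    Returns a list of (vendor, count, average cM rounded to 1 decimal).
    """
    sql = f"""
        SELECT substr(l.kit, 1, 1) AS vendor,
               COUNT(*)            AS kits,
               ROUND(AVG(l.cm), 1) AS avg_cm
        FROM {left} AS l
        WHERE NOT EXISTS (SELECT 1 FROM {right} AS r WHERE r.kit = l.kit)
        GROUP BY vendor
        ORDER BY vendor
    """
    return conn.execute(sql).fetchall()


# Hypothetical match lists: (kit ID, cM).
conn = sqlite3.connect(":memory:")
samples = {
    "orig_23": [("M000001", 7.0), ("A000002", 6.0), ("F000003", 8.0)],
    "fixed_23": [("M000001", 7.2), ("A000004", 6.5)],
}
for name, rows in samples.items():
    conn.execute(f"CREATE TABLE {name} (kit TEXT, cm REAL)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

print(not_in_by_vendor(conn, "orig_23", "fixed_23"))  # [('A', 1, 6.0), ('F', 1, 8.0)]
print(not_in_by_vendor(conn, "fixed_23", "orig_23"))  # [('A', 1, 6.5)]
```

NOT EXISTS is used rather than NOT IN because NOT IN returns no rows at all if the other table contains a NULL kit. If a kit can appear once per matching segment, COUNT(DISTINCT l.kit) would count kits rather than rows.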
To my amazement, it was not simply a matter of the fixed kit having fewer matches: there were entirely different matches in each kit.

Kits that are in the 23andme original but not in the AncestryDNA original:
+-------------------------+--------+--------+
| 23 Orig not in ANC Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
|                      86 | A      |    6.9 |
|                      85 | F      |    6.8 |
|                     709 | M      |    7.3 |
+-------------------------+--------+--------+

Kits that are in the AncestryDNA original but not in the 23andme original:
+---------------------------------+--------+--------+
| ANC original not in 23 original | Vendor | AVG cM |
+---------------------------------+--------+--------+
|                             337 | A      |    6.6 |
|                              94 | F      |    6.7 |
|                             105 | M      |    7.0 |
+---------------------------------+--------+--------+

Kits that are in the AncestryDNA original but not in the 23andme fixed:
+--------------------------+--------+--------+
| ANC Orig not in 23 fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      229 | A      |    6.6 |
|                      130 | F      |    6.5 |
|                      139 | M      |    6.9 |
+--------------------------+--------+--------+

Kits that are in the 23andme fixed but not in the 23andme original:
+-------------------------+--------+--------+
| 23 Fixed Not in 23 Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
|                     281 | A      |    6.7 |
|                      67 | F      |    7.1 |
|                      48 | M      |    6.9 |
+-------------------------+--------+--------+

Kits that are in the 23andme original but not in the 23andme fixed:
+-------------------------+--------+--------+
| 23 Orig not in 23 fixed | Vendor | AVG cM |
+-------------------------+--------+--------+
|                     202 | A      |    6.6 |
|                     139 | F      |    6.5 |
|                     263 | M      |    6.9 |
+-------------------------+--------+--------+

Kits that are in the AncestryDNA fixed but not in the AncestryDNA original:
+---------------------------+--------+--------+
| ANC fixed not in ANC Orig | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        48 | A      |    7.8 |
|                        17 | F      |    8.0 |
|                        11 | M      |    8.7 |
+---------------------------+--------+--------+

Kits that are in the AncestryDNA original but not in the AncestryDNA fixed:
+---------------------------+--------+--------+
| ANC Orig not in ANC Fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                       200 | A      |    6.6 |
|                       114 | F      |    6.4 |
|                       106 | M      |    6.6 |
+---------------------------+--------+--------+

Kits that are in the 23andme fixed but not in the AncestryDNA fixed:
+---------------------------+--------+--------+
| 23 fixed not in ANC fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        56 | A      |    6.6 |
|                        35 | F      |    7.0 |
|                       542 | M      |    7.4 |
+---------------------------+--------+--------+

Kits that are in the AncestryDNA fixed but not in the 23andme fixed:
+---------------------------+--------+--------+
| ANC fixed not in 23 fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        71 | A      |    6.9 |
|                        19 | F      |    7.0 |
|                        58 | M      |    7.5 |
+---------------------------+--------+--------+

Kits that are in the 23andme original but not in the AncestryDNA fixed:
+--------------------------+--------+--------+
| 23 Orig not in ANC fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      218 | A      |    6.6 |
|                      160 | F      |    6.5 |
|                      763 | M      |    7.2 |
+--------------------------+--------+--------+

Kits that are in the AncestryDNA fixed but not in the 23andme original:
+--------------------------+--------+--------+
| ANC fixed not in 23 Orig | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      315 | A      |    6.7 |
|                       71 | F      |    7.0 |
|                       59 | M      |    7.4 |
+--------------------------+--------+--------+

###################################################################
Maybe this is too much information, but I wanted to show each comparison.
Disclosure: I have a pile-up region on chromosome 15 where I have hundreds of bogus matches. I did not factor that out.

I invite anyone's comments about what may be going on, and any suggestions for further queries.

David Schroeder

    01/06/2016 03:22:36
    1. Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
    2. Roberta Estes via
    3. Very interesting. What was the original error rate vs. the "fixed" error rate? Also, were the errors typically one location at a time, or did you find runs of errors with several locations in a row?

Roberta

-----Original Message-----
From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of David Schroeder via
Sent: Wednesday, January 06, 2016 11:23 AM
To: genealogy-dna@rootsweb.com
Subject: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

    01/06/2016 04:35:47
    1. Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
    2. David Schroeder via
    3. Error rates (no-calls):

Original Ancestry: 1.944 percent
Fixed Ancestry:    0.998 percent
Original 23andme:  2.158 percent
Fixed 23andme:     1.482 percent

+------+--------------+
| CHR  | ANC no-calls |
+------+--------------+
| X    |           16 |
| 1    |         1093 |
| 2    |         1009 |
| 3    |          835 |
| 4    |          711 |
| 5    |          753 |
| 6    |          866 |
| 7    |          742 |
| 8    |          641 |
| 9    |          658 |
| 10   |          723 |
| 11   |          680 |
| 12   |          648 |
| 13   |          540 |
| 14   |          439 |
| 15   |          404 |
| 16   |          436 |
| 17   |          428 |
| 18   |          390 |
| 19   |          424 |
| 20   |          351 |
| 21   |          185 |
| 22   |          248 |
+------+--------------+

+------+------------------+
| CHR  | 23andme no-calls |
+------+------------------+
| X    |              996 |
| Y    |             1034 |
| MT   |               60 |
| 1    |             1557 |
| 2    |             1549 |
| 3    |             1269 |
| 4    |             1116 |
| 5    |             1139 |
| 6    |             1486 |
| 7    |             1105 |
| 8    |             1044 |
| 9    |              865 |
| 10   |             1012 |
| 11   |             1026 |
| 12   |             1011 |
| 13   |              778 |
| 14   |              673 |
| 15   |              583 |
| 16   |              671 |
| 17   |              630 |
| 18   |              543 |
| 19   |              585 |
| 20   |              474 |
| 21   |              257 |
| 22   |              430 |
+------+------------------+

I did not see any errors in a row. There were several thousand no-calls at the same RSIDs in both the Ancestry and 23andme files; those are the main contributors to the error rates after the fix.

David Schroeder

-----Original Message-----
From: Roberta Estes [mailto:robertajestes@att.net]
Sent: Wednesday, January 6, 2016 10:36 AM
To: 'David Schroeder'; genealogy-dna@rootsweb.com
Subject: RE: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
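Roberta's two questions, the no-call rate and whether no-calls come in runs, can both be checked on a single raw data file. A minimal Python sketch, assuming the 23andme-style "--" no-call and taking "in a row" to mean adjacent positions on the same chromosome after sorting by position; the function names and sample rows are hypothetical.

```python
from itertools import groupby

NO_CALL = "--"  # assumed 23andme-style no-call genotype


def no_call_rate(genotypes):
    """Percentage of genotypes that are no-calls (0.0 for an empty input)."""
    genotypes = list(genotypes)
    if not genotypes:
        return 0.0
    return 100.0 * sum(g == NO_CALL for g in genotypes) / len(genotypes)


def no_call_runs(snps, min_len=2):
    """Find runs of consecutive no-calls.

    snps: iterable of (chromosome, position, genotype).
    Returns (chromosome, first_position, run_length) for every run of at
    least min_len adjacent no-calls; runs never cross a chromosome boundary.
    """
    runs = []
    # String sort of chromosome names only affects output order, not the runs.
    ordered = sorted(snps, key=lambda s: (s[0], s[1]))
    for chrom, group in groupby(ordered, key=lambda s: s[0]):
        start, length = None, 0
        for _, pos, genotype in group:
            if genotype == NO_CALL:
                if length == 0:
                    start = pos
                length += 1
            else:
                if length >= min_len:
                    runs.append((chrom, start, length))
                length = 0
        if length >= min_len:  # run that reaches the end of the chromosome
            runs.append((chrom, start, length))
    return runs


# Hypothetical sample rows.
snps = [
    ("1", 100, "AG"), ("1", 200, "--"), ("1", 300, "--"), ("1", 400, "CC"),
    ("2", 50, "--"), ("2", 60, "TT"),
]
print(no_call_rate(g for _, _, g in snps))  # 50.0
print(no_call_runs(snps))                   # [('1', 200, 2)]
```

Running no_call_runs with min_len=1 lists every isolated no-call as well, which is a quick way to confirm that no-calls are scattered rather than clustered.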

    01/06/2016 04:02:45
    1. Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
    2. David, A summary on the last comparisons would be fine. I've got lost in the many tables which despite your wonderful effort are still hard to read in plain email format. Great analysis! Andreas > On Jan 6, 2016, at 23:22, David Schroeder via <genealogy-dna@rootsweb.com> wrote: > > This is a continuation of my observations comparing raw data files from > 23andme and AncestryDNA. > > Background: I tested at both 23andme (V3) and AncestryDNA and I downloaded > the raw data files from each one. These two raw datas I will call > 'Original'. Using SQL I loaded these files to a database. I also converted > the AncestryDNA file to 23andme format. I used SQL I fixed the 'no-calls' on > both of them using matching RSIDs common to both when one vendor had a > no-call, and the other vendor had made a call on the values. Several > thousand no-calls were fixed this way reducing the error rate by about a > third I extracted these database tables into new raw data files that I call > 'fixed'. I uploaded the 4 raw data files to Gedmatch as 4 different kits- 2 > original and 2 fixed. I then used Gedmatch's utility, "Matching Segment > Search" and extracted matches for all 4 individually using a Segment Length > of 5 cM. I uploaded each of the matches into a database table so I could > compare. 
> > > All 4 kits had matches on (M) 23andme, (A) Ancestry, and (F) FamilyTreeDNA > kits by these amounts: > > +-----------------+--------+--------+ > | 23 Orig Matches | Vendor | Avg cM | > +-----------------+--------+--------+ > | 696 | A | 7.6 | > | 329 | F | 7.3 | > | 933 | M | 7.8 | > +-----------------+--------+--------+ > +---------------+----------------------+ > | Total Matches | Original 23andme kit | > +---------------+----------------------+ > | 1958 | M080859 | > +---------------+----------------------+ > > +------------------+--------+--------+ > | 23 Fixed Matches | Vendor | Avg cM | > +------------------+--------+--------+ > | 779 | A | 7.6 | > | 257 | F | 7.6 | > | 715 | M | 8.1 | > +------------------+--------+--------+ > +---------------+-------------------+ > | Total Matches | Fixed 23andme kit | > +---------------+-------------------+ > | 1751 | M306764 | > +---------------+-------------------+ > > There are 207 fewer matches on fixed 23andme data kits > ####################################################### > +------------------+--------+--------+ > | ANC Orig Matches | Vendor | Avg cM | > +------------------+--------+--------+ > | 946 | A | 7.3 | > | 335 | F | 7.2 | > | 316 | M | 8.8 | > +------------------+--------+--------+ > +---------------+------------------+ > | Total Matches | Original ANC kit | > +---------------+------------------+ > | 1597 | A934219 | > +---------------+------------------+ > > +-------------------+--------+--------+ > | ANC Fixed Matches | Vendor | Avg cM | > +-------------------+--------+--------+ > | 799 | A | 7.6 | > | 238 | F | 7.7 | > | 223 | M | 9.8 | > +-------------------+--------+--------+ > +---------------+---------------+ > | Total Matches | Fixed ANC kit | > +---------------+---------------+ > | 1260 | A146269 | > +---------------+---------------+ > > There are 327 fewer matches on fixed ancestrydna kits. 
This was somewhat similar in magnitude to the reduction for the 23andme fixed vs original kits (207). Between the two fixed kits, there were 491 fewer matches for fixed AncestryDNA than for fixed 23andme (1260 vs 1751).

My thinking was that the fixed kits had fewer matches because false positives were being eliminated: many no-calls got fixed, especially ones with opposite homozygous alleles. If so, the fixed and original kits would share basically the same matching kits, other than the ones eliminated in the fixed kit. But looking at each kit individually, it is much more than that.

I compared the 23andme kits, fixed vs original and vice versa, for matching kits present in one but not the other. To my amazement, it was not simply a matter of the fixed kit having fewer matches; there were entirely different matches in each kit:

Kits that match the 23andme original but not the AncestryDNA original:
+-------------------------+--------+--------+
| 23 Orig not in ANC Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
|                      86 | A      |    6.9 |
|                      85 | F      |    6.8 |
|                     709 | M      |    7.3 |
+-------------------------+--------+--------+

Kits that match the AncestryDNA original but not the 23andme original:
+---------------------------------+--------+--------+
| ANC original not in 23 original | Vendor | AVG cM |
+---------------------------------+--------+--------+
|                             337 | A      |    6.6 |
|                              94 | F      |    6.7 |
|                             105 | M      |    7.0 |
+---------------------------------+--------+--------+

Kits that match the AncestryDNA original but not the 23andme fixed:
+--------------------------+--------+--------+
| ANC Orig not in 23 fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      229 | A      |    6.6 |
|                      130 | F      |    6.5 |
|                      139 | M      |    6.9 |
+--------------------------+--------+--------+

Kits that match the 23andme fixed but not the 23andme original:
+-------------------------+--------+--------+
| 23 Fixed Not in 23 Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
|                     281 | A      |    6.7 |
|                      67 | F      |    7.1 |
|                      48 | M      |    6.9 |
+-------------------------+--------+--------+

Kits that match the 23andme original but not the 23andme fixed:
+-------------------------+--------+--------+
| 23 Orig not in 23 fixed | Vendor | AVG cM |
+-------------------------+--------+--------+
|                     202 | A      |    6.6 |
|                     139 | F      |    6.5 |
|                     263 | M      |    6.9 |
+-------------------------+--------+--------+

Kits that match the AncestryDNA fixed but not the AncestryDNA original:
+---------------------------+--------+--------+
| ANC fixed not in ANC Orig | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        48 | A      |    7.8 |
|                        17 | F      |    8.0 |
|                        11 | M      |    8.7 |
+---------------------------+--------+--------+

Kits that match the AncestryDNA original but not the AncestryDNA fixed:
+---------------------------+--------+--------+
| ANC Orig not in ANC Fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                       200 | A      |    6.6 |
|                       114 | F      |    6.4 |
|                       106 | M      |    6.6 |
+---------------------------+--------+--------+

Kits that match the 23andme fixed but not the AncestryDNA fixed:
+---------------------------+--------+--------+
| 23 fixed not in ANC fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        56 | A      |    6.6 |
|                        35 | F      |    7.0 |
|                       542 | M      |    7.4 |
+---------------------------+--------+--------+

Kits that match the AncestryDNA fixed but not the 23andme fixed:
+---------------------------+--------+--------+
| ANC fixed not in 23 fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        71 | A      |    6.9 |
|                        19 | F      |    7.0 |
|                        58 | M      |    7.5 |
+---------------------------+--------+--------+

Kits that match the 23andme original but not the AncestryDNA fixed:
+--------------------------+--------+--------+
| 23 Orig not in ANC fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      218 | A      |    6.6 |
|                      160 | F      |    6.5 |
|                      763 | M      |    7.2 |
+--------------------------+--------+--------+

Kits that match the AncestryDNA fixed but not the 23andme original:
+--------------------------+--------+--------+
| ANC fixed not in 23 Orig | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      315 | A      |    6.7 |
|                       71 | F      |    7.0 |
|                       59 | M      |    7.4 |
+--------------------------+--------+--------+

###################################################################
Maybe this is too much information, but I wanted to show each comparison.

Disclosure: I have a pile-up region on chromosome 15 where I have hundreds of bogus matches. I did not factor that out.

I invite anyone's comments about what may be going on, and any suggestions for further queries.

David Schroeder

-------------------------------
To unsubscribe from the list, please send an email to GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message.
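The idea that filling in no-calls removes false positives, especially where the filled-in call turns out to be an opposite homozygote, can be illustrated with a deliberately simplified half-identical scan. This is a sketch under stated assumptions, not GEDmatch's actual algorithm: it assumes a no-call ("--") is treated as compatible with anything, that two SNPs conflict only when both are homozygous for different alleles, and it measures run length in SNPs rather than cM. The genotypes are made-up toy data.

```python
# Toy illustration: a no-call acts as a wildcard, so filling it in can break a run.
# Assumptions (not GEDmatch's real method): "--" is a no-call, a conflict is an
# opposite homozygote (e.g. "CC" vs "TT"), and run length is counted in SNPs.

def is_opposite_homozygote(g1: str, g2: str) -> bool:
    """True when both genotypes are homozygous for different alleles."""
    if "-" in g1 or "-" in g2:
        return False  # a no-call is never counted as a conflict
    return g1[0] == g1[1] and g2[0] == g2[1] and g1[0] != g2[0]


def longest_half_identical_run(kit1: list[str], kit2: list[str]) -> int:
    """Length (in SNPs) of the longest stretch with no opposite homozygote."""
    best = current = 0
    for g1, g2 in zip(kit1, kit2):
        if is_opposite_homozygote(g1, g2):
            current = 0  # the run is broken at this SNP
        else:
            current += 1
            best = max(best, current)
    return best


if __name__ == "__main__":
    match = ["AG", "CC", "TT", "AA", "GT", "CC"]
    original = ["AA", "CT", "--", "AG", "GG", "CC"]  # no-call at index 2
    fixed = ["AA", "CT", "CC", "AG", "GG", "CC"]     # same SNP called by the other vendor
    print(longest_half_identical_run(original, match))  # -> 6
    print(longest_half_identical_run(fixed, match))     # -> 3
```

In this toy case the filled-in "CC" against the match's "TT" splits one long run into two short ones, which is the mechanism by which a fixed kit could drop a borderline match.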
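Each "kits in X but not in Y" table is an anti-join on the matching kit number, summarized by vendor. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical table `matches(kit, match_kit, cm)` with one row per exported segment, and assuming the vendor is the first letter of the matching kit number (M, A, F as in the tables). The message does not say whether the counts are distinct kits or segments, or how AVG cM was computed; this sketch counts distinct kits and averages segment cM. The sample rows are made up.

```python
import sqlite3

# Hypothetical schema: one row per matching segment exported from GEDmatch.
#   kit       - the uploaded kit the search was run for
#   match_kit - kit number of the match (assumed: first letter = vendor)
#   cm        - segment length in cM
QUERY = """
SELECT COUNT(DISTINCT a.match_kit) AS kits,
       SUBSTR(a.match_kit, 1, 1)   AS vendor,
       ROUND(AVG(a.cm), 1)         AS avg_cm
FROM matches AS a
WHERE a.kit = :left_kit
  AND NOT EXISTS (                 -- anti-join: match kit absent from the other search
        SELECT 1 FROM matches AS b
        WHERE b.kit = :right_kit AND b.match_kit = a.match_kit)
GROUP BY vendor
ORDER BY vendor
"""


def not_in(conn, left_kit, right_kit):
    """Per-vendor summary of match kits found for left_kit but not for right_kit."""
    return conn.execute(QUERY, {"left_kit": left_kit, "right_kit": right_kit}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE matches (kit TEXT, match_kit TEXT, cm REAL)")
    sample = [  # made-up rows, for illustration only
        ("ORIG", "A100001", 6.0), ("ORIG", "A100002", 8.0),
        ("ORIG", "M200001", 7.0), ("FIXED", "A100002", 8.1),
        ("FIXED", "F300001", 5.5),
    ]
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", sample)
    print(not_in(conn, "ORIG", "FIXED"))  # -> [(1, 'A', 6.0), (1, 'M', 7.0)]
    print(not_in(conn, "FIXED", "ORIG"))  # -> [(1, 'F', 5.5)]
```

Running `not_in` in both directions for each pair of kits reproduces the shape of the paired tables; `NOT EXISTS` is used rather than `NOT IN` so a NULL kit number cannot silently empty the result.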

    01/06/2016 05:08:48
    1. Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
    2. David Schroeder via
    3. For comparison, here are the matches I had in my pile-up region, Chromosome = '15' AND (POS > 23976155 AND POS < 25855576):

+-----------+---------+
| Fixed ANC | AVG(CM) |
+-----------+---------+
|       119 |    10.2 |
+-----------+---------+
+----------+---------+
| Orig ANC | AVG(CM) |
+----------+---------+
|      150 |    10.0 |
+----------+---------+
+---------------+---------+
| Fixed 23andme | AVG(CM) |
+---------------+---------+
|           442 |     8.6 |
+---------------+---------+
+--------------+---------+
| Orig 23andme | AVG(CM) |
+--------------+---------+
|          556 |     8.3 |
+--------------+---------+

It is a mystery to me why 23andme generated more matches in my pile-up region.

David Schroeder

-----Original Message-----
From: David Schroeder [mailto:dschroed991@sbcglobal.net]
Sent: Wednesday, January 6, 2016 2:18 PM
To: 'ahnen@awest.de'; 'genealogy-dna@rootsweb.com'
Subject: RE: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

Sorry about the sea of information. There were so many comparisons, and each one had surprising results.

I think I can say at least 25% of my matches coming from 23andme were from a pile-up on Chromosome 15 (POS > 23976155 and POS < 25855576).

Looking at the two kits, Ancestry tests 460 RSIDs in this range of positions; 23andme tests 602. The pile-up is much smaller for Ancestry than for 23andme. Ancestry had only 12 no-calls in this region, and 23andme had only 17. I don't know why there were hundreds more matches in this range for 23andme than for Ancestry.

David

-----Original Message-----
From: ahnen@awest.de [mailto:ahnen@awest.de]
Sent: Wednesday, January 6, 2016 11:09 AM
To: David Schroeder; genealogy-dna@rootsweb.com
Subject: Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

David,

A summary of the last comparisons would be welcome. I got lost in the many tables, which, despite your wonderful effort, are still hard to read in plain email format.

Great analysis!

Andreas
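The pile-up counts come from filtering each kit's match list to one chromosome 15 window. A small sketch of that filter plus the share-of-matches arithmetic, assuming each exported match is a hypothetical `(chromosome, position, cm)` tuple where `position` is whatever column the POS condition was applied to; like the quoted query, it tests a single position rather than segment overlap. Assuming the pile-up counts and the earlier totals come from the same match tables, 556 of 1958 (original 23andme) is about 28.4% and 442 of 1751 (fixed 23andme) is about 25.2%, consistent with the "at least 25%" estimate.

```python
# Pile-up window on chromosome 15, as quoted in the message (exclusive bounds).
PILEUP_CHR = "15"
PILEUP_LOW, PILEUP_HIGH = 23976155, 25855576


def in_pileup(chromosome: str, position: int) -> bool:
    """Mirror of: Chromosome = '15' AND (POS > 23976155 AND POS < 25855576)."""
    return chromosome == PILEUP_CHR and PILEUP_LOW < position < PILEUP_HIGH


def pileup_summary(segments):
    """Count and average cM of matches inside the window.

    `segments` is an iterable of hypothetical (chromosome, position, cm) tuples.
    Returns (count, average_cm); average_cm is None when nothing falls inside.
    """
    hits = [cm for chrom, pos, cm in segments if in_pileup(chrom, pos)]
    if not hits:
        return 0, None
    return len(hits), round(sum(hits) / len(hits), 1)


def share(part: int, total: int) -> float:
    """Percentage of all matches that fall in the pile-up, to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    toy = [("15", 24_000_000, 8.0), ("15", 30_000_000, 12.0),
           ("2", 24_000_000, 6.0), ("15", 25_000_000, 9.0)]
    print(pileup_summary(toy))  # -> (2, 8.5)
    # Totals reported in this thread for the original and fixed 23andme kits.
    print(share(556, 1958), share(442, 1751))  # -> 28.4 25.2
```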

    01/06/2016 07:37:01
    1. Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
    2. Jim Bartlett via
    3. David,

This is a very interesting study on the differences in raw data. Thank you for sharing. However, the context is that the matching algorithm used was GEDmatch's, which is somewhat different from the algorithms used at 23andMe and AncestryDNA. So the analysis is based on GEDmatch alone. [I'm copying GEDmatch on this, too.]

It is surprising to me how much variance there is. In my experience, 23andMe, FTDNA and GEDmatch Matches and segments appear to align pretty well.

Two thoughts:

1. Try the comparisons again using the GEDmatch standard of 700 SNPs and 7 cM. I used that standard for several years. Last year (after I had formed Triangulated Groups (TGs) covering about 85% of my 45 chromosomes), I lowered the threshold to 500/5 and found that about 95% of the 5-7 cM segments did not Triangulate and were IBS (identical by state). I'd also be curious about the comparisons using a 1,000/10 threshold.

2. I did a quick comparison of the Matches and segments for a trio of files at FTDNA and at GEDmatch. FTDNA had many false negatives, which showed up as positive Matches at GEDmatch. This highlighted the conservative FTDNA algorithm. We have lots of anecdotal data that AncestryDNA's algorithm also has many, sometimes significant, false negatives (real matches that Ancestry does not report). I was going to suggest comparing your GEDmatch results with the vendor results, but now realize that notion is folly: we don't know the full set of Matches at 23andMe, and AncestryDNA shows most Matches with code names, making both of these comparisons impossible. Only FTDNA gives us the full list.

Anyway, I'd like to see how much difference the threshold makes. I think there are many spurious Matches at 5-7 cM.

Jim - www.segmentology.org

> On Jan 6, 2016, at 11:22 AM, David Schroeder via <genealogy-dna@rootsweb.com> wrote:
>
> This is a continuation of my observations comparing raw data files from
> 23andme and AncestryDNA.
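Jim's first suggestion amounts to re-running the comparison at stricter SNP/cM thresholds. A rough sketch of re-filtering already-exported segments, assuming each segment is a hypothetical `(match_kit, snps, cm)` tuple and that a segment passes only when it meets both minimums; the thresholds are the ones named in the message (500/5, 700/7, 1,000/10). Re-filtering an export made at 5 cM may only approximate a fresh GEDmatch search at the higher setting.

```python
# Thresholds mentioned in the thread as (minimum SNPs, minimum cM).
THRESHOLDS = [(500, 5.0), (700, 7.0), (1000, 10.0)]


def passes(snps: int, cm: float, min_snps: int, min_cm: float) -> bool:
    """Assumed rule: a segment must meet BOTH the SNP and the cM minimum."""
    return snps >= min_snps and cm >= min_cm


def matches_at_threshold(segments, min_snps, min_cm):
    """Distinct match kits with at least one segment passing the threshold.

    `segments` is an iterable of hypothetical (match_kit, snps, cm) tuples.
    """
    return {kit for kit, snps, cm in segments if passes(snps, cm, min_snps, min_cm)}


def threshold_table(segments):
    """Number of distinct matching kits at each threshold in THRESHOLDS."""
    segments = list(segments)  # allow a generator to be scanned more than once
    return {f"{s}/{c:g}": len(matches_at_threshold(segments, s, c)) for s, c in THRESHOLDS}


if __name__ == "__main__":
    toy = [("A1", 520, 5.4), ("A1", 900, 7.5), ("M1", 650, 6.1), ("F1", 1400, 11.2)]
    print(threshold_table(toy))  # -> {'500/5': 3, '700/7': 2, '1000/10': 1}
```

Applied to each of the four kits, a table like this would show how many of the "not in" matches survive at 700/7 or 1,000/10.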

    01/07/2016 01:10:31