I was able to 'fix' the no-calls for matching RSIDs on both Ancestry and 23andme when one, or the other, was not a no-call. I fixed 6,632 on 23andme and 6,708 on Ancestry.

Interestingly enough, there were 3,833 that were left as no-calls on both 23andme and AncestryDNA for the same RSIDs. I am wondering if these are the result of particularly difficult locations to test, or perhaps the SNP is rare in my genome? The tests were over two years apart.

I uploaded both fixed raw data files to gedmatch to see how it may affect my 'one-to-many' matches. (Will have to wait on the processing). I ran the Gedmatch File Diagnostic Utility, and the fixed files had significantly reduced my error rates. It seems that most of my errors are in the X, Y or MT Chromosomes.

David

------------------------------

Message: 4
Date: Sun, 13 Dec 2015 03:31:45 -0800
From: Ann Turner <dnacousins@gmail.com>
Subject: Re: [DNA] My Raw Data Files - Comparison 23andme vs AncestryDNA
To: DNA Genealogy Mailing List <genealogy-dna@rootsweb.com>
Message-ID: <CAA-Ub_COJUEcMV4v3aXj4hbEaj6cbFf01AT9yDSBMJwDoyTnsA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

I've always mentally thought about the "i" SNPs as "internal" catalog numbers, but I'm not positive if I made that up or actually noticed someone from 23andMe used that word :)

As you probably noticed, AncestryDNA doesn't always present alleles in alphabetical order. You will find instances of TC and CT, for example. Illumina's base-calling software has something called a "top strand" and a "bottom strand" (not the same thing as forward/reverse or plus/minus). 23andMe does some post-processing to put alleles in alphabetical order. Anyway, did you also look for TA?

SNPs where the alternative alleles are also complementary base pairs in the double helix ( A <-> T and C <-> G) are tricky to handle. 23andMe may have developed custom probes to identify some of those.

I've also noticed that AncestryDNA and FTDNA do not report any indels (the I and D alleles you asked about).

Tim, this may not be worth the effort to analyze, but I'm curious to know if the "i" variants with rs numbers at FTDNA may be cases where 23andMe put some additional probes on the chip for a particular locus. If you have a list handy, I could explore that a bit.

Ann Turner

On Sat, Dec 12, 2015 at 11:41 PM, Tim Janzen via <genealogy-dna@rootsweb.com> wrote:

> Dear David,
> DD means a deletion and II means an insertion. The "i" SNPs in the 23andMe files are those that don't have rs numbers assigned to them by 23andMe. It is possible that "i" stands for Illumina, but I am not certain about that. It is also possible that it stands for "inserted", possibly because 23andMe inserted these SNPs onto the SNP chip because they were of special interest to 23andMe. Someone at 23andMe would know the answer to this question.
> It is interesting that AncestryDNA files don't have SNPs with the allele values AT. I don't have a definite answer for that. I checked my mom's file for the SNPs that have the allele values AT in 23andMe and found a total of 322 of these SNPs. I then checked for these SNPs in my mom's AncestryDNA file and I couldn't find any of those SNPs in my mom's AncestryDNA file. My suspicion is that Ancestry.com has dropped all SNPs from their dataset with the values AT because they think that the results may be erroneous.
> Sincerely,
> Tim Janzen
>
> -----Original Message-----
> From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of David Schroeder via
> Sent: Saturday, December 12, 2015 9:33 PM
> To: genealogy-dna@rootsweb.com
> Subject: [DNA] My Raw Data Files - Comparison 23andme vs AncestryDNA
>
> I have tested at both 23andme (V3) and AncestryDNA. I have written a program to add the raw data file information into a MySQL database, creating separate tables for my 23andme results and my AncestryDNA.
>
> I am trying to understand some things.
>
> I can understand all the A, C, G, T lettering. The single letters represent SNPs on my Y and X chromosomes. I also understand that '--' is a no call. What are 'DD' and 'II'?
>
> I also found that AncestryDNA had no 'AT' SNPs for me, but 23andme had 611:
>
> Can anyone explain why I have no 'AT' SNP pairs in my AncestryDNA raw data file? I verified this by browsing my Ancestry Raw data file. I had every other SNP pair represented.
>
> The final question is about RSIDs. What are the ones that begin with 'i' in my 23andme raw data file? I have 10,709 RSIDs that begin with 'i-----'.
>
> David
>
> -------------------------------
> To unsubscribe from the list, please send an email to GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
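Ann's question about TA suggests counting genotypes without regard to allele order, since AncestryDNA does not alphabetize its alleles. A minimal Python sketch of that count follows. It assumes the usual tab-separated layouts (23andMe: rsid, chromosome, position, genotype; AncestryDNA: rsid, chromosome, position, allele1, allele2, with '#' comment lines and a header row). The function names are made up for illustration and are not taken from David's program.

```python
from collections import Counter


def parse_23andme(lines):
    """Yield (rsid, genotype) from 23andMe-style lines (assumed layout:
    rsid, chromosome, position, genotype, tab-separated)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 4 or fields[0] == "rsid":
            continue  # skip header or malformed rows
        yield fields[0], fields[3]


def parse_ancestry(lines):
    """Yield (rsid, genotype) from AncestryDNA-style lines (assumed layout:
    rsid, chromosome, position, allele1, allele2, tab-separated)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5 or fields[0] == "rsid":
            continue
        yield fields[0], fields[3] + fields[4]


def normalize(genotype):
    """Sort the alleles so that TA and AT count as the same genotype."""
    return "".join(sorted(genotype))


def genotype_counts(records):
    """Count genotypes, ignoring the order in which alleles are listed."""
    return Counter(normalize(genotype) for _, genotype in records)


def internal_ids(records):
    """Return the identifiers that begin with 'i' instead of 'rs'."""
    return [rsid for rsid, _ in records if rsid.startswith("i")]
```

With both files counted this way, an 'AT' total of zero for AncestryDNA would mean the genotype is missing in either order, not just hidden under 'TA'.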
That's very interesting and I thought about such myself (especially with now having tested at all 3 companies over almost 3 years).

How did you "clean up" your no-calls, did you manually go through it? Or is that part of your program you wrote?

Great post, David!

Andreas

> On 15 Dec 2015, at 11:47, David Schroeder via <genealogy-dna@rootsweb.com> wrote:
>
> I was able to 'fix' the no-calls for matching RSIDs on both Ancestry and 23andme when one, or the other, was not a no-call. I fixed 6,632 on 23andme and 6,708 on Ancestry.
Thanks Andreas,

I installed Perl and MySQL on my Windows machine. I wrote a couple of Perl scripts to extract the raw data and add it to MySQL tables, one for 23andme and the other for AncestryDNA. I ran an 'update' SQL on each one. Next I have a Perl script to extract the modified data from the database and put it back in the original raw data format. I zipped the modified raw data files and uploaded them to gedmatch. I am not sure what the total impact of all this will be. I am hoping for more accurate one-to-many.

Table names: anc for AncestryDNA data; 23andme for 23andme data.

Syntax of update SQL:

Updates 23andme changing '--' in PAIR to the value in AncestryDNA (fixed 6632):

    UPDATE 23andme INNER JOIN anc ON 23andme.RS = anc.RS
    SET 23andme.PAIR = anc.PAIR
    WHERE ( 23andme.RS = anc.RS) and 23andme.PAIR = '--';

Updates AncestryDNA changing '--' in PAIR to the value in 23andme (fixed 6708):

    UPDATE anc INNER JOIN 23andme ON anc.RS = 23andme.RS
    SET anc.PAIR = 23andme.PAIR
    WHERE ( anc.RS = 23andme.RS) and anc.PAIR = '--';

It is pretty trivial once you have done the hard work of making sure it all works. I would be happy to share with anyone who wants this.

David

-----Original Message-----
From: Andreas West [mailto:ahnen@awest.de]
Sent: Tuesday, December 15, 2015 2:07 AM
To: David Schroeder; genealogy-dna@rootsweb.com
Subject: Re: [DNA] My Raw Data Files - Comparison 23andme vs AncestryDNA

That's very interesting and I thought about such myself (especially with now having tested at all 3 companies over almost 3 years).

How did you "clean up" your no-calls, did you manually go through it? Or is that part of your program you wrote?

Great post, David!

Andreas
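For readers without a MySQL setup, the logic of the two UPDATE statements David describes can be sketched in plain Python. This is only an illustration under stated assumptions: each test is represented as a dict mapping RSID to genotype pair, both use '--' for a no-call as in David's tables, and the function names are hypothetical. It is not David's Perl code.

```python
NO_CALL = "--"


def fill_no_calls(target, other):
    """Return a copy of `target` in which each no-call is replaced by the
    call from `other` for the same RSID, plus the number of calls filled.

    Mirrors the UPDATE ... INNER JOIN idea: only RSIDs present in both
    data sets are considered, and only no-calls in `target` are changed.
    """
    filled = dict(target)
    fixed = 0
    for rsid, pair in target.items():
        other_pair = other.get(rsid)
        if pair == NO_CALL and other_pair is not None and other_pair != NO_CALL:
            filled[rsid] = other_pair
            fixed += 1
    return filled, fixed


def shared_no_calls(a, b):
    """Return the RSIDs that are no-calls in both data sets."""
    return sorted(
        rsid for rsid in a.keys() & b.keys()
        if a[rsid] == NO_CALL and b[rsid] == NO_CALL
    )
```

Because `fill_no_calls` returns a copy instead of modifying its input, both directions can be computed from the original data, and `shared_no_calls` gives the list behind a figure like the 3,833 RSIDs that stayed as no-calls in both files.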
Dear David,

I noted that the errors and nocalls were similar between 23andMe data and Family Finder data a number of years ago when I analyzed the data. My conclusion was that errors and miscalls have to do with poorly performing primers on the Illumina OmniExpress chip, which is the SNP array that 23andMe, Ancestry.com, and Family Finder use for their tests. If the primer doesn't work well then companies using the same chip may well get the same erroneous results. The take home message is that just because three different companies (23andMe, Ancestry.com, and Family Finder) all give the same result for a SNP doesn't necessarily mean that the result is correct. If the primer gives a wrong result for whatever reason then you would need to figure this out by using a different testing technique such as Sanger sequencing or next generation sequencing.

The way I figured out that some of my results on the OmniExpress chip were erroneous was to compare my data to my parents' data and then look for data that was incompatible. For instance, if my mother was GG and I was AT for a particular SNP then at least one of the allele values for my mother or for me had to be incorrect.

Sincerely,
Tim Janzen

-----Original Message-----
From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of David Schroeder via
Sent: Monday, December 14, 2015 8:48 PM
To: genealogy-dna@rootsweb.com
Subject: Re: [DNA] My Raw Data Files - Comparison 23andme vs AncestryDNA

Interestingly enough, there were 3,833 that were left as no-calls on both 23andme and AncestryDNA for the same RSIDs. I am wondering if these are the result of particularly difficult locations to test, or perhaps the SNP is rare in my genome? The tests were over two years apart.

I uploaded both fixed raw data files to gedmatch to see how it may affect my 'one-to-many' matches. (Will have to wait on the processing). I ran the Gedmatch File Diagnostic Utility, and the fixed files had significantly reduced my error rates. It seems that most of my errors are in the X, Y or MT Chromosomes.

David
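Tim's parent comparison can be expressed as a simple check: a child must share at least one allele with each parent at every autosomal SNP. The Python sketch below is a rough illustration with hypothetical function names. It assumes both genotype sets are dicts of RSID to two-letter genotype, reported on the same strand, and it skips no-calls and single-letter (X, Y, MT) genotypes.

```python
def compatible(parent, child):
    """True if the child genotype contains at least one allele that the
    parent could have passed on."""
    return any(allele in parent for allele in child)


def incompatible_snps(parent, child):
    """Return RSIDs where the parent and child genotypes cannot both be
    correct, given that the child inherits one allele from this parent.

    Assumes two-letter genotypes on the same strand in both data sets;
    no-calls ('--') and single-letter genotypes are skipped.
    """
    flagged = []
    for rsid in sorted(parent.keys() & child.keys()):
        p, c = parent[rsid], child[rsid]
        if len(p) != 2 or len(c) != 2 or "-" in p or "-" in c:
            continue  # not comparable: no-call or hemizygous call
        if not compatible(p, c):
            flagged.append(rsid)
    return flagged
```

Tim's example of a GG mother and an AT child would be flagged, since the child carries no G. A flagged SNP only shows that one of the two calls is wrong, not which one.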