Patti,

GEDmatch uses a standard threshold of 700 SNPs (in a row) and 7cM. Close to the 7cM threshold, roughly half of the Matches may be false. Shared segments at 10cM will be true (IBD) over 90% of the time; at a minimum of 15cM virtually all shared segments are IBD.

I am a proponent of Triangulation. I have found that most of the false segments above 7cM will not Triangulate, so this is a good way to cull most of them out. See my blog for several posts on Triangulation.

I believe AncestryDNA does not have a browser for privacy reasons. But many of their customers are uploading to GEDmatch to use the genetic tools.

Jim - www.segmentology.org

> On Jan 7, 2016, at 8:33 AM, Patti Easton via <genealogy-dna@rootsweb.com> wrote:
>
> Thank you for the reply Ann. So when looking at results from Ancestry
> uploaded to gedmatch, does that mean that the chromosome browser comparisons
> would be faulty? How much error should one expect? Is this just minutia,
> or would it factor into comparisons? Is this why Ancestry has no chromosome
> browser?
>
> Thank you again,
> Patti
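To make the threshold in the message above concrete, here is a minimal Python sketch of filtering shared segments by a minimum SNP count and a minimum cM length. The `Segment` record, the function name, and the sample values are hypothetical illustrations, not GEDmatch's actual code or data; only the 700 SNP / 7cM defaults come from the message.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical record for one shared segment with one Match."""
    match: str
    snps: int
    cm: float


def passes_threshold(seg: Segment, min_snps: int = 700, min_cm: float = 7.0) -> bool:
    """Return True when the segment meets both the SNP and the cM minimum."""
    return seg.snps >= min_snps and seg.cm >= min_cm


# Made-up example segments, for illustration only.
segments = [
    Segment("A", 650, 7.4),    # long enough in cM, but too few SNPs
    Segment("B", 900, 6.1),    # enough SNPs, but too short in cM
    Segment("C", 1200, 15.3),  # meets both minimums
]

kept = [s for s in segments if passes_threshold(s)]
print([s.match for s in kept])
```

Passing a threshold only says the segment is long enough to be reported; whether it is IBD still has to be checked, for example by Triangulation as described above.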
Patti,

Another way I try to explain this is that the testing process gets a value for each of the 700,000 SNPs on your mother's side and your father's side - so actually 1,400,000 results in all. The process we now have does not know which ones are maternal and which are paternal - it just knows there are two values for each SNP. The presumption is that for a long enough comparison (between you and a potential Match), the only way two long strings of SNPs could be identical is if they were from the same ancestor.

I needed several different explanations of this process before the "lightbulb" came on for me. In my blog (in my tag line), an early post explained the difference between ancestral segments (what you have in your body - from ancestors) and shared segments (generated by computer algorithms looking at all the SNPs). When a shared segment matches an ancestral segment we say it is IBD (Identical By Descent); when the shared segment is not the same as an ancestral segment the shared segment is false, and we say it is not-IBD or it is IBS (Identical By State) or IBC (Identical By Chance) [different folks use different terms].

Hope this helps.

Jim - www.segmentology.org

> On Jan 7, 2016, at 7:50 AM, Ann Turner via <genealogy-dna@rootsweb.com> wrote:
>
> You are correct about the biological reality. Segments do not jump back and
> forth. However, the DNA results (e.g. CT) you receive for a particular SNP
> are a "genotype." The DNA is chopped into tiny pieces before analysis,
> which uses probes to detect the presence of a particular allele. The only
> thing we can tell is that a C is present in some pieces and a T is present
> in other pieces. Computer algorithms attempt to predict which SNPs are
> traveling together as a package on a single chromosome (a haplotype) but
> they don't always get it right.
>
> Ann Turner
>
> -------------------------------
> To unsubscribe from the list, please send an email to GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
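The point about unphased genotypes in the message above can be illustrated with a small Python sketch. It assumes a simplified "half-identical" rule: two genotypes are compatible at a SNP when they share at least one allele, so only opposite homozygous results (such as CC versus TT) break a run. The function names and the sample genotypes are hypothetical; real matching tools also deal with no-calls, error tolerance, and cM lengths, none of which is modelled here.

```python
def half_identical(g1: str, g2: str) -> bool:
    """Two unphased genotypes (e.g. "CT" and "TT") share at least one allele."""
    return bool(set(g1) & set(g2))


def longest_run(kit1: list[str], kit2: list[str]) -> int:
    """Length of the longest run of consecutive SNPs that are half-identical."""
    best = run = 0
    for a, b in zip(kit1, kit2):
        run = run + 1 if half_identical(a, b) else 0
        best = max(best, run)
    return best


# Made-up genotypes at five consecutive SNPs for two people.
kit1 = ["CT", "AA", "GG", "CC", "AG"]
kit2 = ["TT", "AG", "GG", "TT", "GG"]

# The fourth SNP (CC vs TT) is opposite homozygous and ends the run.
print(longest_run(kit1, kit2))
```

Because the test never knows which allele came from which parent, a run like this can be stitched together from maternal and paternal alleles alternately. That is how a computed shared segment can fail to correspond to any single ancestral segment.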
David,

This is no mystery; you yourself present the explanation. Ancestry has FEWER SNPs along the pileup: 460 compared to 602 for 23andMe. This means your threshold of 5cM and !! 500 SNPs !! is too strict for a pile-up-only match of the A kit. It has to be supported by at least 40 cumulative SNPs of IBS regions on both sides of the pileup, whereas 23andMe can easily satisfy the 500 SNP threshold you set for your experiments.

To verify this theory: setting the threshold to 400 SNPs should bring the number of pile-up matches for the A kit up, while setting it to 700 SNPs should bring it down for the M kit.

Best regards,
Kuba

On Wed, Jan 6, 2016 at 9:37 PM, David Schroeder via <genealogy-dna@rootsweb.com> wrote:

> For comparison, these are the matches I had in my pileup region,
> Chromosome = '15' and (POS > 23976155 AND POS < 25855576):
>
> +-----------+---------+
> | Fixed ANC | AVG(CM) |
> +-----------+---------+
> |       119 |    10.2 |
> +-----------+---------+
> +----------+---------+
> | Orig ANC | AVG(CM) |
> +----------+---------+
> |      150 |    10.0 |
> +----------+---------+
> +---------------+---------+
> | Fixed 23andme | AVG(CM) |
> +---------------+---------+
> |           442 |     8.6 |
> +---------------+---------+
> +--------------+---------+
> | Orig 23andme | AVG(CM) |
> +--------------+---------+
> |          556 |     8.3 |
> +--------------+---------+
>
> It is a mystery to me why 23andme generated more matches in my pileup
> region.
>
> David Schroeder
>
> -----Original Message-----
> From: David Schroeder [mailto:dschroed991@sbcglobal.net]
> Sent: Wednesday, January 6, 2016 2:18 PM
> To: 'ahnen@awest.de'; 'genealogy-dna@rootsweb.com'
> Subject: RE: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
>
> Sorry about the sea of information. There were so many comparisons, and each
> one had surprising results.
>
> I think I can say at least 25% of my matches coming from 23andme were from a
> Pile-up on Chromosome 15 (POS > 23976155 and POS < 25855576).
>
> I looked at the two kits. Ancestry checks for 460 RSIDs in this range of
> positions; 23andme checks for 602. The pileups for Ancestry are much less
> than 23andme. Ancestry only had 12 no-calls in this region, 23andme only had
> 17. I don't know why there were hundreds more matches in this range for
> 23andme compared to Ancestry.
>
> David
>
> -----Original Message-----
> From: ahnen@awest.de [mailto:ahnen@awest.de]
> Sent: Wednesday, January 6, 2016 11:09 AM
> To: David Schroeder; genealogy-dna@rootsweb.com
> Subject: Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA
>
> David,
>
> A summary on the last comparisons would be fine. I've got lost in the many
> tables which despite your wonderful effort are still hard to read in plain
> email format.
>
> Great analysis!
>
> Andreas
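The SNP-count argument in the reply above comes down to simple arithmetic, sketched here in Python. The SNP counts for the pileup region (460 for Ancestry, 602 for 23andme) are the ones reported in the quoted message; the function name is hypothetical, and the sketch assumes a match must contain at least `threshold` SNPs in total.

```python
def extra_snps_needed(snps_in_region: int, threshold: int) -> int:
    """SNPs a match must pick up outside the region to reach the threshold."""
    return max(0, threshold - snps_in_region)


# SNPs each vendor tests inside the chromosome 15 pileup region (from the thread).
pileup_snps = {"Ancestry": 460, "23andme": 602}

for threshold in (400, 500, 700):
    for vendor, count in pileup_snps.items():
        needed = extra_snps_needed(count, threshold)
        print(f"threshold {threshold}: {vendor} needs {needed} flanking SNPs")
```

At the 500 SNP setting the Ancestry kit needs 40 additional matching SNPs outside the pileup, while the 23andme kit needs none, which is consistent with the larger number of pileup matches reported for the 23andme kit.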
David,

This is a very interesting study on the differences in raw data. Thank you for sharing. However, the context is that the matching algorithm used was GEDmatch's, which is somewhat different than the algorithms used at 23andMe and AncestryDNA. So the analysis is based on GEDmatch alone. [I'm copying GEDmatch on this, too]

It is surprising to me how much variance there is. In my experience, it appears the 23andMe, FTDNA and GEDmatch Matches and segments align pretty well.

Two thoughts:

1. Try the comparisons again using the GEDmatch standard 700 SNPs and 7cM. I used that standard for several years. Last year (after I had formed TGs covering about 85% of my 45 chromosomes), I lowered the threshold to 500/5 and found that about 95% of the 5-7cM segments did not Triangulate and were IBS. I'd also be curious about the comparisons using a 1,000/10 threshold.

2. I did a quick comparison of Matches and segments for a trio of files at both FTDNA and GEDmatch. FTDNA had many false negatives, which showed up as positive Matches at GEDmatch. This highlighted the conservative FTDNA algorithm. We have lots of anecdotal data that AncestryDNA's algorithm also has many, sometimes significant, false negatives (real matches that Ancestry does not report). I was going to suggest a comparison of your GEDmatch results with the vendor results, but now realize that notion is folly (we don't know the full set of Matches at 23andMe, and AncestryDNA has most Matches with code names, making both of these comparisons impossible). Only FTDNA gives us the full list.

Anyway, I'd like to see how much difference the threshold makes - I think there are many spurious Matches at 5-7cM.

Jim - www.segmentology.org

> On Jan 6, 2016, at 11:22 AM, David Schroeder via <genealogy-dna@rootsweb.com> wrote:
>
> This is a continuation of my observations comparing raw data files from
> 23andme and AncestryDNA.
>
> Background: I tested at both 23andme (V3) and AncestryDNA and downloaded
> the raw data files from each one. I will call these two raw data files
> 'Original'. Using SQL, I loaded these files into a database. I also converted
> the AncestryDNA file to 23andme format. Using SQL, I fixed the 'no-calls' in
> both of them using RSIDs common to both, wherever one vendor had a
> no-call and the other vendor had made a call. Several thousand no-calls
> were fixed this way, reducing the error rate by about a third. I extracted
> these database tables into new raw data files that I call 'fixed'. I uploaded
> the 4 raw data files to Gedmatch as 4 different kits: 2 original and 2 fixed.
> I then used Gedmatch's utility, "Matching Segment Search", and extracted
> matches for all 4 individually using a Segment Length of 5 cM. I uploaded
> each set of matches into a database table so I could compare.
>
> All 4 kits had matches on (M) 23andme, (A) Ancestry, and (F) FamilyTreeDNA
> kits by these amounts:
>
> +-----------------+--------+--------+
> | 23 Orig Matches | Vendor | Avg cM |
> +-----------------+--------+--------+
> |             696 | A      |    7.6 |
> |             329 | F      |    7.3 |
> |             933 | M      |    7.8 |
> +-----------------+--------+--------+
> +---------------+----------------------+
> | Total Matches | Original 23andme kit |
> +---------------+----------------------+
> |          1958 | M080859              |
> +---------------+----------------------+
>
> +------------------+--------+--------+
> | 23 Fixed Matches | Vendor | Avg cM |
> +------------------+--------+--------+
> |              779 | A      |    7.6 |
> |              257 | F      |    7.6 |
> |              715 | M      |    8.1 |
> +------------------+--------+--------+
> +---------------+-------------------+
> | Total Matches | Fixed 23andme kit |
> +---------------+-------------------+
> |          1751 | M306764           |
> +---------------+-------------------+
>
> There are 207 fewer matches on the fixed 23andme kit.
> #######################################################
> +------------------+--------+--------+
> | ANC Orig Matches | Vendor | Avg cM |
> +------------------+--------+--------+
> |              946 | A      |    7.3 |
> |              335 | F      |    7.2 |
> |              316 | M      |    8.8 |
> +------------------+--------+--------+
> +---------------+------------------+
> | Total Matches | Original ANC kit |
> +---------------+------------------+
> |          1597 | A934219          |
> +---------------+------------------+
>
> +-------------------+--------+--------+
> | ANC Fixed Matches | Vendor | Avg cM |
> +-------------------+--------+--------+
> |               799 | A      |    7.6 |
> |               238 | F      |    7.7 |
> |               223 | M      |    9.8 |
> +-------------------+--------+--------+
> +---------------+---------------+
> | Total Matches | Fixed ANC kit |
> +---------------+---------------+
> |          1260 | A146269       |
> +---------------+---------------+
>
> There are 337 fewer matches on the fixed ancestrydna kit. This is somewhat
> similar in magnitude to the reduction for the 23andme fixed vs original
> kits. Comparing the fixed kits, the fixed ancestrydna kit had 491 fewer
> matches than the fixed 23andme kit.
>
> My thinking was that the fixed kits had fewer matches because false
> positives were being eliminated as many no-calls got fixed,
> especially ones with opposite homozygous alleles. I expected the kits to
> share basically the same matches, other than the ones that were eliminated
> in the fixed kits. But looking at each kit individually, it is much more
> than that.
>
> I compared 23andme kits, fixed vs original and vice versa, for kits not
> present in one or the other. Amazing to me, it was not simply a matter of
> the fixed kit having fewer matches: there were entirely different matches
> in each kit.
>
> Kits that are in the 23andme original but are not in ancestry original:
> +-------------------------+--------+--------+
> | 23 Orig not in ANC Orig | Vendor | AVG cM |
> +-------------------------+--------+--------+
> |                      86 | A      |    6.9 |
> |                      85 | F      |    6.8 |
> |                     709 | M      |    7.3 |
> +-------------------------+--------+--------+
>
> Kits that are in ancestry original but are not in 23andme original:
> +---------------------------------+--------+--------+
> | ANC original not in 23 original | Vendor | AVG cM |
> +---------------------------------+--------+--------+
> |                             337 | A      |    6.6 |
> |                              94 | F      |    6.7 |
> |                             105 | M      |    7.0 |
> +---------------------------------+--------+--------+
>
> Kits that are in the ancestry original but are not in 23andme fixed:
> +--------------------------+--------+--------+
> | ANC Orig not in 23 fixed | Vendor | AVG cM |
> +--------------------------+--------+--------+
> |                      229 | A      |    6.6 |
> |                      130 | F      |    6.5 |
> |                      139 | M      |    6.9 |
> +--------------------------+--------+--------+
>
> Kits in 23 fixed that have no matches in 23 original:
> +-------------------------+--------+--------+
> | 23 Fixed Not in 23 Orig | Vendor | AVG cM |
> +-------------------------+--------+--------+
> |                     281 | A      |    6.7 |
> |                      67 | F      |    7.1 |
> |                      48 | M      |    6.9 |
> +-------------------------+--------+--------+
>
> Kits in 23 original that have no matches in 23 fixed:
> +-------------------------+--------+--------+
> | 23 Orig not in 23 fixed | Vendor | AVG cM |
> +-------------------------+--------+--------+
> |                     202 | A      |    6.6 |
> |                     139 | F      |    6.5 |
> |                     263 | M      |    6.9 |
> +-------------------------+--------+--------+
>
> Kits that are in the Ancestry fixed but are not in Ancestry original:
> +---------------------------+--------+--------+
> | ANC fixed not in ANC Orig | Vendor | AVG cM |
> +---------------------------+--------+--------+
> |                        48 | A      |    7.8 |
> |                        17 | F      |    8.0 |
> |                        11 | M      |    8.7 |
> +---------------------------+--------+--------+
>
> Kits that are in the Ancestry original but are not in Ancestry fixed:
> +---------------------------+--------+--------+
> | ANC Orig not in ANC Fixed | Vendor | AVG cM |
> +---------------------------+--------+--------+
> |                       200 | A      |    6.6 |
> |                       114 | F      |    6.4 |
> |                       106 | M      |    6.6 |
> +---------------------------+--------+--------+
>
> Kits that are in the 23andme fixed but are not in ancestry fixed:
> +---------------------------+--------+--------+
> | 23 fixed not in ANC fixed | Vendor | AVG cM |
> +---------------------------+--------+--------+
> |                        56 | A      |    6.6 |
> |                        35 | F      |    7.0 |
> |                       542 | M      |    7.4 |
> +---------------------------+--------+--------+
>
> Kits that are in the ancestry fixed but are not in 23andme fixed:
> +---------------------------+--------+--------+
> | ANC fixed not in 23 fixed | Vendor | AVG cM |
> +---------------------------+--------+--------+
> |                        71 | A      |    6.9 |
> |                        19 | F      |    7.0 |
> |                        58 | M      |    7.5 |
> +---------------------------+--------+--------+
>
> Kits that are in the 23andme original but are not in ancestry fixed:
> +--------------------------+--------+--------+
> | 23 Orig not in ANC fixed | Vendor | AVG cM |
> +--------------------------+--------+--------+
> |                      218 | A      |    6.6 |
> |                      160 | F      |    6.5 |
> |                      763 | M      |    7.2 |
> +--------------------------+--------+--------+
>
> Kits that are in the ancestry fixed but are not in 23andme original:
> +--------------------------+--------+--------+
> | ANC fixed not in 23 Orig | Vendor | AVG cM |
> +--------------------------+--------+--------+
> |                      315 | A      |    6.7 |
> |                       71 | F      |    7.0 |
> |                       59 | M      |    7.4 |
> +--------------------------+--------+--------+
>
> ###################################################################
> Maybe too much information here, but I wanted to show each comparison.
>
> Disclosure: I have a pile-up region on chromosome 15 where I have hundreds
> of bogus matches. I did not factor that out.
>
> I invite anyone's comments about what may be going on, and any suggestions
> about further queries.
>
> David Schroeder
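The no-call repair described in the quoted message was done in SQL, and the actual queries are not shown in the thread. As a rough illustration of the same idea, here is a minimal Python sketch. It assumes both raw data files have already been loaded into dictionaries keyed by RSID, with no-calls normalised to `"--"` and genotypes in the same format; the function name, the RSIDs, and the genotypes are made up for the example.

```python
NO_CALL = "--"  # assumption: both files use this marker after conversion


def fill_no_calls(kit_a: dict[str, str], kit_b: dict[str, str]):
    """Return 'fixed' copies of both kits.

    For every RSID present in both kits, a no-call in one kit is replaced
    by the other kit's call. RSIDs tested by only one vendor are untouched.
    """
    fixed_a, fixed_b = dict(kit_a), dict(kit_b)
    for rsid in kit_a.keys() & kit_b.keys():
        a, b = kit_a[rsid], kit_b[rsid]
        if a == NO_CALL and b != NO_CALL:
            fixed_a[rsid] = b
        elif b == NO_CALL and a != NO_CALL:
            fixed_b[rsid] = a
    return fixed_a, fixed_b


# Hypothetical genotypes for the same person from two vendors.
vendor_a = {"rs1": "--", "rs2": "AG", "rs3": "--"}
vendor_b = {"rs1": "CT", "rs2": "--", "rs3": "--", "rs4": "GG"}

fixed_a, fixed_b = fill_no_calls(vendor_a, vendor_b)
print(fixed_a)
print(fixed_b)
```

An RSID that is a no-call in both files (`rs3` here) stays a no-call, which may be relevant to the observation elsewhere in the thread that many no-call RSIDs were the same for both companies.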
You are correct about the biological reality. Segments do not jump back and forth. However, the DNA results (e.g. CT) you receive for a particular SNP are a "genotype." The DNA is chopped into tiny pieces before analysis, which uses probes to detect the presence of a particular allele. The only thing we can tell is that a C is present in some pieces and a T is present in other pieces. Computer algorithms attempt to predict which SNPs are traveling together as a package on a single chromosome (a haplotype) but they don't always get it right.

Ann Turner

On Thu, Jan 7, 2016 at 7:23 AM, Patti Easton <amharach@msn.com> wrote:

> Ann,
>
> I really appreciated this article. I watch this list as an eager novice,
> knowing I have a steep learning curve, and having gained much knowledge
> from the talented people who contribute here. Hopefully this is
> appropriate to ask here, again given my novice status.
>
> What I have trouble wrapping my head around in this article, particularly
> given the DNA doesn't lie, DNA is a fact and end all, be all, is how does
> Germline allow matches to jump back and forth between segments? Isn't this
> counter intuitive to something directed at extracting genetic code? And
> without asking too much of the scientific end, how is that even possible.
> How can DNA move to other sections?
>
> Perhaps DNA isn't as black and white as I have been led to view it. I
> don't see it as fluid. But while coding appears to not fluctuate, I guess
> I am seeing that perhaps the testing makes it so.
>
> Thank you!
> Regards,
> Patti Easton
FWIW, your original no-call rate seems substantially higher than I'm accustomed to seeing. For example:

An M kit: 0.15566093841309 percent.
An A kit: 0.06391337857398 percent.

The fact that you previously said several thousand no-call RSIDs were the same for both companies gives rise to a wild speculation in my mind, viz is it possible that you are a chimera? This is the "vanishing twin" scenario, where the fertilized eggs that would ordinarily give rise to fraternal twins fuse early in development. This could affect the base-calling algorithm, which expects a double-strength signal for homozygous results and a single-strength signal for each allele in heterozygous results. If you were a chimera, you might see an extra strong signal for an allele carried by both of the original twins, while the heterozygous signal would be weaker and perhaps rejected by the base-calling algorithm.

Ann Turner

On Wed, Jan 6, 2016 at 12:02 PM, David Schroeder via <genealogy-dna@rootsweb.com> wrote:

> Error Rates (No-Calls):
> Original Ancestry: 1.9444416327699 percent
> Fixed Ancestry:    0.99817658890539 percent
> Original 23andme:  2.1583132550279 percent
> Fixed 23andme:     1.482448214704 percent
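The no-call percentages quoted in this exchange can be computed from a raw data file with a calculation like the following Python sketch. It assumes the genotypes have been read into a list and that no-calls are marked `"--"`; the marker, the function name, and the sample list are assumptions for illustration, and the thread does not say how the quoted figures were actually produced.

```python
def no_call_percent(genotypes: list[str], no_call: str = "--") -> float:
    """Percentage of genotype entries that are no-calls."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    missing = sum(1 for g in genotypes if g == no_call)
    return 100.0 * missing / len(genotypes)


# Tiny made-up sample: one no-call out of four SNPs.
sample = ["AA", "--", "CT", "GG"]
print(no_call_percent(sample))
```

On a real file the list would hold one entry per tested SNP, so rates such as 0.06 percent and 1.9 percent correspond to very different absolute numbers of missing results.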
Thank you for the reply, Ann. So when looking at results from Ancestry uploaded to GEDmatch, does that mean that the chromosome browser comparisons would be faulty? How much error should one expect? Is this just minutiae, or would it factor into comparisons? Is this why Ancestry has no chromosome browser?

Thank you again,
Patti

-----Original Message-----
From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of Ann Turner via
Sent: Thursday, January 7, 2016 5:51 AM
Cc: DNA Genealogy Mailing List <genealogy-dna@rootsweb.com>
Subject: Re: [DNA] segments and cM at AncestryDNA

You are correct about the biological reality. Segments do not jump back and forth. However, the DNA results (e.g. CT) you receive for a particular SNP are a "genotype." The DNA is chopped into tiny pieces before analysis, which uses probes to detect the presence of a particular allele. The only thing we can tell is that a C is present in some pieces and a T is present in other pieces. Computer algorithms attempt to predict which SNPs are traveling together as a package on a single chromosome (a haplotype) but they don't always get it right.

Ann Turner
Ann,

I really appreciated this article. I watch this list as an eager novice, knowing I have a steep learning curve, and having gained much knowledge from the talented people who contribute here. Hopefully this is appropriate to ask here, again given my novice status.

What I have trouble wrapping my head around in this article, particularly given the DNA doesn't lie, DNA is a fact and end all, be all, is how does Germline allow matches to jump back and forth between segments? Isn't this counterintuitive to something directed at extracting genetic code? And without asking too much of the scientific end, how is that even possible? How can DNA move to other sections?

Perhaps DNA isn't as black and white as I have been led to view it. I don't see it as fluid. But while coding appears to not fluctuate, I guess I am seeing that perhaps the testing makes it so.

Thank you!
Regards,
Patti Easton

-----Original Message-----
From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of Ann Turner via
Sent: Wednesday, January 6, 2016 8:05 PM
To: DNA Genealogy Mailing List <GENEALOGY-DNA@rootsweb.com>
Subject: [DNA] segments and cM at AncestryDNA

This blog post explains some of the questions we've raised about the number of shared cM and segments at AncestryDNA and how they compare to numbers seen at GEDmatch, 23andMe, and FTDNA.

http://blogs.ancestry.com/techroots/behind-the-new-ancestrydna-feature-amount-of-shared-dna/

Ann Turner

-------------------------------
To unsubscribe from the list, please send an email to GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
Does the final column in the VCF file start with three characters like these?

0/0 homozygous for the REF allele
0/1 heterozygous
1/1 homozygous for the ALT allele

Ann Turner

On Thu, Jan 7, 2016 at 4:12 AM, Iain Kennedy via <genealogy-dna@rootsweb.com> wrote:

> I am playing around with what can be done with this data, I was
> particularly interested to see how far I could get mocking up an
> AncestryDNA type file from the data. However I think 2x is too low to get
> far except as a proof of concept. I have loaded the dbSNP VCF into mySQL
> but it doesn't seem to follow the normal VCF standard which is a shame (it
> isn't using '.' in the ALT column when the calls were all ancestral). I
> also had a go using mpileup and varscan overnight but this requires 8x
> reads minimum:
>
> C:\genealogy\software>samtools mpileup -f human_g1k_v37.fasta GWK3W.bam | java -jar VarScan.v2.3.7.jar pileup2snp > gwk3w_vs.vcf
> [mpileup] 1 samples in 1 input files
> <mpileup> Set max per-file depth to 8000
> Warning: No p-value threshold provided, so p-values will not be calculated
> Min coverage: 8
> Min reads2: 2
> Min var freq: 0.01
> Min avg qual: 15
> P-value thresh: 0.99
> Input stream not ready, waiting for 5 seconds...
> Reading input from STDIN
> 2349372654 bases in pileup file
> 8008795 met minimum coverage of 8x
> 2905740 SNPs predicted
>
> I haven't loaded it into the db yet.
>
> The dbSNP VCF I refer to includes ancestral and derived calls and mine has
> 107M rows.
>
> Iain
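One way to check what the final VCF column holds, as asked above, is to read the genotype (GT) value from the start of the last tab-separated field of each data line. The Python sketch below assumes a single-sample VCF in which GT is the first entry of the sample column, as the VCF specification requires when GT is present. The function name and the example line are hypothetical and are not taken from the file discussed in the thread.

```python
GT_MEANING = {
    "0/0": "homozygous for the REF allele",
    "0/1": "heterozygous",
    "1/1": "homozygous for the ALT allele",
}


def classify_genotype(vcf_line: str) -> str:
    """Describe the genotype in the last column of one VCF data line."""
    sample_field = vcf_line.rstrip("\n").split("\t")[-1]
    gt = sample_field.split(":")[0]  # GT comes first, e.g. "0/1:2"
    # Treat phased ("|") and unphased ("/") the same, and 1/0 like 0/1.
    alleles = sorted(gt.replace("|", "/").split("/"))
    return GT_MEANING.get("/".join(alleles), "other or missing")


# Made-up data line: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
line = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:2"
print(classify_genotype(line))
```

Genotypes outside the three listed codes, such as `./.` for a missing call or codes involving a second ALT allele, fall through to the catch-all label.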
David,

A summary on the last comparisons would be fine. I've got lost in the many tables which despite your wonderful effort are still hard to read in plain email format.

Great analysis!

Andreas

> On Jan 6, 2016, at 23:22, David Schroeder via <genealogy-dna@rootsweb.com> wrote:
>
> This is a continuation of my observations comparing raw data files from
> 23andme and AncestryDNA.
This blog post explains some of the questions we've raised about the number of shared cM and segments at AncestryDNA and how they compare to numbers seen at GEDmatch, 23andMe, and FTDNA. http://blogs.ancestry.com/techroots/behind-the-new-ancestrydna-feature-amount-of-shared-dna/ Ann Turner
I thought DNA.Land's imputed VCF of 23andMe data covers only the recombining parts of the genome (autosomal/X)? Please let us know about the stats and the comparison of WGS 3X with BigY. Good research. On Wed, Jan 6, 2016 at 8:41 AM, Atanas Kumbarov via < genealogy-dna@rootsweb.com> wrote: > I will be more than happy to examine your WGS 3x data together with your > Big Y data. > > Best regards, > Atanas Kumbarov > > On 06/01/16 01:28, steven perkins via wrote: > > All: > > > > I have the Big Y, a Full Genomes WGS at 3X, and the imputed VCF from my > V3 > > 23andMe results at DNA.Land. Who can compare the Y DNA results in all > three > > files? I'd like to have it compared to another R1a with similar result > > files. > > > > After seeing the numbers in the post by Justin quoted by Tim, I will be > > increasing the coverage of the WGS file from Full Genomes. > > > > Steven > > > > > > > > On Tue, Jan 5, 2016 at 7:07 PM, Thomas Krahn via < > genealogy-dna@rootsweb.com > >> wrote: > >> Atanas, > >> > >> The largest cost factor for Y chromosome tests is not the sequencing, > >> but the enrichment procedure. > >> With enrichment the price will never drop far below $400, so you might > >> as well sequence with sufficient coverage. > >> > >> On the long run there is no future for BigY and FGCs Y tests since the > >> WGS tests keep dropping in the price, but Y tests don't. > >> > >> Thomas > >> > >> > >> On 01/06/2016 12:55 AM, Atanas Kumbarov via wrote: > >>> I have exactly the same idea. If you have a tree to follow, you don't > >>> need that many reads at a given position. But I think that it will be > >>> better if FGC designs a Y-chromosome only test at 5x instead of WGS > test > >>> at 2x with the same price tag (around $250). The increase of depth of > >>> coverage could be compensated by the decrease of sequenced locis (the > >>> Y-chromosome only). 
> >>> > >>> 5x is sufficient - the 1000GP had 5x to 7x coverage and never the less > >>> we see huge 1000GP dominated segments of the YFull tree like this one > >>> http://yfull.com/tree/R-L657/ > >>> > >>> If such a test was offered, it could gain a lot of popularity. > >>> > >>> Best regards, > >>> Atanas Kumbarov > >>> > >>> > >>> > >>> On 05/01/16 23:37, AJ Marsh via wrote: > >>>> Tim, > >>>> > >>>> Many thanks for posting this. > >>>> > >>>> I have been waiting for some time to see what sort of results were > >> coming from 2x coverage. I am a little bit encouraged by the figures > from > >> FGC. > >>>> For the R-L617 project we have 20+ participants who have tested YElite > >> or BigY. So we have a good tree starting to take shape of known SNPs > below > >> L617. > >>>> If the 2x coverage test "covers" about 14,000,000 bases of Y, it might > >> be too thinly covered to "confidently" call SNPs, as many of these bases > >> will only be covered once. But in my case, I have a "work in progress" > >> tree for mutations below L617, so reading the 2x coverage raw data might > >> enable me to pick up mutations which are reported in raw data, but not > >> reported more than once to enable FGC to call them confidently as a new > SNP. > >>>> My feeling is that in my case, where I have the advantage of a working > >> SNP tree to compare to, it might be cost effective for me to extract > >> benefit from the 2x coverage test. The area covered at least once by a > 2x > >> coverage test is more than half the area covered by YElite it seems. > So if > >> I can see matches to known L617 downstream and can call them, even at > only > >> 50% certainty, I should be able to tentatively place L617s into known > >> branches. > >>>> Several branches have 20+ known SNPS, so I only need one reliable > call, > >> or several less reliable calls of some of the 20+ branch SNPs to assign > a > >> tester to a branch, at least tentatively. 
Often there is already other > >> forms of evidence pointing to a branch, so if it is corroborated to a > >> degree by data from 2x coverage tests, it is a step forward. If in > doubt, > >> a SNP could be verified from single SNP testing. > >>>> I think that I just need to have a few L617s tested on the 2x test to > >> see how it works out in practice in my case. I feel optimistic. > Although > >> I am working on a SNP tree below L617, several haplogroup projects have > >> similar type SNP trees building, so they might also get benefit. Cost > of > >> testing is always a challenge, so even if a 2x test is cutting corners a > >> bit, it still might enable test to be done which otherwise might not be > >> possible. > >>>> John. > >>>> > >>>> Sent from my iPad > >>>> > >>>>> On 6/01/2016, at 9:32 am, Tim Janzen via <genealogy-dna@rootsweb.com > > > >> wrote: > >>>>> Dear Atanas, > >>>>> Justin Loe from FullGenomes just posted the information below > on > >>>>> another list: > >>>>> > >>>>> "As mentioned earlier, these are our beta results for 1x coverage: > >>>>> > >>>>> Mapped Y coverage > >>>>> 30x 22,856,938 > >>>>> 10x 22,025,697 > >>>>> 4x 17,678,170 > >>>>> 2x 13,755,442 > >>>>> > >>>>> Average Callable Loci > >>>>> 30x 14,558,001 > >>>>> 10x 8,046,540 > >>>>> 4x 1,050,996 > >>>>> 2x 349,397 > >>>>> > >>>>> Y Elite 2.0: > >>>>> 14,000,000 Callable Loci approximately on average" > >>>>> > >>>>> This helps provide clearer data for your last question than I was > able > >> to do > >>>>> in my earlier response. It thus appears that one needs to order at > >> least > >>>>> the 30x whole genome sequence from FullGenomes to get the same number > >> of > >>>>> callable Y chromosome loci as you can get from the Y Elite 2.0 test. 
> >>>>> Sincerely, > >>>>> Tim Janzen > >>>>> > >>>>> > >>>>> -----Original Message----- > >>>>> From: genealogy-dna-bounces@rootsweb.com > >>>>> [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of Atanas > >> Kumbarov via > >>>>> Sent: Tuesday, January 5, 2016 8:59 AM > >>>>> To: genealogy-dna@rootsweb.com > >>>>> Subject: [DNA] Full Genomes Corporation tests > >>>>> > >>>>> I have several questions about their tests: > >>>>> > >>>>> 1. I wonder how useful 2x results can be? > >>>>> 2. What can be done with a FGS? > >>>>> 3. Is it possible to extract usable data for GedMatch from FGS *in an > >> easy > >>>>> way*? > >>>>> 4. How well is the Y-chromosome covered? > >>>>> > >>>>> Best regards, > >>>>> Atanas Kumbarov > >>>>> > >>>>> > >>>>> ------------------------------- > >>>>> To unsubscribe from the list, please send an email to > >> GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without > >> the quotes in the subject and the body of the message > >>>> ------------------------------- > >>>> To unsubscribe from the list, please send an email to > >> GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without > >> the quotes in the subject and the body of the message > >>> ------------------------------- > >>> To unsubscribe from the list, please send an email to > >> GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without > >> the quotes in the subject and the body of the message > >> > >> > >> ------------------------------- > >> To unsubscribe from the list, please send an email to > >> GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without > >> the quotes in the subject and the body of the message > >> > > > > > > > ------------------------------- > To unsubscribe from the list, please send an email to > GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without > the quotes in the subject and the body of the message >
For comparison, here are the matches I had in my pile-up region, Chromosome = '15' and (POS > 23976155 AND POS < 25855576):

+-----------+---------+
| Fixed ANC | AVG(CM) |
+-----------+---------+
| 119       | 10.2    |
+-----------+---------+

+----------+---------+
| Orig ANC | AVG(CM) |
+----------+---------+
| 150      | 10.0    |
+----------+---------+

+---------------+---------+
| Fixed 23andme | AVG(CM) |
+---------------+---------+
| 442           | 8.6     |
+---------------+---------+

+--------------+---------+
| Orig 23andme | AVG(CM) |
+--------------+---------+
| 556          | 8.3     |
+--------------+---------+

It is a mystery to me why 23andme generated more matches in my pile-up region.

David Schroeder

-----Original Message-----
From: David Schroeder [mailto:dschroed991@sbcglobal.net]
Sent: Wednesday, January 6, 2016 2:18 PM
To: 'ahnen@awest.de'; 'genealogy-dna@rootsweb.com'
Subject: RE: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

Sorry about the sea of information. There were so many comparisons, and each one had surprising results. I think I can say at least 25% of my matches coming from 23andme were from a pile-up on Chromosome 15 (POS > 23976155 and POS < 25855576).

I looked at the two kits. Ancestry checks 460 RSIDs in this range of positions; 23andme checks 602. Ancestry has far fewer pile-up matches than 23andme. Ancestry had only 12 no-calls in this region; 23andme had only 17. I don't know why there were hundreds more matches in this range for 23andme compared to Ancestry.

David

-----Original Message-----
From: ahnen@awest.de [mailto:ahnen@awest.de]
Sent: Wednesday, January 6, 2016 11:09 AM
To: David Schroeder; genealogy-dna@rootsweb.com
Subject: Re: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

David,

A summary on the last comparisons would be fine. I've got lost in the many tables, which despite your wonderful effort are still hard to read in plain email format.

Great analysis!
Andreas
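The pile-up filter David quotes can be run as an ordinary SQL aggregate. The sketch below uses Python's built-in sqlite3 module to count matches and average their cM inside the chromosome 15 window. The table name `matches`, its columns, the kit labels, and the sample rows are all hypothetical, since the thread does not show the actual schema; `pos` is assumed to be the segment start position.

```python
import sqlite3

# Window taken from the message: Chromosome = '15', POS between these bounds.
PILEUP_CHR = "15"
PILEUP_START = 23976155
PILEUP_END = 25855576

def pileup_summary(conn, kit_label):
    """Return (match_count, average_cM) for one kit's matches inside the window."""
    return conn.execute(
        """
        SELECT COUNT(*), ROUND(AVG(cm), 1)
        FROM matches
        WHERE kit_label = ?
          AND chromosome = ?
          AND pos > ? AND pos < ?
        """,
        (kit_label, PILEUP_CHR, PILEUP_START, PILEUP_END),
    ).fetchone()

# Hypothetical schema and made-up rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches "
    "(kit_label TEXT, match_kit TEXT, chromosome TEXT, pos INTEGER, cm REAL)"
)
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", [
    ("Orig 23andme", "M000001", "15", 24000000, 8.0),
    ("Orig 23andme", "M000002", "15", 25000000, 9.0),
    ("Orig 23andme", "A000003", "15", 30000000, 7.5),   # outside the window
    ("Orig 23andme", "F000004", "7", 24500000, 12.0),   # different chromosome
    ("Orig ANC", "A000005", "15", 24100000, 10.0),
])

for label in ("Orig 23andme", "Orig ANC"):
    print(label, pileup_summary(conn, label))
```

Running the same function once per kit label gives one row per kit, which is the shape of the four small tables in the message.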
Hello List,

I am J-PF7395+ and A457+. Please read the background information in the link below carefully before reading this message:
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2014-06/1402073036

So I am J-PF7395+, A457+ and CTS11388-. A457 is currently a private SNP.
Private SNP: http://www.snpedia.com/index.php/Private_SNP
(Please note that individuals can also test for A457 at Familytreedna.com.)

Individuals that have tested positive for A457, in chronological order: me, my son, and my genetic cousin whose surname is Penso (originally from the Veneto region of Italy). Penso is also a 33/37 STR match with me. (Another genetic cousin of mine, whose surname is Penzo and who is a 60/67 STR match with me, hasn't tested for A457 yet.)

I am interested in finding out if other individuals (who have done the Geno 2.0 test) are positive for PF7395 and CTS11388. If so, these individuals should test for A457. Hopefully, these individuals will stumble upon this message while searching for more information regarding PF7395 (they are welcome to contact me).

Best regards,
Costa Tsirigakis
Tim, Many thanks for posting this. I have been waiting for some time to see what sort of results were coming from 2x coverage. I am a little bit encouraged by the figures from FGC. For the R-L617 project we have 20+ participants who have tested YElite or BigY. So we have a good tree starting to take shape of known SNPs below L617. If the 2x coverage test "covers" about 14,000,000 bases of Y, it might be too thinly covered to "confidently" call SNPs, as many of these bases will only be covered once. But in my case, I have a "work in progress" tree for mutations below L617, so reading the 2x coverage raw data might enable me to pick up mutations which are reported in raw data, but not reported more than once to enable FGC to call them confidently as a new SNP. My feeling is that in my case, where I have the advantage of a working SNP tree to compare to, it might be cost effective for me to extract benefit from the 2x coverage test. The area covered at least once by a 2x coverage test is more than half the area covered by YElite it seems. So if I can see matches to known L617 downstream and can call them, even at only 50% certainty, I should be able to tentatively place L617s into known branches. Several branches have 20+ known SNPS, so I only need one reliable call, or several less reliable calls of some of the 20+ branch SNPs to assign a tester to a branch, at least tentatively. Often there is already other forms of evidence pointing to a branch, so if it is corroborated to a degree by data from 2x coverage tests, it is a step forward. If in doubt, a SNP could be verified from single SNP testing. I think that I just need to have a few L617s tested on the 2x test to see how it works out in practice in my case. I feel optimistic. Although I am working on a SNP tree below L617, several haplogroup projects have similar type SNP trees building, so they might also get benefit. 
Cost of testing is always a challenge, so even if a 2x test is cutting corners a bit, it still might enable tests to be done which otherwise might not be possible.

John.

Sent from my iPad

> On 6/01/2016, at 9:32 am, Tim Janzen via <genealogy-dna@rootsweb.com> wrote:
>
> Dear Atanas,
> Justin Loe from FullGenomes just posted the information below on
> another list:
>
> "As mentioned earlier, these are our beta results for 1x coverage:
>
> Mapped Y coverage
> 30x   22,856,938
> 10x   22,025,697
> 4x    17,678,170
> 2x    13,755,442
>
> Average Callable Loci
> 30x   14,558,001
> 10x    8,046,540
> 4x     1,050,996
> 2x       349,397
>
> Y Elite 2.0:
> 14,000,000 Callable Loci approximately on average"
>
> This helps provide clearer data for your last question than I was able to do
> in my earlier response. It thus appears that one needs to order at least
> the 30x whole genome sequence from FullGenomes to get the same number of
> callable Y chromosome loci as you can get from the Y Elite 2.0 test.
> Sincerely,
> Tim Janzen
>
>
> -----Original Message-----
> From: genealogy-dna-bounces@rootsweb.com
> [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of Atanas Kumbarov via
> Sent: Tuesday, January 5, 2016 8:59 AM
> To: genealogy-dna@rootsweb.com
> Subject: [DNA] Full Genomes Corporation tests
>
> I have several questions about their tests:
>
> 1. I wonder how useful 2x results can be?
> 2. What can be done with a FGS?
> 3. Is it possible to extract usable data for GedMatch from FGS *in an easy
> way*?
> 4. How well is the Y-chromosome covered?
>
> Best regards,
> Atanas Kumbarov
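John's point that at 2x "many of these bases will only be covered once" can be illustrated with a back-of-the-envelope model. The sketch below assumes reads land uniformly at random, so the depth at any base follows a Poisson distribution with mean equal to the nominal coverage. That is an idealization, not how FGC calls loci: real Y-chromosome data is affected by repeats and mappability, so the figures Justin Loe reported remain the reference. The function names are illustrative only.

```python
import math

def depth_probability(mean_coverage, depth):
    """Poisson probability that a base is covered by exactly `depth` reads."""
    return math.exp(-mean_coverage) * mean_coverage ** depth / math.factorial(depth)

def at_least(mean_coverage, min_depth):
    """Probability that a base is covered by `min_depth` or more reads."""
    below = sum(depth_probability(mean_coverage, d) for d in range(min_depth))
    return 1.0 - below

# Nominal coverages mentioned in the thread.
for cov in (2, 4, 10, 30):
    print(
        f"{cov:>2}x: not covered {depth_probability(cov, 0):.3f}, "
        f"exactly once {depth_probability(cov, 1):.3f}, "
        f"twice or more {at_least(cov, 2):.3f}"
    )
```

Under this idealized model, a 2x test leaves roughly 14% of bases unread and roughly 27% read exactly once, which is why a single read is only useful as a tentative call checked against an existing SNP tree.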
Very interesting. What was the original error rate vs the "fixed" error rate? Also, were the errors typically one location at a time, or did you find runs of errors with several locations in a row?

Roberta

-----Original Message-----
From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of David Schroeder via
Sent: Wednesday, January 06, 2016 11:23 AM
To: genealogy-dna@rootsweb.com
Subject: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

This is a continuation of my observations comparing raw data files from 23andme and AncestryDNA.

Background: I tested at both 23andme (V3) and AncestryDNA, and I downloaded the raw data file from each one. I will call these two raw data files 'Original'. Using SQL, I loaded these files into a database. I also converted the AncestryDNA file to 23andme format.

Using SQL, I fixed the 'no-calls' in both files by matching on the RSIDs common to both: when one vendor had a no-call and the other vendor had made a call, I used the called value. Several thousand no-calls were fixed this way, reducing the error rate by about a third. I extracted these database tables into new raw data files that I call 'fixed'.

I uploaded the 4 raw data files to Gedmatch as 4 different kits: 2 original and 2 fixed. I then used Gedmatch's utility, "Matching Segment Search", and extracted matches for all 4 individually using a Segment Length of 5 cM. I uploaded each set of matches into a database table so I could compare.
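The no-call fix described above can be sketched as a pair of SQL updates. This is only an illustration of the idea, not David's actual queries, which the thread does not show. The table names `anc` and `m23`, the column names, and the sample genotypes are hypothetical. The sketch also assumes that a no-call is stored as `--` and that both files report genotypes for the same RSID in a directly comparable form.

```python
import sqlite3

NO_CALL = "--"  # assumed no-call marker in 23andme-format genotype files

def no_call_rate(conn, table):
    """Percentage of rows in `table` whose genotype is a no-call."""
    total, missing = conn.execute(
        f"SELECT COUNT(*), SUM(genotype = ?) FROM {table}", (NO_CALL,)
    ).fetchone()
    return 100.0 * missing / total

def fill_no_calls(conn, target, donor):
    """Copy the donor's call into the target wherever the target has a
    no-call and the donor has a real call for the same RSID.
    Returns the number of rows changed."""
    cur = conn.execute(
        f"""
        UPDATE {target}
        SET genotype = (SELECT d.genotype FROM {donor} AS d
                        WHERE d.rsid = {target}.rsid)
        WHERE genotype = ?
          AND EXISTS (SELECT 1 FROM {donor} AS d
                      WHERE d.rsid = {target}.rsid AND d.genotype <> ?)
        """,
        (NO_CALL, NO_CALL),
    )
    return cur.rowcount

# Hypothetical tables and made-up genotypes, one table per vendor file.
conn = sqlite3.connect(":memory:")
for name in ("anc", "m23"):
    conn.execute(f"CREATE TABLE {name} (rsid TEXT PRIMARY KEY, genotype TEXT)")
conn.executemany("INSERT INTO anc VALUES (?, ?)",
                 [("rs1", "AA"), ("rs2", "--"), ("rs3", "--"), ("rs4", "CT")])
conn.executemany("INSERT INTO m23 VALUES (?, ?)",
                 [("rs1", "--"), ("rs2", "GG"), ("rs3", "--"),
                  ("rs4", "CT"), ("rs5", "--")])

before = (no_call_rate(conn, "anc"), no_call_rate(conn, "m23"))
fixed_anc = fill_no_calls(conn, "anc", "m23")
fixed_m23 = fill_no_calls(conn, "m23", "anc")
after = (no_call_rate(conn, "anc"), no_call_rate(conn, "m23"))
print("no-call % before:", before, "after:", after)
```

RSIDs that are no-calls in both files (rs3 here) cannot be repaired this way, and RSIDs tested by only one vendor (rs5) are left alone, so the rate drops but does not reach zero.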
All 4 kits had matches on (M) 23andme, (A) Ancestry, and (F) FamilyTreeDNA kits by these amounts:

+-----------------+--------+--------+
| 23 Orig Matches | Vendor | Avg cM |
+-----------------+--------+--------+
| 696             | A      | 7.6    |
| 329             | F      | 7.3    |
| 933             | M      | 7.8    |
+-----------------+--------+--------+

+---------------+----------------------+
| Total Matches | Original 23andme kit |
+---------------+----------------------+
| 1958          | M080859              |
+---------------+----------------------+

+------------------+--------+--------+
| 23 Fixed Matches | Vendor | Avg cM |
+------------------+--------+--------+
| 779              | A      | 7.6    |
| 257              | F      | 7.6    |
| 715              | M      | 8.1    |
+------------------+--------+--------+

+---------------+-------------------+
| Total Matches | Fixed 23andme kit |
+---------------+-------------------+
| 1751          | M306764           |
+---------------+-------------------+

There are 207 fewer matches on the fixed 23andme kit.

#######################################################

+------------------+--------+--------+
| ANC Orig Matches | Vendor | Avg cM |
+------------------+--------+--------+
| 946              | A      | 7.3    |
| 335              | F      | 7.2    |
| 316              | M      | 8.8    |
+------------------+--------+--------+

+---------------+------------------+
| Total Matches | Original ANC kit |
+---------------+------------------+
| 1597          | A934219          |
+---------------+------------------+

+-------------------+--------+--------+
| ANC Fixed Matches | Vendor | Avg cM |
+-------------------+--------+--------+
| 799               | A      | 7.6    |
| 238               | F      | 7.7    |
| 223               | M      | 9.8    |
+-------------------+--------+--------+

+---------------+---------------+
| Total Matches | Fixed ANC kit |
+---------------+---------------+
| 1260          | A146269       |
+---------------+---------------+

There are 337 fewer matches on the fixed ancestrydna kit (1597 - 1260).
This was somewhat similar in magnitude to the reduction for the 23andme fixed vs original kits. Comparing the two fixed kits, there were 491 fewer matches for fixed ancestrydna than for fixed 23andme (1751 - 1260).

My thinking was that the fixed kits had fewer matches because false positives were being eliminated as many no-calls got fixed, especially ones with opposite homozygous alleles. The fixed and original kits would then share basically the same matching kits, other than the ones that were eliminated in the fixed. But looking at each kit individually, it is much more than that.

I compared the 23andme kits, fixed vs original and vice versa, for kits not present in one or the other. Amazing to me, it was not simply a matter of the fixed kit having fewer matches; there were entirely different matches in each kit:

Kits that are in the 23andme original but are not in ancestry original:
+-------------------------+--------+--------+
| 23 Orig not in ANC Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
| 86                      | A      | 6.9    |
| 85                      | F      | 6.8    |
| 709                     | M      | 7.3    |
+-------------------------+--------+--------+

Kits that are in ancestry original but are not in 23andme original:
+---------------------------------+--------+--------+
| ANC original not in 23 original | Vendor | AVG cM |
+---------------------------------+--------+--------+
| 337                             | A      | 6.6    |
| 94                              | F      | 6.7    |
| 105                             | M      | 7.0    |
+---------------------------------+--------+--------+

Kits that are in the ancestry original but are not in 23andme fixed:
+--------------------------+--------+--------+
| ANC Orig not in 23 fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
| 229                      | A      | 6.6    |
| 130                      | F      | 6.5    |
| 139                      | M      | 6.9    |
+--------------------------+--------+--------+

Kits in 23 fixed that have no matches in 23 original:
+-------------------------+--------+--------+
| 23 Fixed Not in 23 Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
| 281                     | A      | 6.7    |
| 67                      | F      | 7.1    |
| 48                      | M      | 6.9    |
+-------------------------+--------+--------+
Kits in 23 original that have no matches in 23 fixed:
+-------------------------+--------+--------+
| 23 Orig not in 23 fixed | Vendor | AVG cM |
+-------------------------+--------+--------+
| 202                     | A      | 6.6    |
| 139                     | F      | 6.5    |
| 263                     | M      | 6.9    |
+-------------------------+--------+--------+

Kits in 23 fixed that have no matches in 23 orig:
+-------------------------+--------+--------+
| 23 fixed not in 23 Orig | Vendor | AVG cM |
+-------------------------+--------+--------+
| 281                     | A      | 6.7    |
| 67                      | F      | 7.1    |
| 48                      | M      | 6.9    |
+-------------------------+--------+--------+

Kits that are in the Ancestry fixed but are not in Ancestry original:
+---------------------------+--------+--------+
| ANC fixed not in ANC Orig | Vendor | AVG cM |
+---------------------------+--------+--------+
| 48                        | A      | 7.8    |
| 17                        | F      | 8.0    |
| 11                        | M      | 8.7    |
+---------------------------+--------+--------+

Kits that are in the Ancestry original but are not in Ancestry fixed:
+---------------------------+--------+--------+
| ANC Orig not in ANC Fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
| 200                       | A      | 6.6    |
| 114                       | F      | 6.4    |
| 106                       | M      | 6.6    |
+---------------------------+--------+--------+

Kits that are in the 23andme fixed but are not in ancestry fixed:
+---------------------------+--------+--------+
| 23 fixed not in ANC fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
| 56                        | A      | 6.6    |
| 35                        | F      | 7.0    |
| 542                       | M      | 7.4    |
+---------------------------+--------+--------+

Kits that are in the ancestry fixed but are not in 23andme fixed:
+---------------------------+--------+--------+
| ANC fixed not in 23 fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
| 71                        | A      | 6.9    |
| 19                        | F      | 7.0    |
| 58                        | M      | 7.5    |
+---------------------------+--------+--------+

Kits that are in the 23andme original but are not in ancestry fixed:
+--------------------------+--------+--------+
| 23 Orig not in ANC fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
| 218                      | A      | 6.6    |
| 160                      | F      | 6.5    |
| 763                      | M      | 7.2    |
+--------------------------+--------+--------+

Kits that are in the ancestry fixed but are not in 23andme original:
+--------------------------+--------+--------+
| ANC fixed not in 23 Orig | Vendor | AVG cM |
+--------------------------+--------+--------+
| 315                      | A      | 6.7    |
| 71                       | F      | 7.0    |
| 59                       | M      | 7.4    |
+--------------------------+--------+--------+

###################################################################

Maybe too much information here, but I wanted to show each comparison.

Disclosure: I have a pile-up region on chromosome 15 where I have hundreds of bogus matches. I did not factor that out.

I invite anyone's comments about what may be going on, and any suggestions about further queries.

David Schroeder
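Each "kits in X but not in Y" table above is a set difference between two match lists, grouped by vendor. One way to write such a query is an anti-join with NOT EXISTS, sketched below with Python's sqlite3 module. The `matches` table, its columns, the kit labels, and the sample rows are hypothetical. Deriving the vendor from the first letter of the matching kit number (A, F, M) is an assumption based on the kit numbers shown in the message.

```python
import sqlite3

def only_in(conn, kit_a, kit_b):
    """Matching kits found for kit_a but not for kit_b, summarized per vendor.
    Returns rows of (vendor_letter, kit_count, average_cM)."""
    return conn.execute(
        """
        SELECT substr(a.match_kit, 1, 1) AS vendor,
               COUNT(DISTINCT a.match_kit) AS kits,
               ROUND(AVG(a.cm), 1) AS avg_cm
        FROM matches AS a
        WHERE a.kit_label = ?
          AND NOT EXISTS (SELECT 1 FROM matches AS b
                          WHERE b.kit_label = ?
                            AND b.match_kit = a.match_kit)
        GROUP BY substr(a.match_kit, 1, 1)
        ORDER BY vendor
        """,
        (kit_a, kit_b),
    ).fetchall()

# Hypothetical schema and made-up rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (kit_label TEXT, match_kit TEXT, cm REAL)")
conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", [
    ("23 Orig", "A1", 6.9), ("23 Orig", "M2", 7.3),
    ("23 Orig", "M3", 7.5), ("23 Orig", "F4", 8.0),
    ("23 Fixed", "M3", 7.6), ("23 Fixed", "F4", 8.1),
    ("23 Fixed", "A5", 6.7),
])

print("23 Orig not in 23 Fixed:", only_in(conn, "23 Orig", "23 Fixed"))
print("23 Fixed not in 23 Orig:", only_in(conn, "23 Fixed", "23 Orig"))
```

Calling the function in both directions for each pair of kits reproduces the layout of the paired tables in the message.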
Error Rates (No-Calls):

Original Ancestry: 1.9444416327699 percent
Fixed Ancestry:    0.99817658890539 percent
Original 23andme:  2.1583132550279 percent
Fixed 23andme:     1.482448214704 percent

+------+--------------+
| CHR  | ANC no-calls |
+------+--------------+
| X    | 16           |
| 1    | 1093         |
| 2    | 1009         |
| 3    | 835          |
| 4    | 711          |
| 5    | 753          |
| 6    | 866          |
| 7    | 742          |
| 8    | 641          |
| 9    | 658          |
| 10   | 723          |
| 11   | 680          |
| 12   | 648          |
| 13   | 540          |
| 14   | 439          |
| 15   | 404          |
| 16   | 436          |
| 17   | 428          |
| 18   | 390          |
| 19   | 424          |
| 20   | 351          |
| 21   | 185          |
| 22   | 248          |
+------+--------------+

+------+------------------+
| CHR  | 23andme no-calls |
+------+------------------+
| X    | 996              |
| Y    | 1034             |
| MT   | 60               |
| 1    | 1557             |
| 2    | 1549             |
| 3    | 1269             |
| 4    | 1116             |
| 5    | 1139             |
| 6    | 1486             |
| 7    | 1105             |
| 8    | 1044             |
| 9    | 865              |
| 10   | 1012             |
| 11   | 1026             |
| 12   | 1011             |
| 13   | 778              |
| 14   | 673              |
| 15   | 583              |
| 16   | 671              |
| 17   | 630              |
| 18   | 543              |
| 19   | 585              |
| 20   | 474              |
| 21   | 257              |
| 22   | 430              |
+------+------------------+

I did not see any errors in a row. There were several thousand no-calls on the same RSIDs for both Ancestry and 23andme. Those are the ones that mainly contributed to the error rates after the fix.

David Schroeder

-----Original Message-----
From: Roberta Estes [mailto:robertajestes@att.net]
Sent: Wednesday, January 6, 2016 10:36 AM
To: 'David Schroeder'; genealogy-dna@rootsweb.com
Subject: RE: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA

Very interesting. What was the original error rate vs the "fixed" error rate? Also, were the errors typically one location at a time, or did you find runs of errors with several locations in a row?
Roberta -----Original Message----- From: genealogy-dna-bounces@rootsweb.com [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of David Schroeder via Sent: Wednesday, January 06, 2016 11:23 AM To: genealogy-dna@rootsweb.com Subject: [DNA] Raw Data Files Comparisons with 23andme and AncestryDNA This is a continuation of my observations comparing raw data files from 23andme and AncestryDNA. Background: I tested at both 23andme (V3) and AncestryDNA and I downloaded the raw data files from each one. These two raw datas I will call 'Original'. Using SQL I loaded these files to a database. I also converted the AncestryDNA file to 23andme format. I used SQL I fixed the 'no-calls' on both of them using matching RSIDs common to both when one vendor had a no-call, and the other vendor had made a call on the values. Several thousand no-calls were fixed this way reducing the error rate by about a third I extracted these database tables into new raw data files that I call 'fixed'. I uploaded the 4 raw data files to Gedmatch as 4 different kits- 2 original and 2 fixed. I then used Gedmatch's utility, "Matching Segment Search" and extracted matches for all 4 individually using a Segment Length of 5 cM. I uploaded each of the matches into a database table so I could compare. 
All 4 kits had matches on (M) 23andme, (A) Ancestry, and (F) FamilyTreeDNA kits by these amounts: +-----------------+--------+--------+ | 23 Orig Matches | Vendor | Avg cM | +-----------------+--------+--------+ | 696 | A | 7.6 | | 329 | F | 7.3 | | 933 | M | 7.8 | +-----------------+--------+--------+ +---------------+----------------------+ | Total Matches | Original 23andme kit | +---------------+----------------------+ | 1958 | M080859 | +---------------+----------------------+ +------------------+--------+--------+ | 23 Fixed Matches | Vendor | Avg cM | +------------------+--------+--------+ | 779 | A | 7.6 | | 257 | F | 7.6 | | 715 | M | 8.1 | +------------------+--------+--------+ +---------------+-------------------+ | Total Matches | Fixed 23andme kit | +---------------+-------------------+ | 1751 | M306764 | +---------------+-------------------+ There are 207 fewer matches on fixed 23andme data kits ####################################################### +------------------+--------+--------+ | ANC Orig Matches | Vendor | Avg cM | +------------------+--------+--------+ | 946 | A | 7.3 | | 335 | F | 7.2 | | 316 | M | 8.8 | +------------------+--------+--------+ +---------------+------------------+ | Total Matches | Original ANC kit | +---------------+------------------+ | 1597 | A934219 | +---------------+------------------+ +-------------------+--------+--------+ | ANC Fixed Matches | Vendor | Avg cM | +-------------------+--------+--------+ | 799 | A | 7.6 | | 238 | F | 7.7 | | 223 | M | 9.8 | +-------------------+--------+--------+ +---------------+---------------+ | Total Matches | Fixed ANC kit | +---------------+---------------+ | 1260 | A146269 | +---------------+---------------+ There are 327 fewer matches on fixed ancestrydna kits. 
This was somewhat similar in magnitude in reductions for the 23andme fixed vs original kits.For the fixed kits, there were 491 fewer matches for fixed 23andme vs fixed ancestrydna My thinking about fewer matches on the fixed kits were due to false positives were being eliminated because many no-calls getting fixed, especially ones with opposite homozygous alleles. They would share basically the same kits- other than the ones that were eliminated in the fixed. But looking at each kit individually it is much more than that. I compared 23andme kits- fixed vs original and vice versa- for kits not present in one or the other. Amazing to me, it was not a matter of the fixed kit having matches, but there were entirely different matches in each kit: kits that are in the 23andme original but are not in ancestry original; +-------------------------+--------+--------+ | 23 Orig not in ANC Orig | Vendor | AVG cM | +-------------------------+--------+--------+ | 86 | A | 6.9 | | 85 | F | 6.8 | | 709 | M | 7.3 | +-------------------------+--------+--------+ Kits that are in ancestry original but are not in 23andme original; +---------------------------------+--------+--------+ | ANC original not in 23 original | Vendor | AVG cM | +---------------------------------+--------+--------+ | 337 | A | 6.6 | | 94 | F | 6.7 | | 105 | M | 7.0 | +---------------------------------+--------+--------+ kits that are in the ancestry original but are not in 23andme fixed; +--------------------------+--------+--------+ | ANC Orig not in 23 fixed | Vendor | AVG cM | +--------------------------+--------+--------+ | 229 | A | 6.6 | | 130 | F | 6.5 | | 139 | M | 6.9 | +--------------------------+--------+--------+ kits in 23 fixed that have no matches in 23 original; +-------------------------+--------+--------+ | 23 Fixed Not in 23 Orig | Vendor | AVG cM | +-------------------------+--------+--------+ | 281 | A | 6.7 | | 67 | F | 7.1 | | 48 | M | 6.9 | +-------------------------+--------+--------+ 
Kits in 23 original that have no matches in 23 fixed:

+-------------------------+--------+--------+
| 23 Orig not in 23 fixed | Vendor | AVG cM |
+-------------------------+--------+--------+
|                     202 | A      |    6.6 |
|                     139 | F      |    6.5 |
|                     263 | M      |    6.9 |
+-------------------------+--------+--------+

Kits that are in the Ancestry fixed but are not in Ancestry original:

+---------------------------+--------+--------+
| ANC fixed not in ANC Orig | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        48 | A      |    7.8 |
|                        17 | F      |    8.0 |
|                        11 | M      |    8.7 |
+---------------------------+--------+--------+

Kits that are in the Ancestry original but are not in Ancestry fixed:

+---------------------------+--------+--------+
| ANC Orig not in ANC Fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                       200 | A      |    6.6 |
|                       114 | F      |    6.4 |
|                       106 | M      |    6.6 |
+---------------------------+--------+--------+

Kits that are in the 23andme fixed but are not in ancestry fixed:

+---------------------------+--------+--------+
| 23 fixed not in ANC fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        56 | A      |    6.6 |
|                        35 | F      |    7.0 |
|                       542 | M      |    7.4 |
+---------------------------+--------+--------+

Kits that are in the ancestry fixed but are not in 23andme fixed:

+---------------------------+--------+--------+
| ANC fixed not in 23 fixed | Vendor | AVG cM |
+---------------------------+--------+--------+
|                        71 | A      |    6.9 |
|                        19 | F      |    7.0 |
|                        58 | M      |    7.5 |
+---------------------------+--------+--------+

Kits that are in the 23andme original but are not in ancestry fixed:

+--------------------------+--------+--------+
| 23 Orig not in ANC fixed | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      218 | A      |    6.6 |
|                      160 | F      |    6.5 |
|                      763 | M      |    7.2 |
+--------------------------+--------+--------+

Kits that are in the ancestry fixed but are not in 23andme original:

+--------------------------+--------+--------+
| ANC fixed not in 23 Orig | Vendor | AVG cM |
+--------------------------+--------+--------+
|                      315 | A      |    6.7 |
|                       71 | F      |    7.0 |
|                       59 | M      |    7.4 |
+--------------------------+--------+--------+

###################################################################

Maybe too much information here, but I wanted to show each comparison.

Disclosure: I have a pile-up region on chromosome 15 where I have hundreds of bogus matches. I did not factor that out.

I invite anyone's comments about what may be going on, and any suggestions about further queries.

David Schroeder

-------------------------------
To unsubscribe from the list, please send an email to GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
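For readers who want to try the same kind of comparison, here is a minimal sketch in Python of the two steps described in this post: filling one vendor's no-calls from the other vendor's call at the same RSID, and listing the matches that appear for one kit but not another. It is not David's SQL. The function names, the "--" no-call marker, the toy data, and the idea of reading the vendor from the first letter of the kit ID are all assumptions made for illustration, and the sketch assumes both files report genotypes on the same strand.

```python
from collections import defaultdict

NO_CALL = "--"  # assumed no-call marker (23andMe-style genotype column)


def fix_no_calls(primary, other):
    """Fill no-calls in `primary` from `other` where the same RSID has a call.

    Both arguments map RSID -> genotype string.
    Returns (fixed copy of primary, number of no-calls filled).
    """
    fixed = dict(primary)
    filled = 0
    for rsid, genotype in primary.items():
        if genotype == NO_CALL and other.get(rsid, NO_CALL) != NO_CALL:
            fixed[rsid] = other[rsid]
            filled += 1
    return fixed, filled


def not_in(matches_a, matches_b):
    """Summarise kits matched by A but absent from B, per vendor letter.

    Each argument maps a matching kit ID -> shared cM. The first letter of
    the kit ID is taken as the vendor (A, F or M), which is an assumption.
    Returns {vendor: (count, average cM rounded to one decimal)}.
    """
    by_vendor = defaultdict(list)
    for kit in set(matches_a) - set(matches_b):
        by_vendor[kit[0]].append(matches_a[kit])
    return {vendor: (len(cms), round(sum(cms) / len(cms), 1))
            for vendor, cms in sorted(by_vendor.items())}


# Made-up toy data, not taken from the real kits.
raw_23 = {"rs1": "AG", "rs2": "--", "rs3": "CC", "rs4": "--"}
raw_anc = {"rs1": "--", "rs2": "TT", "rs3": "CC", "rs5": "GG"}
fixed_23, filled_23 = fix_no_calls(raw_23, raw_anc)
print("no-calls filled:", filled_23, fixed_23)

orig_matches = {"A000001": 7.2, "F000002": 6.5, "M000003": 8.0, "M000004": 6.9}
fixed_matches = {"A000001": 7.4, "M000003": 8.1, "A000005": 7.8}
print("orig not in fixed:", not_in(orig_matches, fixed_matches))
print("fixed not in orig:", not_in(fixed_matches, orig_matches))
```

A no-call is only filled when the other file actually has a call for that RSID, so rs4 in the toy data stays a no-call. Whether the tables above count matching kits or matching segments is not stated, so the per-kit counting here may not reproduce them exactly.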
I will be more than happy to examine your WGS 3x data together with your Big Y data.

Best regards,
Atanas Kumbarov

On 06/01/16 01:28, steven perkins via wrote:
> All:
>
> I have the Big Y, a Full Genomes WGS at 3X, and the imputed VCF from my V3
> 23andMe results at DNA.Land. Who can compare the Y DNA results in all three
> files? I'd like to have it compared to another R1a with similar result
> files.
>
> After seeing the numbers in the post by Justin quoted by Tim, I will be
> increasing the coverage of the WGS file from Full Genomes.
>
> Steven
>
> On Tue, Jan 5, 2016 at 7:07 PM, Thomas Krahn via <genealogy-dna@rootsweb.com> wrote:
>> Atanas,
>>
>> The largest cost factor for Y chromosome tests is not the sequencing,
>> but the enrichment procedure.
>> With enrichment the price will never drop far below $400, so you might
>> as well sequence with sufficient coverage.
>>
>> In the long run there is no future for BigY and FGC's Y tests since the
>> WGS tests keep dropping in price, but Y tests don't.
>>
>> Thomas
>>
>> On 01/06/2016 12:55 AM, Atanas Kumbarov via wrote:
>>> I have exactly the same idea. If you have a tree to follow, you don't
>>> need that many reads at a given position. But I think that it will be
>>> better if FGC designs a Y-chromosome only test at 5x instead of a WGS test
>>> at 2x with the same price tag (around $250). The increase in depth of
>>> coverage could be compensated by the decrease in sequenced loci (the
>>> Y-chromosome only).
>>>
>>> 5x is sufficient - the 1000GP had 5x to 7x coverage and nevertheless
>>> we see huge 1000GP dominated segments of the YFull tree like this one
>>> http://yfull.com/tree/R-L657/
>>>
>>> If such a test was offered, it could gain a lot of popularity.
>>>
>>> Best regards,
>>> Atanas Kumbarov
>>>
>>> On 05/01/16 23:37, AJ Marsh via wrote:
>>>> Tim,
>>>>
>>>> Many thanks for posting this.
>>>>
>>>> I have been waiting for some time to see what sort of results were
>>>> coming from 2x coverage. I am a little bit encouraged by the figures
>>>> from FGC.
>>>>
>>>> For the R-L617 project we have 20+ participants who have tested YElite
>>>> or BigY. So we have a good tree starting to take shape of known SNPs
>>>> below L617.
>>>>
>>>> If the 2x coverage test "covers" about 14,000,000 bases of Y, it might
>>>> be too thinly covered to "confidently" call SNPs, as many of these bases
>>>> will only be covered once. But in my case, I have a "work in progress"
>>>> tree for mutations below L617, so reading the 2x coverage raw data might
>>>> enable me to pick up mutations which are reported in raw data, but not
>>>> reported more than once to enable FGC to call them confidently as a new
>>>> SNP.
>>>>
>>>> My feeling is that in my case, where I have the advantage of a working
>>>> SNP tree to compare to, it might be cost effective for me to extract
>>>> benefit from the 2x coverage test. The area covered at least once by a
>>>> 2x coverage test is more than half the area covered by YElite it seems.
>>>> So if I can see matches to known L617 downstream and can call them, even
>>>> at only 50% certainty, I should be able to tentatively place L617s into
>>>> known branches.
>>>>
>>>> Several branches have 20+ known SNPs, so I only need one reliable call,
>>>> or several less reliable calls of some of the 20+ branch SNPs to assign
>>>> a tester to a branch, at least tentatively. Often there is already other
>>>> forms of evidence pointing to a branch, so if it is corroborated to a
>>>> degree by data from 2x coverage tests, it is a step forward. If in
>>>> doubt, a SNP could be verified from single SNP testing.
>>>>
>>>> I think that I just need to have a few L617s tested on the 2x test to
>>>> see how it works out in practice in my case. I feel optimistic. Although
>>>> I am working on a SNP tree below L617, several haplogroup projects have
>>>> similar type SNP trees building, so they might also get benefit. Cost of
>>>> testing is always a challenge, so even if a 2x test is cutting corners a
>>>> bit, it still might enable tests to be done which otherwise might not be
>>>> possible.
>>>>
>>>> John.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On 6/01/2016, at 9:32 am, Tim Janzen via <genealogy-dna@rootsweb.com> wrote:
>>>>> Dear Atanas,
>>>>> Justin Loe from FullGenomes just posted the information below on
>>>>> another list:
>>>>>
>>>>> "As mentioned earlier, these are our beta results for 1x coverage:
>>>>>
>>>>> Mapped Y coverage
>>>>> 30x   22,856,938
>>>>> 10x   22,025,697
>>>>>  4x   17,678,170
>>>>>  2x   13,755,442
>>>>>
>>>>> Average Callable Loci
>>>>> 30x   14,558,001
>>>>> 10x    8,046,540
>>>>>  4x    1,050,996
>>>>>  2x      349,397
>>>>>
>>>>> Y Elite 2.0:
>>>>> 14,000,000 Callable Loci approximately on average"
>>>>>
>>>>> This helps provide clearer data for your last question than I was able
>>>>> to do in my earlier response. It thus appears that one needs to order
>>>>> at least the 30x whole genome sequence from FullGenomes to get the same
>>>>> number of callable Y chromosome loci as you can get from the Y Elite
>>>>> 2.0 test.
>>>>> Sincerely,
>>>>> Tim Janzen
>>>>>
>>>>> -----Original Message-----
>>>>> From: genealogy-dna-bounces@rootsweb.com
>>>>> [mailto:genealogy-dna-bounces@rootsweb.com] On Behalf Of Atanas Kumbarov via
>>>>> Sent: Tuesday, January 5, 2016 8:59 AM
>>>>> To: genealogy-dna@rootsweb.com
>>>>> Subject: [DNA] Full Genomes Corporation tests
>>>>>
>>>>> I have several questions about their tests:
>>>>>
>>>>> 1. I wonder how useful 2x results can be?
>>>>> 2. What can be done with a FGS?
>>>>> 3. Is it possible to extract usable data for GedMatch from FGS *in an
>>>>> easy way*?
>>>>> 4. How well is the Y-chromosome covered?
>>>>>
>>>>> Best regards,
>>>>> Atanas Kumbarov
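As a quick check on the coverage figures quoted in this thread, the short Python sketch below works out what share of the mapped Y positions ends up callable at each depth, and which depths reach the roughly 14,000,000 callable loci quoted for Y Elite 2.0. The numbers are copied from the quoted FullGenomes post; the variable and function names are made up for illustration, and dividing callable loci by mapped coverage is just one simple way to look at the figures, not a calculation FullGenomes published.

```python
# Figures as quoted from the FullGenomes post in this thread.
mapped_y = {"30x": 22_856_938, "10x": 22_025_697, "4x": 17_678_170, "2x": 13_755_442}
callable_loci = {"30x": 14_558_001, "10x": 8_046_540, "4x": 1_050_996, "2x": 349_397}
Y_ELITE_CALLABLE = 14_000_000  # "approximately on average" per the quoted post


def callable_fraction(depth):
    """Share of mapped Y positions that were callable at the given depth."""
    return callable_loci[depth] / mapped_y[depth]


def depths_matching_y_elite():
    """Depths whose average callable loci reach the Y Elite 2.0 figure."""
    return [d for d, n in callable_loci.items() if n >= Y_ELITE_CALLABLE]


for depth in mapped_y:
    print(f"{depth:>3}: {callable_fraction(depth):6.1%} of mapped positions callable")
print("Depths reaching the Y Elite 2.0 figure:", depths_matching_y_elite())
```

On these figures only the 30x test reaches the Y Elite 2.0 level, which is Tim's point, while at 2x only a few percent of the mapped positions are callable. That is consistent with John's plan of reading the raw 2x data against an existing SNP tree rather than relying on confident new calls.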
I have people and whole sub-clades of J-L24 who get very little out of BigY but a lot more out of FGC. I think this has to do with each person's set of random SNPs, which may or may not line up well with the "windows into the Y" used by BigY.

So, as a test on myself, I have ordered the FGC WGS 4x so I can compare with my FGC BGI Y test at 60x and also BigY (80x I think, not sure). The WGS 4x should be coming in soon ...

Al

On Wed, Jan 6, 2016 at 6:51 AM, Chris R. - Gen via <genealogy-dna@rootsweb.com> wrote:
> I thought DNA.Land's imputation VCF of 23andMe data covers only the
> recombining genome parts (autosomal/X)?
> Please let us know about stats and comparison of WGS 3X with BigY. Good
> research.
>
> On Wed, Jan 6, 2016 at 8:41 AM, Atanas Kumbarov via <genealogy-dna@rootsweb.com> wrote:
>> I will be more than happy to examine your WGS 3x data together with your
>> Big Y data.
>>
>> Best regards,
>> Atanas Kumbarov
>>
>> On 06/01/16 01:28, steven perkins via wrote:
>>> All:
>>>
>>> I have the Big Y, a Full Genomes WGS at 3X, and the imputed VCF from my
>>> V3 23andMe results at DNA.Land. Who can compare the Y DNA results in all
>>> three files? I'd like to have it compared to another R1a with similar
>>> result files.
>>>
>>> After seeing the numbers in the post by Justin quoted by Tim, I will be
>>> increasing the coverage of the WGS file from Full Genomes.
>>>
>>> Steven
>>>
>>> On Tue, Jan 5, 2016 at 7:07 PM, Thomas Krahn via <genealogy-dna@rootsweb.com> wrote:
>>>> Atanas,
>>>>
>>>> The largest cost factor for Y chromosome tests is not the sequencing,
>>>> but the enrichment procedure.
>>>> With enrichment the price will never drop far below $400, so you might
>>>> as well sequence with sufficient coverage.
>>>>
>>>> On the long run there is no future for BigY and FGCs Y tests since the
>>>> WGS tests keep dropping in the price, but Y tests don't.