Regarding the VCF file, you can write a script or a small program to convert the VCF into whatever form you like. I have added support for different types of VCF files to my software for processing Y-chromosome data.

--
Best regards,
Atanas Kumbarov
http://dna.kumbarov.com/

On 2016-01-07 10:12, Iain Kennedy via wrote:
> I am playing around with what can be done with this data. I was particularly interested to see how far I could get mocking up an AncestryDNA-type file from it. However, I think 2x is too low to get far except as a proof of concept. I have loaded the dbSNP VCF into MySQL, but it doesn't seem to follow the normal VCF standard, which is a shame (it isn't using '.' in the ALT column when the calls were all ancestral). I also had a go using mpileup and VarScan overnight, but this requires 8x reads minimum:
>
> C:\genealogy\software>samtools mpileup -f human_g1k_v37.fasta GWK3W.bam | java -jar VarScan.v2.3.7.jar pileup2snp > gwk3w_vs.vcf
> [mpileup] 1 samples in 1 input files
> <mpileup> Set max per-file depth to 8000
> Warning: No p-value threshold provided, so p-values will not be calculated
> Min coverage: 8
> Min reads2: 2
> Min var freq: 0.01
> Min avg qual: 15
> P-value thresh: 0.99
> Input stream not ready, waiting for 5 seconds...
> Reading input from STDIN
> 2349372654 bases in pileup file
> 8008795 met minimum coverage of 8x
> 2905740 SNPs predicted
>
> I haven't loaded it into the db yet.
>
> The dbSNP VCF I refer to includes ancestral and derived calls, and mine has 107M rows.
>
> Iain
>
> -------------------------------
> To unsubscribe from the list, please send an email to GENEALOGY-DNA-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
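
As an illustration of the kind of conversion script suggested in the reply, here is a minimal Python sketch that rewrites the ALT column to '.' for sites with no alternate allele. It rests on an assumption that is not confirmed anywhere in this thread: that the non-standard file marks all-ancestral sites either with an empty ALT field or by repeating the REF allele in ALT. The function names `normalize_alt` and `convert` are hypothetical, and the rule would need adjusting to whatever the file actually contains.

```python
import io


def normalize_alt(line: str) -> str:
    """Return a VCF line with ALT set to '.' when no alternate allele is present."""
    if line.startswith("#"):
        return line  # header and meta lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return line  # too few columns to be a VCF data line; leave it alone
    ref, alt = fields[3], fields[4]
    # ASSUMPTION: an all-ancestral site appears as an empty ALT or as ALT == REF.
    if alt == "" or alt == ref:
        fields[4] = "."
    return "\t".join(fields) + "\n"


def convert(src, dst):
    """Stream lines from src to dst, normalizing ALT one line at a time."""
    for line in src:
        dst.write(normalize_alt(line))


if __name__ == "__main__":
    # Small in-memory demonstration with made-up records.
    demo = io.StringIO(
        "##fileformat=VCFv4.1\n"
        "Y\t100\t.\tA\tA\t.\t.\t.\n"
        "Y\t200\t.\tC\tT\t.\t.\t.\n"
    )
    out = io.StringIO()
    convert(demo, out)
    print(out.getvalue(), end="")
```

Because it processes one line at a time, the same loop can be pointed at real file handles opened with `open(...)` without holding a 107M-row file in memory.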