Do you really need all of these 40 GB? When I work with large files, I usually discard the information I don't need. For example, if a 1000 Genomes Project (1000 GP) BAM file is 20 GB, I filter out everything except the Y chromosome and end up with about 200 MB. This way I can store and process hundreds of such files.

--
Best regards,
Atanas Kumbarov
http://dna.kumbarov.com/

On 2016-01-07 15:08, Iain Kennedy wrote:
> This is true, although the one I am working with is 40 GB and is being
> loaded into an SQL database. It pays to get it right the first time and
> to know every optimization trick for your database's bulk upload. I
> found this article useful:
>
> http://derwiki.tumblr.com/post/24490758395/loading-half-a-billion-rows-into-mysql
>
> except that 'READ-REPEATABLE' should read 'REPEATABLE-READ', and
> innodb_flush_method can't be changed under Windows.
>
> Iain
>
> Atanas wrote:
> > Regarding the VCF file, you can write a script or a small program to
> > convert the VCF the way you like. I have added support for different
> > types of VCF files to my software for processing Y-chromosome data from
> > VCF files.
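
For anyone following the thread, the idea of keeping only the Y-chromosome records can be sketched for a plain-text VCF in a few lines of Python. This is only an illustration, not the software mentioned above: the function name `filter_y` and the contig labels "Y" and "chrY" are assumptions, and real files may name the chromosome differently. The lines are streamed one at a time, so the whole file never has to fit in memory.

```python
import io

# Assumed contig labels for the Y chromosome; check your own file's header.
Y_NAMES = {"Y", "chrY"}

def filter_y(lines):
    """Yield header lines plus the records whose CHROM column is the Y chromosome."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and column header lines are kept unchanged.
            yield line
        elif line.split("\t", 1)[0] in Y_NAMES:
            # CHROM is the first tab-separated column of a VCF record.
            yield line

# Tiny made-up input standing in for a real file.
sample = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\n"
    "Y\t2655180\t.\tG\tA\t.\tPASS\t.\n"
)
kept = list(filter_y(sample))
```

For a real file, the same function can be fed an open file handle instead of the `StringIO` object, for example one returned by `gzip.open(path, "rt")` for a compressed VCF, with the kept lines written straight to an output file.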