RootsWeb.com Mailing Lists
    1. [DNA] GR38 and GR37 primary reference assemblies vs alternate assemblies at GenBank
    2. Obed W Odom via
    3. Does anyone know the history and origin of the so-called alternate sequences at NCBI's GenBank? Over a 2551-nt region on chromosome 1, from location 14104950 through 14107500, encompassing most of the coding region of the gene PRDM2, I see that the GRCh37 and GRCh38 sequences differ from the "Homo sapiens chromosome 1, alternate assembly CHM1_1.1" by 9 point mutations and 2 triplet insertions (the latter being in the alternate assembly). From my whole-genome sequence reads, it appears that one of my two copies of chromosome 1 matches the alternate sequence exactly over this region, while the other copy matches the GRCh37 or GRCh38 sequence except for 2 point mutations and 1 triplet insertion. I have heard that the GRCh37 and GRCh38 sequences are some sort of composite sequence derived from 13 individuals from Buffalo, NY, but I have no idea what the origin of the alternate sequence is. In any event, it seems that in the region mentioned above one of my chromosomes (I am not sure whether it is maternal or paternal) is identical to the alternate sequence and the other is much more similar to the GRCh37 or GRCh38 sequence. Would appreciate any feedback, Obed
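    As a quick sanity check on the region size quoted above, here is a minimal Python sketch. It assumes the two locations are 1-based and inclusive at both ends, which is what "from ... thru ..." suggests; the helper name inclusive_span is made up for illustration and is not from any library.

```python
def inclusive_span(start: int, end: int) -> int:
    """Count the bases in a closed interval [start, end].

    Assumes 1-based coordinates with both endpoints included
    (an assumption about how the post's locations are meant).
    """
    if end < start:
        raise ValueError("end must not be less than start")
    return end - start + 1


# Region quoted in the post: chromosome 1, 14104950 through 14107500.
print(inclusive_span(14104950, 14107500))  # prints 2551
```

    Under that assumption the endpoints do give the stated 2551 nt; a half-open convention would give 2550 instead.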

    11/01/2015 03:36:44
    1. Re: [DNA] GR38 and GR37 primary reference assemblies vs alternate assemblies at GenBank
    2. Obed W Odom via
    3. I didn't find a specific answer to my question about alternate sequences, but information at NCBI did indicate that alternate sequences are sometimes given if there is something particularly interesting about them, such as several SNPs being completely linked to each other over a considerable distance. That appears to be the case in the example I gave for the PRDM2 gene on chromosome 1. Over a span of 4066 bases, beginning at GRCh37 location 14105049 and ending at location 14109114, there are 6 SNPs that all have the same allele frequency of 0.0319489, indicating that they always occur together and are thus completely linked. In addition, there are 3 more SNPs with a slightly higher frequency and 2 more with a considerably higher frequency. These higher frequencies probably reflect that these SNPs also occur, to a variable extent, on chromosomes without the linked set, in addition to those carrying it. The alternate Chr 1 assembly includes all of these SNPs, and my DNA apparently also has all of them on one of my Chr 1 copies. Interestingly, of these 11 linked SNPs, only 4 cause an amino acid change; the other 7 are synonymous.
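    The reasoning about shared allele frequencies can be sketched as a first-pass screen: group SNPs by frequency and look for groups with more than one member. This is only a rough illustration in Python. The SNP labels are hypothetical, and the frequencies for the five "higher" SNPs are placeholders, since the post gives no values for them; only the figure 0.0319489 and the count of six come from the message. Matching frequencies are consistent with complete linkage but do not prove it, so phased data would be needed to confirm.

```python
from collections import defaultdict


def group_by_frequency(snps, ndigits=7):
    """Group SNP labels by allele frequency rounded to ndigits places."""
    groups = defaultdict(list)
    for label, freq in snps:
        groups[round(freq, ndigits)].append(label)
    return dict(groups)


# Six SNPs share the frequency reported in the post (labels are hypothetical).
snps = [(f"snp{i}", 0.0319489) for i in range(1, 7)]

# PLACEHOLDER frequencies: the post only says "a little higher" and
# "considerably higher", so these numbers are invented for illustration.
snps += [("snp7", 0.033), ("snp8", 0.034), ("snp9", 0.035),
         ("snp10", 0.12), ("snp11", 0.15)]

groups = group_by_frequency(snps)

# Report frequency groups with more than one member as linkage candidates.
for freq, labels in sorted(groups.items()):
    if len(labels) > 1:
        print(freq, labels)
```

    With real data, the list would be filled from the actual variant records for the region rather than typed in by hand.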

    11/02/2015 04:04:19