I mentioned this segment before, but the more I study it the less I understand it. While looking at my whole-genome sequence in the region of the PRDM2 gene (chromosome 1, locations 14026714 thru 14151574), I decided to BLAST the Build GRCh37 sequence of PRDM2 against the human genome to see whether there were other similar sequences that could have caused reads to be mapped to the wrong place. The only similar sequence I found was an "alternate" sequence covering the same region, but with several mismatches to the GRCh37 sequence. Looking at these mismatches more closely, I found that one of my chromosome 1 copies matched the "alternate" sequence at every one of the mismatch positions.

In addition, the 1000 Genomes database shows a SNP at each of these mismatch positions. Many of these SNPs, according to 1000 Genomes, have identical frequencies, indicating that they are always found together (i.e., on the same chromosome). Most of these frequencies are in the 0.03 to 0.04 range, with a few as low as 0.012. In other words, they are in the "uncommon" but not "extremely rare" category.

I have since extended the region of comparison and found that it runs from GRCh37 chr1 location 14000722 to location 14129498, beginning somewhat upstream of the PRDM2 gene and ending before the last small coding exon of PRDM2. This span includes 366 mismatches between the GRCh37 (or GRCh38) reference sequence and the "alternate" chr1 sequence (locations 13799083 to 13927951 of CHM1_1.1; ID: ref|NC_018912.2 <https://www.ncbi.nlm.nih.gov/nucleotide/528476670?report=genbank&log$=nuclalign&blast_rank=1&RID=3ZJ88E01113>).

My question is how all of these SNPs got together and stayed together on one piece of DNA. Maybe the more important question is how they got together, since staying together is less surprising: at roughly 1 cM per Mb, the rate of crossover within a 128-kb segment would be only about 0.13 cM, or 0.0013 per generation.

By the way, this piece of DNA seems to be pretty old, judging from the 1000 Genomes frequency data for various populations. The highest frequency by far, about 0.14, is in the South Asian group, with European and East Asian frequencies being about 0.02, and the African and American Indian frequencies about 0.003 and 0-0.0013, respectively. The low African frequency is reminiscent of the Neanderthal introgression of about 50,000 years ago, as I recall. Could this be something like that, or indeed one of the pieces of Neanderthal DNA itself?

Hope to hear from someone more knowledgeable than myself about this (Ann, Greg, Thomas, others?).
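
The crossover arithmetic above can be reproduced with a short sketch. It rests on assumptions that are not in the message itself: a uniform recombination rate of 1 cM per Mb (the figure implied by 0.13 cM for 128 kb) and 25 years per generation. Local recombination rates vary a great deal, so treat the numbers as order-of-magnitude only. All function and constant names are made up for illustration.

```python
# Rough recombination arithmetic for the chr1 segment discussed above.
# ASSUMPTIONS (not from the original message): a uniform rate of 1 cM per Mb
# and 25 years per generation. Real local rates can differ substantially.

SEG_START = 14_000_722              # GRCh37 chr1, from the message
SEG_END = 14_129_498                # GRCh37 chr1, from the message
CM_PER_MB = 1.0                     # assumed genome-wide average
YEARS_PER_GENERATION = 25           # assumed
YEARS_SINCE_INTROGRESSION = 50_000  # "about 50,000 years ago" in the message


def segment_length_bp(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def crossover_prob_per_generation(length_bp: int,
                                  cm_per_mb: float = CM_PER_MB) -> float:
    """Approximate chance of a crossover inside the segment in one meiosis.

    For short intervals, 1 cM corresponds to about a 1% crossover chance.
    """
    centimorgans = length_bp / 1_000_000 * cm_per_mb
    return centimorgans / 100


def prob_never_broken(p_per_generation: float, generations: int) -> float:
    """Chance that a single lineage sees no crossover within the segment."""
    return (1 - p_per_generation) ** generations


length = segment_length_bp(SEG_START, SEG_END)
p = crossover_prob_per_generation(length)
generations = YEARS_SINCE_INTROGRESSION // YEARS_PER_GENERATION

print(f"segment length: {length} bp")
print(f"crossover chance per generation: {p:.5f}")
print(f"generations since introgression: {generations}")
print(f"chance of no crossover on one lineage: "
      f"{prob_never_broken(p, generations):.3f}")
```

Under these assumptions the per-generation figure comes out at about 0.0013, in line with the estimate above. The last line only describes a single line of descent in a very simple model; it says nothing about selection or about how many carriers there were to begin with.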
I have confirmed at the Neandertal Browser <http://neandertal.ensemblgenomes.org/Homo_sapiens/Location/View?r=1:13942870-13942900>, which uses Build hg18, that my G to A SNP (rs114034230) at GRCh37 (hg19) location 14070302 does indeed agree with the G to A change in going from the human to the Neanderthal sequence at the corresponding hg18 location (13942889). This SNP is one of the 366 mismatches between the GRCh37 (hg19) sequence and the chr1 "alternate" sequence in the region I gave earlier.

I will compare some of my other "mismatch" SNPs to the Neanderthal sequence, but I now believe that I do indeed have a Neanderthal sequence on one copy of chr1 from GRCh37 location 14000722 thru location 14129498! This Neanderthal segment has probably been published somewhere, but it is news to me.

Obed

On Wed, Nov 11, 2015 at 2:07 AM, Obed W Odom <owodom@utexas.edu> wrote:
> I mentioned this segment before, but the more I study it the less I understand it. [...]
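
As a sanity check on the coordinates quoted in this thread, the sketch below computes the two span lengths, the mismatch density, and whether the rs114034230 position falls inside the segment and inside the browser window in the URL. It uses only numbers given in the messages. The helper names are hypothetical, and it assumes all coordinates are 1-based and inclusive.

```python
# Coordinate bookkeeping for the chr1 segment, using only figures quoted in
# the thread. ASSUMPTION: all coordinates are 1-based and inclusive.

GRCH37_SPAN = (14_000_722, 14_129_498)  # GRCh37 chr1 segment
CHM1_SPAN = (13_799_083, 13_927_951)    # CHM1_1.1 "alternate" chr1 segment
MISMATCHES = 366                        # mismatches between the two spans

SNP_HG19 = 14_070_302                     # rs114034230 on GRCh37 (hg19)
SNP_HG18 = 13_942_889                     # same SNP on hg18
BROWSER_WINDOW = (13_942_870, 13_942_900) # r=1:13942870-13942900 in the URL


def span_length(span: tuple[int, int]) -> int:
    """Length in bp of a 1-based, inclusive interval."""
    start, end = span
    return end - start + 1


def contains(span: tuple[int, int], position: int) -> bool:
    """True if the position lies inside the inclusive interval."""
    start, end = span
    return start <= position <= end


grch37_len = span_length(GRCH37_SPAN)
chm1_len = span_length(CHM1_SPAN)
bp_per_mismatch = grch37_len / MISMATCHES

print(f"GRCh37 span: {grch37_len} bp, CHM1_1.1 span: {chm1_len} bp")
print(f"length difference: {chm1_len - grch37_len} bp")
print(f"about one mismatch every {bp_per_mismatch:.0f} bp")
print(f"SNP inside GRCh37 segment: {contains(GRCH37_SPAN, SNP_HG19)}")
print(f"SNP inside browser window: {contains(BROWSER_WINDOW, SNP_HG18)}")
print(f"hg19 minus hg18 offset at the SNP: {SNP_HG19 - SNP_HG18} bp")
```

The two spans differ slightly in length, which presumably reflects net insertions and deletions between the assemblies. Whether the 366 figure counts indels or only single-base differences is not stated in the thread.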