The list of I phyloequivalents could be very useful. Thank you for supplying it.

Could you explain the no asterisk, one asterisk, two ...... rating system?

Everything I have seen is technical gobbledygook to me. Can it be re-explained in plain English? "Number of reads" by itself means nothing to me. Is the same piece of lab goo being read over and over? That would be sort of a waste of time; wouldn't you get the same result if reading the same stuff over and over? If number of reads is to mean anything important, it must be independent reads on different stuff which under some model or theory should be yielding the same output. "Independence" of the operations is key.

So please: how does the asterisk rating system really operate, and how does it lead to the percentages, whatever they mean?

Kenneth Nordtvedt
Haplogroup I Clade Modalities and Trees at: http://knordtvedt.home.bresnan.net

-----Original Message-----
From: G. Magoon
Sent: Wednesday, January 08, 2014 3:08 PM
To: y-dna-haplogroup-i
Subject: Re: [yDNAhgI] Update on Ancient I in Europe

Very interesting, Ken. Hopefully the raw data for these ancient samples will eventually become available, possibly once the paper is published.

If people are looking for a list of phyloequivalent SNPs to the upstream branches like I, IJ, Q, etc., then the supporting info (Supporting Info B) from the tree generated for our recent manuscript (http://biorxiv.org/content/early/2013/12/13/000802) should be useful. For example, here are all the SNPs on the I branch with P38. (The more unstable sites have a higher "# of mutations in tree".)
> For the branch from node 37 to node 4 with length 7090.00: > 2688442 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['CTS48', 'PF3569'] > 2707072 REF->ALT (C->T); # of mutations in tree = 2 (weighted = 70); > ['CTS70', 'PF3570'] > 2723755 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS88', 'PF3571'] > 2884029 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['L844', 'PF3572', 'YSC0000275'] > 2974782 REF->ALT (A->C); # of mutations in tree = 1 (weighted = 35); > ['PF3574'] > 3003354 REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35); > ['PF3575'] > 3315632 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35) > 3366638 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3578'] > 3545070 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['P212', 'PF3580'] > 3831248 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3585'] > 3851589 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3586'] > 3928388 REF->ALT (G->T); # of mutations in tree = 3 (weighted = 15) > 3932370 REF->ALT (TAA->TA); # of mutations in tree = 1 (weighted = 35) > 3938321 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35) > 4064632 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['PF3588'] > 4077210 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['PF3589'] > 4116203 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3590'] > 4180074 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35) > 4184066 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3592'] > 4245332 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['PF3594'] > 4748471 REF->ALT (A->G); # of mutations in tree = 2 (weighted = 70) > 4859526 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3599'] > 4974832 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35) > 4989857 REF->ALT (A->C); # of mutations in 
tree = 1 (weighted = 35); > ['PF3600'] > 5129448 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 0); > ['PF3601'] > 5129449 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 0) > 5196541 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['PF3602'] > 5197625 REF->ALT (G->C); # of mutations in tree = 1 (weighted = 35); > ['PF3603'] > 5206105 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3604'] > 5217196 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3605'] > 5514820 REF->ALT (A->C); # of mutations in tree = 2 (weighted = 70) > 5528525 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35) > 5586317 REF->ALT (G->C); # of mutations in tree = 1 (weighted = 35); > ['PF3611'] > 5643555 REF->ALT (T->G); # of mutations in tree = 2 (weighted = 70); > ['PF3612'] > 5729506 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35) > 5744201 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35) > 5772273 REF->ALT (T->C); # of mutations in tree = 3 (weighted = 15) > 5857698 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3616'] > 5925267 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3617'] > 6067284 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3618'] > 6422345 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3620'] > 6440847 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['PF3622'] > 6477593 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3625'] > 6489980 REF->ALT (T->A); # of mutations in tree = 2 (weighted = 70) > 6497838 REF->ALT (GAA->GAAA); # of mutations in tree = 2 (weighted = 70) > 6575427 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35) > 6662712 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3627', 'V218'] > 6926038 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['CTS646', 'PF3629'] > 6943522 REF->ALT (C->T); # of mutations in tree 
= 1 (weighted = 35); > ['CTS674', 'PF3630'] > 7004291 REF->ALT (G->C); # of mutations in tree = 1 (weighted = 35); > ['CTS772', 'PF3631'] > 7137088 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS1006'] > 7203486 REF->ALT (CTGT->CT); # of mutations in tree = 1 (weighted = 35) > 7321418 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS1301', 'PF3635'] > 7438521 REF->ALT (G->C); # of mutations in tree = 3 (weighted = 0); > ['CTS1491'] > 7438523 REF->ALT (G->C); # of mutations in tree = 3 (weighted = 0); > ['CTS1492'] > 7570370 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3639'] > 7642823 REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35) > 7681156 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['PF3640'] > 7688470 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['PF3641'] > 7712917 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35); > ['PF3642'] > 7853028 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['PF3645'] > 7856500 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['L846', 'PF3646', 'YSC0000280'] > 7898045 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3864'] > 8046731 REF->ALT (A->C); # of mutations in tree = 1 (weighted = 35); > ['PF3649'] > 8262092 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35) > 8267857 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['L578', 'PF3653'] > 8278628 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['PF3654'] > 8382265 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['YSC0000281'] > 8465165 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['L755', 'PF3659', 'YSC0000283'] > 8466652 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3660'] > 8484606 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['PF3661'] > 8485677 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 
35); > ['L756', 'PF3662', 'YSC0000284'] > 8536868 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['L758', 'PF3663', 'YSC0000285'] > 8643763 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3665'] > 8728974 REF->ALT (T->G); # of mutations in tree = 1 (weighted = 35); > ['PF3666'] > 8873160 REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35); > ['PF3668'] > 8984184 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3670'] > 9398607 REF->ALT (G->T); # of mutations in tree = 11 (weighted = 0); > ['M7449', 'PF3673'] > 9420891 REF->ALT (A->ACC); # of mutations in tree = 1 (weighted = 35) > 9516653 REF->ALT (T->G); # of mutations in tree = 1 (weighted = 35); > ['PF3675'] > 9827411 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['YSC0000256'] > 9832636 REF->ALT (A->G); # of mutations in tree = 5 (weighted = 0); > ['PF3676'] > 9891668 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3677'] > 9900057 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35) > 10051801 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35) > 13430863 REF->ALT (G->A); # of mutations in tree = 4 (weighted = 0) > 13442439 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35) > 13492708 REF->ALT (GTT->GTTT); # of mutations in tree = 3 (weighted = 15) > 13544835 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35); > ['PF3685'] > 13610767 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3687'] > 13642029 REF->ALT (A->C); # of mutations in tree = 1 (weighted = 35); > ['PF3689'] > 13804066 REF->ALT (G->C); # of mutations in tree = 1 (weighted = 35) > 13835003 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35) > 13900590 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['PF3694'] > 13914715 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35); > ['PF3695'] > 13961890 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS1555', 
'PF3696'] > 14073053 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['CTS1800', 'PF3699'] > 14214481 REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35); > ['CTS2193', 'PF3703'] > 14286853 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS2387', 'PF3705'] > 14337364 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS2514', 'PF3706'] > 14352669 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS2536', 'PF3707'] > 14484379 REF->ALT (A->C); # of mutations in tree = 1 (weighted = 35); > ['P38', 'PF3708'] > 14646409 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS3076', 'PF3712'] > 14847792 REF->ALT (A->C); # of mutations in tree = 1 (weighted = 35); > ['M170', 'PF3715'] > 14884646 REF->ALT (C->T); # of mutations in tree = 2 (weighted = 0); > ['CTS3383', 'PF3716'] > 14884659 REF->ALT (A->C); # of mutations in tree = 2 (weighted = 0); > ['CTS3384', 'PF3717'] > 14974451 REF->ALT (C->T); # of mutations in tree = 2 (weighted = 70); > ['L1197', 'PF3718', 'YSC0000260'] > 14986989 REF->ALT (T->G); # of mutations in tree = 1 (weighted = 35); > ['CTS3517', 'PF3719'] > 15023364 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['M258', 'PF3721'] > 15089989 REF->ALT (T->C); # of mutations in tree = 2 (weighted = 70); > ['CTS3641', 'PF3722'] > 15377802 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS4077', 'PF3725'] > 15389836 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS4088', 'PF3868'] > 15479899 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['CTS4209', 'PF3726'] > 15506055 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS4239', 'PF3727'] > 15536759 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS4272', 'PF3728'] > 15536870 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS4273', 'PF3729'] > 15595624 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 
35); > ['CTS4340', 'PF3730'] > 15615533 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['L772', 'PF3731', 'YSC0000263'] > 15742130 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['CTS4637'] > 15759200 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS4664'] > 15793946 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS4745', 'PF3734'] > 15859012 REF->ALT (TCCC->TCC); # of mutations in tree = 1 (weighted = 35) > 15862842 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS4848', 'PF3736'] > 15937959 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS4982', 'PF3737'] > 15960476 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['CTS5016', 'PF3738'] > 16039881 REF->ALT (C->T); # of mutations in tree = 2 (weighted = 70); > ['CTS5150', 'PF3739'] > 16131227 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70) > 16171560 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS5263'] > 16354708 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3742', 'U179'] > 16397716 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['CTS5622', 'PF3743'] > 16415916 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS5650', 'PF3744'] > 16471254 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS5764', 'PF3746'] > 16548548 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['CTS5908', 'PF3747'] > 16567253 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS5946', 'PF3748'] > 16575110 REF->ALT (AT->A); # of mutations in tree = 1 (weighted = 35) > 16751000 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS6231', 'PF3750'] > 16780748 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['CTS6265', 'PF3871'] > 16785944 REF->ALT (T->C); # of mutations in tree = 2 (weighted = 70); > ['CTS6271', 'PF3751'] > 16826642 REF->ALT (G->A); # of mutations in 
tree = 1 (weighted = 35); > ['CTS6334', 'PF3752'] > 16836079 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['CTS6343', 'PF3753'] > 16836548 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS6344', 'PF3754'] > 16939794 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35); > ['CTS6497', 'PF3756'] > 17090238 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['CTS6751', 'PF3757'] > 17245841 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS7026', 'PF3758'] > 17424807 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS7329'] > 17467526 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['PF3759', 'YSC0000267'] > 17497181 REF->ALT (C->A); # of mutations in tree = 2 (weighted = 70); > ['CTS7469'] > 17511797 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS7502', 'PF3760'] > 17525137 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS7540', 'PF3761'] > 17548890 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS7593', 'PF3763'] > 17622756 REF->ALT (A->C); # of mutations in tree = 9 (weighted = 0); > SITEOF(['M9222']) > 17692855 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['CTS7831', 'PF3766'] > 17818847 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8064', 'PF3768'] > 17901509 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8257'] > 17924382 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8300', 'PF3770'] > 17940414 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8333', 'PF3771'] > 17949402 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['CTS8345', 'PF3772'] > 18018313 REF->ALT (C->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8420', 'PF3773'] > 18078759 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8545', 'PF3775'] > 18172947 REF->ALT (A->G); # of mutations in tree = 1 
(weighted = 35); > ['CTS8742', 'PF3776'] > 18257568 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS8876', 'PF3778'] > 18394743 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['L751', 'PF3779', 'YSC0000291'] > 18404486 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3780'] > 18582617 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS8963'] > 18786174 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS9264', 'PF3782'] > 18789763 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS9269', 'PF3783'] > 18927031 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS9484', 'PF3785'] > 18992894 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS9618', 'PF3786'] > 19048602 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['L41', 'PF3787'] > 19097563 REF->ALT (T->C); # of mutations in tree = 1 (weighted = 35); > ['CTS9838', 'PF3788'] > 19104986 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS9860', 'PF3790'] > 19233673 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS10058'] > 19435305 REF->ALT (A->AT); # of mutations in tree = 1 (weighted = 35) > 21067903 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3794'] > 21077471 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3795'] > 21119888 REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35); > ['PF3796'] > 21130059 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3797'] > 21155653 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35) > 21199929 REF->ALT (A->G); # of mutations in tree = 2 (weighted = 70); > ['PF3799'] > 21359407 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['L503'] > 21402723 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3800'] > 21452125 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3803'] > 
21465033 REF->ALT (C->A); # of mutations in tree = 3 (weighted = 15); > ['PF3804'] > 21515724 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35) > 21525069 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3806'] > 21535086 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['PF3807'] > 21556106 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3809'] > 21627180 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3811'] > 21689728 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35) > 21794672 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3813'] > 21839183 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3814'] > 21841289 REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35); > ['PF3815'] > 21939618 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3817'] > 22100087 REF->ALT (T->C); # of mutations in tree = 2 (weighted = 70); > ['PF3819'] > 22115103 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['YSC0000272'] > 22200336 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['PF3822'] > 22243817 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35) > 22444389 REF->ALT (T->A); # of mutations in tree = 1 (weighted = 35); > ['PF3827'] > 22458430 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['PF3828'] > 22458740 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['PF3829'] > 22459264 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35) > 22479907 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35) > 22485425 REF->ALT (A->T); # of mutations in tree = 1 (weighted = 35); > ['PF3833'] > 22525421 REF->ALT (T->G); # of mutations in tree = 1 (weighted = 35); > ['PF3836'] > 22573702 REF->ALT (G->A); # of mutations in tree = 2 (weighted = 70); > ['PF3837'] > 22845794 REF->ALT (A->G); # of mutations in tree = 1 (weighted = 35); > ['CTS10941', 'PF3838'] > 23084562 
REF->ALT (G->T); # of mutations in tree = 1 (weighted = 35); > ['CTS11369', 'PF3840'] > 23113271 REF->ALT (C->G); # of mutations in tree = 1 (weighted = 35); > ['CTS11441'] > 23154034 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['L847', 'PF3841', 'YSC0000298'] > 23156725 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS11540', 'PF3842'] > 23267211 REF->ALT (G->A); # of mutations in tree = 1 (weighted = 35); > ['CTS11779', 'PF3844'] > 23401471 REF->ALT (C->T); # of mutations in tree = 1 (weighted = 35); > ['CTS11979', 'PF3878'] > 23479970 REF->ALT (A->C); # of mutations in tree = 1 (weighted = 35); > ['PF3847', 'YSC0000300'] > 26334890 REF->ALT (A->C); # of mutations in tree = 2 (weighted = 70) > 13246591 ALT->REF (G->A); # of mutations in tree = 1 (weighted = 35)

On Wed, Jan 8, 2014 at 4:12 PM, Kenneth Nordtvedt <[email protected]> wrote:

> Here’s an update on what has recently been learned about haplogroup I
> ydna in ancient Europe. Bones of several males from 8000 b.p. were
> examined for their dna: Loschbour was from a Luxembourg site, four male
> dna Motala 2,3,9,12 were from a Swedish site. Another male from Swedish
> site, Motala6, seems not to be confirmed haplogroup I, but his true
> haplogroup is still unidentified though thought Q for awhile. Their ydna
> analysis of these ancient males is discussed in depth in the paper
> “Ancient human genomes suggest three ancestral populations for
> present-day Europeans”.
>
> As of a month ago, the hobby community had collected a number of
> phyloequivalent ysnps to M423, mainly from Geno2: the full list is M423,
> L178, L1224, CTS8486, CTS8239, CTS7218, CTS5985, CTS5375, CTS1802,
> CTS176, CTS1293, CTS11030.
> Multiple customers of Geno2 chip or WTY and from the L161 “Isles” clades
> or from the L621 “Dinaric” clade were found derived for ALL of these 12
> ysnps. Everyone else in haplogroup I was found ancestral for ALL of them.
>
> In the cited paper, however, only L178 and M423 were tested on these
> ancient dna samples. They took their snps to test from ISOGG list, and
> as we know that list always lags present knowledge because of the
> procedure of inclusion into their list. The ancient dna samples are not
> necessarily readable on any given snp site, so it is important to have
> that ancient dna tested on alternative sites of equivalence if
> unreadable on any particular site. So I sent the 10 snp sites listed
> above for which they did not test to authors of the cited paper. They
> very kindly looked up the allele values for the additional snps which
> could be read.
>
> Loschbour was readable for all 12 snps with results: derived for
> CTS8239, CTS7218, CTS5985, CTS176, CTS1293, L178, M423, but ancestral
> for the other five snps including L1224.
>
> Motala12 was readable for 9 of the 12 snps with results which exactly
> matched Loschbour. Unreadable snps were CTS8239, L1224, M423.
>
> Motala3 was readable for only 3 of the 12 snps with results matching
> Loschbour: CTS7218+, CTS176+, CTS1293+
>
> Motala2 was readable for only 1 of the 12 snps and disagreed with first
> three dna samples, being CTS1293-
>
> Motala9 was not readable for any of the 12 snps.
>
> Loschbour and Motala12 (and probably Motala3) establish a new branch
> line of the I Tree which is today probably extinct or severely tiny in
> present population. It splits the 12 snps into ancestrals and deriveds.
> See “Tree and Map for haplogroup I” for position of this new branch of
> the tree.
>
> So we concentrate on what we might further learn about Motala2 and
> Motala9 samples using the rich catalog of well placed ysnps that has
> resulted from Geno2 and other products done by hobbyists.
>
> All we know about Motala9 is that he is P38+ but P40-. All we know about
> Motala2 is that he is P38+ M253+ Z79- L703- L37- L621-. So both are
> haplogroup I, but that’s about all we know.
> Motala9 could be most anything within haplogroup I except a modern I1.
> He could be anything in I2...... and could be a new ancient branch on
> the ancestral I1 line which branched off prior to P40. That’s a time
> range of 22,000 years b.p. up to about 4500 years b.p. Similarly Motala2
> could have branched off of the ancestral I1 line anytime before M253,
> and Motala2 still has much of I2...... for his location.
>
> So there is much work that authors of this paper can do using our
> Geno2-generated rich lists of phylogenetically equivalent snps for many
> of the key branch segments of the I tree. Motala2 and Motala9 could
> conceivably be placed fairly well in the haplogroup I tree, giving us
> much better insight into presence of our haplogroup in northern Europe
> 8000 years ago. I have sent several sets of the confirmed
> phylogenetically equivalent snp lists to them and hopefully they will
> see if they are readable for Motala2 and Motala9.
>
> What I have not put together yet, waiting to see if they will use it or
> that Motala2 and Motala9 are not confirmed to be part of I2....., is the
> very huge list of firmly confirmed phyloequivalent snps to M253 and P40
> (I1 snps). Since that branch line segment is so long, 22000 years ago up
> to 4500 years ago, the list of well confirmed equivalent snps is
> probably presently approaching 100 in number. If Motala2 or Motala9 are
> confirmed I2..... then no sense checking them on all those I1 snps.
>
> So there could be much more of value to be extracted from these ancient
> dna samples from a complete use of their full genome measurements plus
> the hobby community’s most up to date list of well confirmed and well
> placed ysnps.
> I hope they will continue to work with us as they did on the initial set
> of ysnps equivalent to M423 and L178 which produced such informative
> results.
>
> Footnote: If Motala6 is indeed not confirmed haplogroup Q as the cited
> paper concluded, then a complete list of phylogenetically equivalent
> snps to P38 should probably be tested to see if he is haplogroup I or
> not?
>
> Kenneth Nordtvedt
>
> Haplogroup I Clade Modalities and Trees at:
> http://knordtvedt.home.bresnan.net
>
> -------------------------------
> To unsubscribe from the list, please send an email to
> [email protected] with the word 'unsubscribe'
> without the quotes in the subject and the body of the message
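The comparison Ken describes, checking each ancient sample against Loschbour on whichever of the 12 M423-equivalent SNPs could be read, can be sketched in a few lines of Python. The calls are transcribed from the message above; the data layout and the function name `agreement_with` are illustrative assumptions, not anything the paper's authors are known to have used.

```python
# The 12 SNPs listed in the message as phyloequivalent to M423.
PANEL = ["M423", "L178", "L1224", "CTS8486", "CTS8239", "CTS7218",
         "CTS5985", "CTS5375", "CTS1802", "CTS176", "CTS1293", "CTS11030"]

# Loschbour was readable at all 12 sites: derived at these 7, ancestral at the rest.
LOSCHBOUR_DERIVED = {"CTS8239", "CTS7218", "CTS5985", "CTS176",
                     "CTS1293", "L178", "M423"}

# "+" = derived, "-" = ancestral; a SNP missing from a dict was unreadable.
loschbour = {snp: ("+" if snp in LOSCHBOUR_DERIVED else "-") for snp in PANEL}

# Motala12 matched Loschbour everywhere it could be read (9 of 12 sites).
MOTALA12_UNREADABLE = {"CTS8239", "L1224", "M423"}
samples = {
    "Motala12": {s: c for s, c in loschbour.items()
                 if s not in MOTALA12_UNREADABLE},
    "Motala3": {"CTS7218": "+", "CTS176": "+", "CTS1293": "+"},
    "Motala2": {"CTS1293": "-"},
    "Motala9": {},
}

def agreement_with(reference, calls):
    """Return (matching, readable) counts over the SNPs readable in `calls`."""
    readable = [snp for snp in calls if snp in reference]
    matching = sum(1 for snp in readable if calls[snp] == reference[snp])
    return matching, len(readable)

results = {name: agreement_with(loschbour, calls)
           for name, calls in samples.items()}
for name, (matching, readable) in results.items():
    print(f"{name}: {matching} of {readable} readable SNPs match Loschbour")
```

Counting only readable sites keeps an unreadable SNP from being mistaken for a disagreement, which is the reason Ken gives for testing several equivalent sites.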
Ken,

I gather you are talking about the Full Genomes results that are presented in the "variantCompare" and "haplogroupCompare" reports.

First some background: There are essentially two different types of bioinformatic analysis that are traditionally used to interpret next-gen sequencing results. The first, which might be called "genotyping", involves looking at sites of known variation (e.g. M253) and figuring out what is the allele at that site; this is what is done by the gtype report. The second goes by the name of "variant discovery", "variant identification", or "variant calling" ("variant" may also be replaced by "SNP" to refer to that particular type of variant which is most commonly discussed); here, the idea is to identify differences from the reference human genome without any *a priori* knowledge about what these variations are or where they will be found (though often the variants that are found may be identified as corresponding to known variants after the fact); this latter process is the basis for the "variantCompare" and "haplogroupCompare" reports.

With that as background, I'll try to address your question about the "Reliability flag", which is indicated by a varying number of asterisks (*), with a higher number of * corresponding to less reliable variants that are more likely to be false positives. The goal here is to avoid the identification of significant numbers of spurious, false-positive variants, as next-gen sequencing methods are prone to doing. Certain regions of chrY are more prone to these issues than others. A number of factors are taken into consideration when "binning" the variants by reliability. The reliability indicator provides a rough indication of the likelihood of a genuine variant for purposes of novel variant identification (rather than probability of a particular genotype).
So, certain well-known variants may be classified as ** or *** when in fact underlying results are solid (and further details can be seen on the gtype report). The reliability flag is more useful for variants that have not been extensively studied (e.g. confirmed by conventional Sanger sequencing).

The reliability reporting system has been set up such that variants with no asterisk or one asterisk may be considered "high reliability", and these are the ones to focus on initially. Some of the ** variants may also prove to be useful; the call pattern for other samples in the haplogroupCompare report can be a useful guideline (if the call pattern makes sense, and is consistent with other results, then there is a good chance the variant is genuine). Most of the *** variants are probably false positives.

Regarding your question about independence of reads, part of the analysis that is performed involves identification of potential "PCR duplicates". This is related to your point about reading the same thing over and over, which, as you note, can bias the results. Reads identified by the computer as PCR duplicates are marked as such, and this can be taken into account in subsequent analyses to avoid biases in the results.

Hope this helps,
Greg

On Wed, Jan 8, 2014 at 8:13 PM, Kenneth Nordtvedt <[email protected]> wrote:

> The list of I phyloequivalents could be very useful. Thank you for
> supplying it.
>
> Could you explain the no asterisk, one asterisk, two ...... rating
> system?
>
> Everything I have seen is technical gobbledygook to me. Can it be
> re-explained in plain English? "Number of reads" by itself means nothing
> to me. Is the same piece of lab goo being read over and over? That would
> be sort of a waste of time; wouldn't you get the same result if reading
> the same stuff over and over? If number of reads is to mean anything
> important, it must be independent reads on different stuff which under
> some model or theory should be yielding the same output.
> "Independence" of the operations is key.
>
> So please: how does the asterisk rating system really operate, and how
> does it lead to the percentages, whatever they mean?
>
> Kenneth Nordtvedt
>
> Haplogroup I Clade Modalities and Trees at:
> http://knordtvedt.home.bresnan.net
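Greg's two points can be made concrete with a toy Python sketch. The first function applies the tiers as he states them: no asterisk or one asterisk is high reliability, two asterisks is worth checking against other samples, three is probably a false positive. The second collapses reads that look like PCR duplicates, under the simplifying assumption that duplicates share the same alignment start and strand. The `Read` record, the function names, and that duplicate rule are illustrative assumptions, not a description of the Full Genomes pipeline.

```python
from collections import namedtuple

# Hypothetical minimal read record: alignment start, strand, base seen at the site.
Read = namedtuple("Read", ["start", "strand", "base"])

def reliability_tier(flag):
    """Map a reliability flag ('', '*', '**', '***') to the tiers in the message."""
    stars = flag.count("*")
    if stars <= 1:
        return "high reliability"
    if stars == 2:
        return "possibly useful; check the call pattern in other samples"
    return "probably false positive"

def independent_reads(reads):
    """Keep one read per (start, strand); later copies count as PCR duplicates."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.start, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

reads = [
    Read(1000, "+", "A"),
    Read(1000, "+", "A"),  # same start and strand: treated as a duplicate
    Read(1000, "+", "A"),  # another copy of the same fragment
    Read(1012, "-", "A"),
    Read(1030, "+", "G"),
]
unique = independent_reads(reads)
print(len(reads), "raw reads,", len(unique), "after collapsing duplicates")
print(reliability_tier("*"), "|", reliability_tier("***"))
```

In this toy data five raw reads reduce to three independent observations, which is the distinction Ken was asking about: copies of one fragment add no new evidence about the allele.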