RootsWeb.com Mailing Lists
Total: 2/2
    1. [yDNAhgI] Sigma for Variance
    2. Kenneth Nordtvedt
    3. A PowerPoint file, “Sigma for Variance”, is now completed at http://knordtvedt.home.bresnan.net. The first two slides summarize the analytically determined formulas for the distribution sigmas accompanying tmrca estimates for y trees under our simple models of STR mutation behavior. These formulas were confirmed as correct using hundreds of thousands of simulations.

       For those not so interested in the underlying derivations of sigma sizes, the last three slides in “Sigma for Variance” give some numerical outcomes or formula results for a broad assortment of y tree types. Both genealogical time frame examples and deep time frame examples are given.

       The evidence accumulates that STRs mutate according to more complex probability rules than the simple model. Whatever tmrca estimate adjustments more realistic mutation rules require, the intrinsic sigmas associated with the estimates will still be there, behaving approximately like the sigmas given by these simple-model formulas.

       Many journal articles, as well as other sources, have thrown out unreasonably small sigma values for their tmrca-type age estimates for y trees. What finally drove me to derive these sigma formulas from underlying principles was the recent and much-discussed journal article on a favorite Y haplogroup R1b...... which claimed some very unreasonably small sigmas, or statistical error bars. On checking with the authors about the methodology behind their published sigmas, it was clear that they were a measure of the tmrca fluctuations that come from taking random subsets of haplotypes from their large total collection and estimating a tmrca from each subset. They were generating sigmas due to sampling! Each subset of haplotypes has its own slightly different tree, so of course it captures a slightly different collection of mutations, leading to a slightly different tmrca estimate.

       That’s sampling noise, but that error misses the intrinsic error, which would be present even if each and every male in the world belonging to the clade in question had his haplotype included in the total set. The intrinsic tmrca error (whose average size the sigma formulas give) exists because nature’s one-time insertion of random STR mutations into the underlying tree leads to a variance or asd which will typically be off the mark of its expected value, due to the particular locations and number of the STR mutations in this one random case from nature. Because of the collapse of y trees to single founders, this intrinsic sigma is typically the dominant contribution to total sigma. It can only be reduced in size by using more and more independent STRs, which can be thought of as independent, multiple random mutation runs through nature’s one tree.
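       The distinction between sampling noise and intrinsic error can be illustrated with a small simulation. What follows is only a rough sketch under stated assumptions, not the formulas or the simulations from the slides: the tree shape (four sub-clades that each share a long branch before fanning out), the mutation rate, the symmetric single-step mutation model, and every name in the code are hypothetical choices made for illustration. The age estimate used here is the average squared distance from the founder haplotype divided by the mutation rate, which assumes the founder haplotype is known.

```python
import random
import statistics

# All values below are illustrative assumptions, not taken from the slides.
MU = 0.004        # mutation rate per marker per generation
T_SHARED = 120    # generations on the branch shared by a whole sub-clade
T_PRIVATE = 30    # generations on each man's private line after the split
N_CLADES = 4
PER_CLADE = 50


def net_steps(rng, generations):
    """Net repeat change on one branch: mutations arrive at rate MU
    (continuous-time approximation), each one a +1 or -1 step."""
    net = 0
    t = rng.expovariate(MU)
    while t < generations:
        net += rng.choice((-1, 1))
        t += rng.expovariate(MU)
    return net


def grow_tree(rng, n_markers):
    """One run of nature: every haplotype as offsets from the founder."""
    haplotypes = []
    for _ in range(N_CLADES):
        shared = [net_steps(rng, T_SHARED) for _ in range(n_markers)]
        for _ in range(PER_CLADE):
            haplotypes.append([s + net_steps(rng, T_PRIVATE) for s in shared])
    return haplotypes


def tmrca_estimate(haplotypes):
    """Average squared distance from the founder, divided by MU."""
    n_markers = len(haplotypes[0])
    total = sum(x * x for h in haplotypes for x in h)
    return total / (len(haplotypes) * n_markers * MU)


def intrinsic_sigma(rng, n_markers, runs=200):
    """Spread of the estimate over many independent runs of nature,
    with every haplotype in the clade included each time."""
    return statistics.pstdev(
        [tmrca_estimate(grow_tree(rng, n_markers)) for _ in range(runs)]
    )


def sampling_sigma(rng, haplotypes, subset_size, draws=200):
    """Spread of the estimate over random subsets of ONE fixed tree."""
    return statistics.pstdev(
        [tmrca_estimate(rng.sample(haplotypes, subset_size)) for _ in range(draws)]
    )


rng = random.Random(1)
true_age = T_SHARED + T_PRIVATE
sigma_20 = intrinsic_sigma(rng, 20)
sigma_5 = intrinsic_sigma(rng, 5)
one_tree = grow_tree(rng, 20)
sigma_sub = sampling_sigma(rng, one_tree, 100)

print("true age (generations):", true_age)
print("estimate from the one tree, all haplotypes:", round(tmrca_estimate(one_tree), 1))
print("intrinsic sigma, 20 markers:", round(sigma_20, 1))
print("intrinsic sigma,  5 markers:", round(sigma_5, 1))
print("subset-sampling sigma, 20 markers:", round(sigma_sub, 1))
```

       In this toy tree, mutations on the long shared branches are inherited by every man in a sub-clade, so including more haplotypes does nothing to average them away. Resampling subsets of the one tree never sees that source of scatter, while adding independent markers does reduce it.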

    11/26/2011 08:55:34
    1. Re: [yDNAhgI] [DNA] Sigma for Variance
    2. Terry
    3. Ken,

       Regarding your formula for "sigma" on the first slide of your PowerPoint: is that meant to be for a single STR marker? Or is it meant to be the standard deviation about the TMRCA computed from a set of, say, 67 STR markers? To me, "sigma" should get a little better (smaller) if you use a few more STR markers, but your formula seems to be independent of the number of STR markers.

       Also, your formula is surprisingly complex, so perhaps you could share your working.

       Terry

       On Sun, Nov 27, 2011 at 9:55 AM, Kenneth Nordtvedt <knordtvedt@bresnan.net> wrote:
       > A powerpoint file “Sigma for Variance” is now completed at
       > http://knordtvedt.home.bresnan.net The first two slides summarize the
       > analytically determined formulas for the distribution sigmas accompanying
       > tmrca estimates for y trees under our simple models of STR mutation
       > behavior. These formulas were confirmed as correct using hundreds of
       > thousands of simulations. For those not so interested in the underlying
       > derivations of sigma sizes, the last three slides in
       > “Sigma for Variance” give some numerical outcomes or formula results for a
       > broad assortment of y tree types. Both some genealogical time frame
       > examples and deep time frame examples are given.

    11/29/2011 04:16:01