-----Original Message-----
From: Terry

I do note that Ken may have slightly mitigated the negative effect of the "continuum" assumption by replacing all STR mutation rates with just the average rate in his method.

[[ I don't think you understand what you are talking about, especially about what "Ken" does. He explicitly does NOT use average rates. He uses best understood individual rates for each STR which in the interclade variance summation over STRs is important for the proper weighting. Some text book treatments end up simply summing STR variances and then dividing by total mutation rate of all markers. That's all right for genealogical era time estimates, but it overweights fast STRs as one is estimating things further back in time.]]

That would have the effect of down-weighting the contribution of the slow mutating markers.

[[On the contrary, if you examine the weightings in Generations6, for example, you will see it is the fast STRs which are downweighted. KN ]]

That is a loss of useful information, but the average rate may have the side effect of reducing somewhat the bigger problem of the "continuum" assumption when applied to a relatively young haplogroup such as I1. Nevertheless, Ken's TMRCA estimate for I1 is still way off compared to the method that both correctly factors in the discrete integer nature of STR allele values,

[[ This last comment is especially bizarre. All the algebra or statistics or math that goes into the variance methods is for discrete integer repeat variables, not continuous variables. Where do you come up with this stuff? KN ]]
Ken,

As in my footnote in the original message: for the continuous assumption, the distribution of the alleles (or repeat values) at any given generation would be a Gaussian. But for the discrete assumption, the distribution of the STR alleles at any given generation would involve a modified Bessel function. The two distributions do approximate each other, but they are otherwise different allele distributions.

In your method, you require some estimate of the variance of that theoretical allele distribution, and the way you are computing that is by assuming that variance is the same as the variance of the alleles (or repeat values) that you happen to see in the surviving population. That is the problem. The variance you see in the surviving population is tainted by the population history, and especially so since many male lines go extinct, and only a few initial male lines get to survive.

The Walsh approach is less affected by details of the surviving population. The method also gives the probability distribution for the TMRCA, so one can put, say, a 95% confidence interval on the TMRCA estimate. But to do that one needs to assume perfect confidence in the STR mutation rates, and the mutation process, which I don't think we can assume or know just yet.

Terry
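
To make the Gaussian versus modified-Bessel comparison above concrete, here is a minimal Python sketch. It is not taken from Ken's spreadsheet or from Walsh's method. It assumes a symmetric single-step mutation model in which a marker gains or loses one repeat with equal probability, and a continuous-time approximation in which the expected number of mutations along a line is mu*t. Under those assumptions the net change k in repeat count has probability exp(-mu*t) * I_|k|(mu*t), where I is the modified Bessel function of the first kind, and the variance is mu*t. The function names (bessel_i, discrete_pmf, gaussian_density) are made up for this illustration.

```python
import math


def bessel_i(k, x, terms=60):
    """Modified Bessel function of the first kind I_k(x) for integer k >= 0,
    evaluated from its power series (adequate for the small x used here)."""
    return sum(
        (x / 2.0) ** (2 * m + k) / (math.factorial(m) * math.factorial(m + k))
        for m in range(terms)
    )


def discrete_pmf(k, mu_t):
    """Probability of a net change of k repeats, ASSUMING a symmetric
    single-step model with mu_t expected mutations along the line."""
    return math.exp(-mu_t) * bessel_i(abs(k), mu_t)


def gaussian_density(k, mu_t):
    """Gaussian with mean 0 and the same variance (mu_t), evaluated at k."""
    return math.exp(-k * k / (2.0 * mu_t)) / math.sqrt(2.0 * math.pi * mu_t)


if __name__ == "__main__":
    # mu_t = expected mutations per marker; small values correspond to a
    # young haplogroup and/or a slow marker.
    for mu_t in (0.1, 1.0, 10.0):
        print(f"mu*t = {mu_t}")
        for k in range(-3, 4):
            print(f"  k={k:+d}  discrete={discrete_pmf(k, mu_t):.4f}"
                  f"  gaussian={gaussian_density(k, mu_t):.4f}")
```

Both distributions have the same variance by construction, so this sketch only illustrates how their shapes differ when mu*t is small and converge when it is large. It says nothing about the separate question of whether the variance observed in the surviving population is a good estimate of mu*t.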