RootsWeb.com Mailing Lists
    1. Re: [R-M222] Calculation of a correlation coefficient
    2. Bill Howard
    3. Hi, Steve,

       My postings to Sandy are apparently not getting through, so I think it is basically pointless to beat on this dead dog much further. A few points, however.

       The RCC matrix yields a distance, similar to a GD but not quite the same, between pairs of haplotypes of ALL the members inputted to the Excel Data Analysis Correlation Kit function (not the CORREL one). It is completely valid to compare a pair of marker strings the way I did it. The fact that both the RCC matrix and the optimized position of the tree make the same clusters and haplotype groupings that others make proves the validity of the process. My two published papers have already indicated that the process works at least as well as those involving GDs.

       Moreover, the mutations are averaged and are buried in the process. The same mutation probability applies to each marker of each testee, and you don't have to know the probabilities, because the calibration of the time scale (basically the distance from each pair of testees to their MRCA, done over 100+ pedigrees) takes care of the problem by averaging out their effects. There is no need to agonize about them, since they are buried in the calibration. That's what most people don't understand about the time scale of the RCC process.

       You are correct when you say that the average marker value across all 37, 67, or 111 markers has no obvious significance. The marker differences in two haplotype strings do.

       I am most happy to answer questions, provided that the posers have read the details of the process that I am using and have read the answers to the FAQs that I posted to make the understanding process more effective.

       I appreciated the tone of your reply, Steve. But if my postings are not getting through, I fear that it is a waste of time. We'll see if this one gets through...
- Bye from Bill

On Jul 17, 2011, at 5:01 PM, Stephen Forrest wrote:

> Hi Bill,
>
> I don't think this discussion is wasting people's time, though I do rather wish the tone were overall rather more civil.
>
> I have to say that regardless of the specifics of the Excel implementation, Sandy's objection to the statistical foundation of RCC is valid.
>
> The sample correlation coefficient on paired data (x_1,y_1),...,(x_n,y_n) corresponding to two random variables X and Y is the sum of (x_i - x_m)*(y_i - y_m) for i from 1 to n, divided by the product of the magnitudes of the vectors (x_1-x_m,...,x_n-x_m) and (y_1-y_m,...,y_n-y_m), where x_m and y_m are the means of the sample data x_1,...,x_n and y_1,...,y_n respectively.
>
> The key point here is that the data x_1,...,x_n are supposed to be separate measurements of a single random variable X of interest. When you use them for RCC, by your design x_1,...,x_n are the STR marker values themselves. These are not separate measurements of a single quantity, and the value x_m, which is the average marker value across all 37, 67, or 111 markers, has no obvious significance. You are measuring the correspondence between two random variables on a population of 37, 67, or 111, where the population is marker values and not people.
>
> This introduces additional problems because marker value ranges vary widely between markers. Some like DYS710 have high repeat numbers (mine is 35), while others like DYS578 are lower (mine is 9). My testing with my own 111-marker sample has shown that the RCC between my profile and my profile with a one-point mutation (i.e. the difference between RCC values before and after) *varies inversely with the distance of the particular marker value from the mean marker value x_m*. I can supply data if you like.
>
> There is absolutely no good biological reason why RCC should depend so closely on marker values: a one-point mutation is a one-point mutation whether the change is from 34 to 35 or 14 to 15. The fact that it does is evidence of the artificiality of this particular measure of genetic distance, to say nothing of the fact that documented mutation rates, which will certainly affect TMRCA calculations, are apparently not included in the RCC model at all.
>
> I have a few other points to raise about RCC which I will strive to write up and post here. I want to emphasize to all, however, that Bill has done a lot of work here and that innovation in statistical analysis of genetic data should be welcomed. That said, these innovators, like all researchers, have to be ready to face criticism, and just because a particular objection has not been raised before is not evidence of its falseness. Thanks to all for the discussion.
>
> regards,
>
> Steve

> R1b1c7 Research and Links:
> http://clanmaclochlainn.com/R1b1c7/
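The quoted definition of the sample correlation coefficient, and the observation that a one-point mutation moves it by different amounts depending on the marker, can be tried out with a minimal sketch. Everything specific here is an assumption made for illustration: the 8-marker profile is invented (it is not anyone's test result), and the helper names `pearson` and `one_step_effect` are hypothetical. Only the plain correlation coefficient is computed; the step from that coefficient to RCC is not spelled out in this thread, so it is left out.

```python
from math import sqrt


def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_m = sum(x) / n
    y_m = sum(y) / n
    dx = [a - x_m for a in x]          # deviations from the mean of x
    dy = [b - y_m for b in y]          # deviations from the mean of y
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    if den == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return num / den


def one_step_effect(profile, index):
    """Return 1 - r between a profile and a copy with marker `index` raised by one."""
    mutated = list(profile)
    mutated[index] += 1
    return 1.0 - pearson(profile, mutated)


# Invented 8-marker profile, for illustration only (an assumption).
profile = [13, 24, 14, 11, 9, 35, 12, 15]
mean = sum(profile) / len(profile)

for i, value in enumerate(profile):
    effect = one_step_effect(profile, i)
    print(f"marker {i}: value={value:2d}  |value-mean|={abs(value - mean):6.3f}  1-r={effect:.6f}")
```

With this invented profile, the same +1 change produces a smaller shift in the coefficient at the marker whose value is 35 (far from the mean) than at the marker whose value is 15 (close to the mean), which is the kind of dependence on marker value described in the quoted message. A toy profile of 8 markers says nothing about the size of the effect on a real 37-, 67-, or 111-marker sample.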

    07/17/2011 03:15:12