I managed to dig up a Statistics text-book that has a worked example of how a correlation coefficient is calculated. The book is called Statistical Analysis, by Edward C Bryant, published by McGraw-Hill. The example is given on page 139 and the table input table is on page 141. The example entails estimating the correlation coefficient for a population, from a sample drawn from a bivariate normal distribution. The idea in this case is to see how well mid-term examination results were correlated to the year-end examination results in the subject of elementary statistics. There were 20 students included in the sample. The answer given is 0.642 Here is the data from the table : Student No Mid-term (X) Year-end(Y) 1 35 60 2 60 80 3 55 60 4 35 80 5 35 75 6 50 90 7 30 60 8 60 105 9 50 60 10 20 30 11 55 90 12 45 75 13 40 80 14 60 80 15 40 45 16 60 80 17 50 80 18 55 95 19 50 100 20 35 75 This can be done from first principles and the answer arrived at is 0.642. Alternately, you can do it in Excel. To do this, enter the X values into column A, and the Y values into column B. Then use the CORREL function to calculate the correlation coefficient. An Excel spreadsheet that does this is shown at http://dl.dropbox.com/u/2733445/STATSTUDENT.xlsx The answer is given in cell C20, namely 0.642 but to a few more decimal places. Here's my point : Correl in Excel is set up to accept bivariate data for a sample population of n (in the above example n=20). You are required to enter, for each of the 20 in the sample, the variable X (in column A) and the variable Y (in column B). Bill Howard does something completely different. He enters data for only 2 people, but he enters 37 numbers into column A and 37 numbers into column B. Excel interprets this to mean that there are 37 people (not just 2), and two variables (it is bi-variate), not 37. Correl is not set up for multivariate analysis - it is set up for bivariate analysis. This must be the most hilarious example I've ever seen of someone blindly banging data into software without having the faintest idea what the software does, or the input format that the software requires. Sandy
Sandy, You are wrong. Please don't mislead the post readers without understanding what I did and how I did it. You don't use Excel's process on a matrix; you used it on only a string of values. You need to use the approach on a matrix of ROWS of numbers, not just a couple of columns. You don't understand how I did the analysis --- I anticipate that most of what you wrote before your four paragraphs at the end of your posting are correct, but the last four paragraphs are incorrect. I enter data for everyone and run a correlation at one time on all of them, not just a pair as Sandy writes. I do not use the Correl command at all. I use the Excel data analysis kit, as explained in my papers. The Excel data analysis kit, or Mathematica, or any of the other programs that can run a correlation on a matrix (not on just two columns) correlates each row of haplotypes against ALL other rows of haplotypes. You must click on ROWs to be correlated, not COLUMNs. This is not the first time you have criticized me, but each time I have shown that you either (1) did not understand what I was doing; (2) did not understand how I did it, (3) did not do a deep enough calculation yourself and just skimmed the top; (4) did not put error bars on what you did, and on and on and on. Your contention that "This must be the most hilarious example I've ever seen of someone blindly banging data into software without having the faintest idea what the software does, or the input format that the software requires" is again both wrong, excessively vituperative and entirely unwarranted. In addition, you are tending to dominate the postings with these types of criticisms and I for one would appreciate hearing more discussions from others. I suggest you back off and do a deeper analysis of what I have done, reading my papers and my FAQ, and doing your homework -- correctly. I hope others on this rootsweb site will agree. - Bye from Bill Howard PS -- I will be sorry to see if both David and John step down from their very important role as the M222 Project Administrator. They have made very important contributions, and John's very knowledgable postings have been much appreciated. --------------------------- On Jul 17, 2011, at 4:18 AM, Alexander Paterson wrote: > I managed to dig up a Statistics text-book that has a worked example of how > a correlation coefficient is calculated. The book is called Statistical > Analysis, by Edward C Bryant, published by McGraw-Hill. The example is given > on page 139 and the table input table is on page 141. > > > > The example entails estimating the correlation coefficient for a population, > from a sample drawn from a bivariate normal distribution. The idea in this > case is to see how well mid-term examination results were correlated to the > year-end examination results in the subject of elementary statistics. There > were 20 students included in the sample. The answer given is 0.642 > > > > Here is the data from the table : > > > > Student No Mid-term (X) Year-end(Y) > > > > 1 35 > 60 > > 2 60 > 80 > > 3 55 > 60 > > 4 35 > 80 > > 5 35 > 75 > > 6 50 > 90 > > 7 30 > 60 > > 8 60 > 105 > > 9 50 > 60 > > 10 20 30 > > 11 55 90 > > 12 45 75 > > 13 40 80 > > 14 60 80 > > 15 40 45 > > 16 60 80 > > 17 50 80 > > 18 55 95 > > 19 50 > 100 > > 20 35 75 > > > > > > This can be done from first principles and the answer arrived at is 0.642. > > > > Alternately, you can do it in Excel. To do this, enter the X values into > column A, and the Y values into column B. Then use the CORREL function to > calculate the correlation coefficient. > > > > An Excel spreadsheet that does this is shown at > > > > http://dl.dropbox.com/u/2733445/STATSTUDENT.xlsx > > > > The answer is given in cell C20, namely 0.642 but to a few more decimal > places. > > > > Here's my point : Correl in Excel is set up to accept bivariate data for a > sample population of n (in the above example n=20). You are required to > enter, for each of the 20 in the sample, the variable X (in column A) and > the variable Y (in column B). > > > > Bill Howard does something completely different. He enters data for only 2 > people, but he enters 37 numbers into column A and 37 numbers into column B. > Excel interprets this to mean that there are 37 people (not just 2), and two > variables (it is bi-variate), not 37. > > > > Correl is not set up for multivariate analysis - it is set up for bivariate > analysis. > > > > This must be the most hilarious example I've ever seen of someone blindly > banging data into software without having the faintest idea what the > software does, or the input format that the software requires. > > > > > > > > Sandy > > R1b1c7 Research and Links: > > http://clanmaclochlainn.com/R1b1c7/ > ------------------------------- > To unsubscribe from the list, please send an email to DNA-R1B1C7-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message