RootsWeb.com Mailing Lists
[R-M222] Some details about the RCC process, the RCC matrix, RCC clusters and interclusters, and the RCC-dated phylogenetic tree.
Bill Howard
Recently I have had some off-list correspondence with people who were trying to explore the nuances of, and relationships between, the RCC time scale, the RCC matrix and the phylogenetic tree that has an RCC scale associated with it. Let me try to explain what is going on and how to interpret the results. I have posted seven web pages that I will refer to in what follows. I am confining most of my remarks to only two surnames that appear close together on the tree, each containing seven testees: the Howles and the McGoverns. The same approach and logic apply to other clusters, although the ones I chose here are very simple in form. In fact, we can analyze many hundreds of haplotypes at a time. Take a look at the following postings at the URLs noted:

1. These are the haplotypes that I use: http://mysite.verizon.net/weh8/Howle-McGovern.pdf
2. Here are the RCC matrix values for the Howles: http://mysite.verizon.net/weh8/HowleCluster.pdf
3. Here are the RCC matrix values for the McGoverns: http://mysite.verizon.net/weh8/McGovernCluster.pdf
4. Here are the intercluster RCC values: http://mysite.verizon.net/weh8/Howle-McGovernIntercluster.pdf
5. Look at the Howle and McGovern clusters on the tree: http://mysite.verizon.net/weh8/M222SurnamesA.pdf
6. Time slice matrix for RCC values between 0 and 10: http://mysite.verizon.net/weh8/TimeSliceMatrix.pdf
7. The O'Cathain tree: http://mysite.verizon.net/weh8/OCathain.pdf

The Howles and the McGoverns have very close connections in time, which can be seen both on the RCC matrix and on the latest surname phylogenetic tree. Of course, I base that conclusion on the correlation approach that I have developed and applied to many haplotypes. That approach allows us to estimate the TMRCA of each pair of haplotypes.
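The cluster and intercluster RCC values posted at the second, third and fourth URLs are all pieces of one symmetric matrix. As a minimal sketch, assuming the matrix is held as a plain Python list of lists, the hypothetical function below splits it into the two within-cluster blocks and the intercluster region. The testee names and values are invented toy data, and the simple mean of the intercluster values is only a crude stand-in for the optimization Mathematica performs when it places the junction point on the tree. The conversion to years uses the calibration quoted in this post, 10 RCC ~ 433 years.

```python
from itertools import combinations, product

# Calibration quoted in the post: 10 RCC ~ 433 years.
YEARS_PER_10_RCC = 433


def rcc_to_years(rcc):
    """Convert an RCC value to an approximate number of years."""
    return rcc * YEARS_PER_10_RCC / 10


def rcc_blocks(rcc, labels, cluster_a, cluster_b):
    """Split a symmetric RCC matrix into its two within-cluster blocks
    and the intercluster region (hypothetical helper, for illustration).

    rcc       -- square list of lists; rcc[i][j] is the RCC of testees i and j
    labels    -- testee names, in the same order as the rows and columns
    cluster_a -- names of the members of the first cluster
    cluster_b -- names of the members of the second cluster
    """
    index = {name: i for i, name in enumerate(labels)}
    a = [index[name] for name in cluster_a]
    b = [index[name] for name in cluster_b]
    # Within a cluster, count each unordered pair once and skip self-pairs.
    within_a = [rcc[i][j] for i, j in combinations(a, 2)]
    within_b = [rcc[i][j] for i, j in combinations(b, 2)]
    # Intercluster region: every member of one cluster paired with every
    # member of the other.
    inter = [rcc[i][j] for i, j in product(a, b)]
    return within_a, within_b, inter


# Made-up toy values for four hypothetical testees (not real data).
labels = ["A1", "A2", "B1", "B2"]
rcc = [
    [0, 4, 18, 20],
    [4, 0, 16, 22],
    [18, 16, 0, 6],
    [20, 22, 6, 0],
]
within_a, within_b, inter = rcc_blocks(rcc, labels, ["A1", "A2"], ["B1", "B2"])
mean_inter = sum(inter) / len(inter)
print(within_a, within_b, inter)          # [4] [6] [18, 20, 16, 22]
print(mean_inter, rcc_to_years(mean_inter))  # 19.0 822.7
```

With seven testees per cluster, as in the Howle and McGovern example, each within-cluster block holds 21 pairs and the intercluster region holds 49.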
When many haplotypes cluster together, as each of these two surnames does individually, the error of the time estimate decreases by a factor of approximately the square root of the number of haplotypes minus one, until we reach the limit set by errors in the mutations themselves. The errors we expect are less than about 2-3 in RCC, or 180 years in time (and these are standard deviations).

Once a cluster has been identified in the RCC matrix (entries 2 and 3, above), a study of the intercluster region of the RCC matrix reveals the TMRCA of the two clusters. The intercluster region lies at the intersections of the Howle and McGovern entries, paired together, as shown in entry 4, above. It contains many numbers, each being the RCC value of one member of cluster 1 paired with one member of cluster 2; every such pair appears there.

Now look at the tree (entry 5, above). Mathematica takes the paired relationships within the clusters AND the values of the pairs in the intercluster region, and forms a tree from them. As it does so, it determines the optimum values for the pairs of haplotypes in each cluster and presents them on the tree. It also optimizes all the numbers in the intercluster region and places the TMRCA connection between the two surname clusters at the junction point where those numbers have been optimized.

The RCC values on the tree have been calibrated using over 100 pedigrees, and the result is that 10 RCC ~ 433 years. The tree's time scale is given in RCC units, which will not change. If a better pedigree-time calibration is done, the equivalent of an RCC unit in years will change, and that is why we plot the tree in RCC units.

Going further, we now know that every junction point on the phylogenetic tree must have a corresponding mutation, because that is what the junctions represent. We can view the situation in reverse.
Namely, where there is a junction point, there must be a mutation. The real challenge is to find the markers whose mutations caused the change and the junction point, and we are not there yet. But we can approach that by looking at the marker changes that differentiate one cluster from another, that is, at their individual 'fingerprints'. We also know that SOMETHING has to differentiate one cluster from another. Since clusters are defined by the haplotypes of their members, what separates one cluster from another is the cluster members' marker values. So the difference in how clusters are defined on the tree can be explained by the differences in their marker values.

Now, in our example, the thing to do is to look at the marker values of the Howles and the McGoverns and see, not how they are similar, and not how they differ from other clusters, but HOW THEY DIFFER FROM EACH OTHER. The differences in marker values between any two clusters are so important because each cluster has a fingerprint defined by the marker values of its members. If we look closely at the marker differences between the Howles and the McGoverns (entry 1, above), we find that, across the 7 examples in each cluster, some markers are entirely and consistently different. I have shown that three of them, at DYS 393, 460, and H4, are entirely different (look at the first URL, above, and note those haplotype differences). Seven other markers contain a mixture of values. Those marker differences separate these surnames from the rest and help to place them on the phylogenetic tree. Genetics dictates, I think, that when a surname cluster contains different values at a particular DYS marker, mutations have been occurring in that surname during the recent past. That part of the fingerprint has been mutating before our eyes!
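Comparing two clusters marker by marker, looking for how they differ from each other, is mechanical enough to sketch in Python. This is an illustration under stated assumptions, not the procedure used in the post: each haplotype is a plain dictionary of marker name to repeat count, the function name is hypothetical, the marker values are invented, and the three categories are one possible reading of "entirely and consistently different" and "a mixture of marker values".

```python
def compare_fingerprints(cluster_a, cluster_b):
    """Classify each marker by how the values in two clusters relate
    (hypothetical helper, for illustration).

    Each cluster is a list of haplotypes; each haplotype is a dict mapping a
    marker name to its repeat count. All haplotypes are assumed to carry the
    same markers.

    Categories (an assumed reading of the post's terms):
      consistently_different -- every member of each cluster has the same
                                value, and the two clusters' values differ
      mixed                  -- at least one cluster has more than one value
      shared                 -- both clusters carry one identical value
    """
    result = {"consistently_different": [], "mixed": [], "shared": []}
    for marker in cluster_a[0]:
        values_a = {hap[marker] for hap in cluster_a}
        values_b = {hap[marker] for hap in cluster_b}
        if len(values_a) == 1 and len(values_b) == 1:
            key = "shared" if values_a == values_b else "consistently_different"
        else:
            # Variation inside a cluster is what the post reads as recent mutation.
            key = "mixed"
        result[key].append(marker)
    return result


# Hypothetical haplotypes, invented for illustration only.
cluster_x = [
    {"DYS393": 13, "DYS391": 10, "DYS460": 11, "H4": 12, "DYS439": 12},
    {"DYS393": 13, "DYS391": 10, "DYS460": 11, "H4": 12, "DYS439": 13},
]
cluster_y = [
    {"DYS393": 14, "DYS391": 10, "DYS460": 10, "H4": 11, "DYS439": 12},
    {"DYS393": 14, "DYS391": 10, "DYS460": 10, "H4": 11, "DYS439": 12},
]
print(compare_fingerprints(cluster_x, cluster_y))
```

Here the "shared" marker plays the role of a value common to both clusters (like DYS 391 at 10 in the post): it sets the pair apart from other clusters but says nothing about how the two differ from each other. A stricter variant could also flag markers where the two clusters' value sets do not overlap even though one of them varies internally.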
Following the model in my papers in JoGG, those mutations will, in time, form subclusters, and as more lines descend, those subclusters will become separate clusters of their own, provided their lines do not die out. We are watching the process now: we are observers looking at one snapshot, taken at our particular time, of the mutating markers on the Y-chromosome.

DYS 391, at 10 for both the Howle and McGovern clusters, differs from most of the other haplotypes in the sample. That difference sets these two surname clusters apart from other surname clusters, but not from each other. When you compare two clusters with each other, you have to look for the differences between their marker values. All the work I have done indicates, without question in my mind, that these two surname clusters are closely related, for the reasons I give above. Similarities don't lead to important conclusions when two clusters are compared with each other, or with other clusters. Differences do.

More recently we have found that Mathematica can form a tree directly from the haplotypes, without going through the process of forming the RCC matrix, although it can be programmed to do so. While skipping the RCC matrix eliminates a step in forming the tree, I should point out that the RCC matrix can provide valuable insight into the connections. For example, from the RCC matrix you can form a time slice matrix (see my first paper in JoGG), which presents only the RCC values that show connections within a given interval of RCC (hence a given interval of time). That aspect of the analysis can be exceptionally useful.

As an example of a time slice matrix, I refer to entry 6, above. It shows the TMRCAs of pairs of testees, whose names and kit numbers are listed on the left. The slice runs from RCC = 0 (1945, assumed to be the average birth year of a testee) to RCC = 10 (about 433 years earlier, or roughly AD 1500).
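A time slice is, in effect, a filter applied to the RCC matrix. The following is a minimal Python sketch, assuming the RCC matrix is a symmetric list of lists; the function names and the toy values are hypothetical. The date conversion uses only the figures quoted in this post (RCC = 0 at 1945, 10 RCC ~ 433 years), which places the RCC = 10 edge of the slice near AD 1512, consistent with "roughly AD 1500".

```python
# Calibration and reference year quoted in the post.
YEARS_PER_10_RCC = 433       # 10 RCC ~ 433 years
REFERENCE_BIRTH_YEAR = 1945  # RCC = 0, the assumed average testee birth year


def rcc_to_calendar_year(rcc):
    """Approximate calendar year corresponding to an RCC value."""
    return REFERENCE_BIRTH_YEAR - rcc * YEARS_PER_10_RCC / 10


def time_slice(rcc, labels, lo=0, hi=10):
    """Keep only the pairs whose RCC falls in the slice [lo, hi]
    (hypothetical helper, for illustration).

    Returns (name_i, name_j, rcc) tuples for each unordered pair, i < j,
    from a symmetric RCC matrix stored as a list of lists.
    """
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if lo <= rcc[i][j] <= hi:
                pairs.append((labels[i], labels[j], rcc[i][j]))
    return pairs


# Made-up toy matrix for four hypothetical testees.
labels = ["T1", "T2", "T3", "T4"]
rcc = [
    [0, 4, 18, 9],
    [4, 0, 16, 22],
    [18, 16, 0, 6],
    [9, 22, 6, 0],
]
print(time_slice(rcc, labels))
print(rcc_to_calendar_year(0), rcc_to_calendar_year(10))  # 1945.0 1512.0
```

Each returned tuple corresponds to a filled cell above the diagonal of a time slice display; widening the slice (for example `hi=20`) fills in more cells, which is the effect described for the O'Cathain clusters.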
The errors are again estimated to be a few hundred years, so the results should not be over-interpreted. You can clearly see two rather large main clusters along the diagonal. They would be more completely filled if we had chosen RCC = 20 as the upper limit of the display; being this well filled at RCC = 10, they show closer genetic relationships among the pairs than usually occur in a typical surname cluster, whose members sit at and below RCC ~20. The upper left cluster is a mixture of Gallaghers, Cane/Kane, etc., and a few other surnames. The lower right cluster is more completely filled (it is younger) and is composed of Cane/Kane, etc., and the McHenrys. The two are clearly different. All the entries are members of the O'Cathain group of testees sent to me by John McLaughlin.

What you see is only part of the O'Cathain RCC matrix, and only in that particular slice of time. However, entry 7 gives the position of every testee in a phylogenetic tree of the O'Cathain group. When you compare the entries in the RCC time slice matrix with the positions of those same testees in the tree, the analysis gets quite complicated, and questions arise that are still not solved. For example, look at the intercluster region of the time slice matrix, colored in yellow. Those pairs of testees have one member from the upper cluster paired with one member from the lower cluster, yet their RCC values are of the same order as those within the clusters. This is very unusual and not yet understood. They all have surnames in the Soundex group of Cane/Kane, etc., but there are McHenrys there, too. The latter group has been known to be closely associated with the Cane/Kane group, however, and that might be giving us a clue.

I hope our readers will now understand more about the process: how the tree depends on the haplotypes, the value of the intermediate RCC matrix and its time slice matrix, and how they all tie together.

- Bye from Bill Howard

    07/09/2011 09:22:02