> From: Debbie Kennett
> Sent: Wednesday, January 26, 2011 2:19 PM
> <snip>
>
> 1. "A match is obvious; if in doubt it isn't one."
> This is not always true. RecLohs can often muddy the
> waters and it is also much more difficult determining
> whether matches are significant the further back in
> time you go, and especially pre-1600 when the paper
> records start to run out.

I can't see recLOHs as being a problem, unless you're using the TiP calculator. It's the TiP calculator that may not recognize them (and thus see the genetic distance as greater than it really is). You only need to (and should only) invoke a recLOH when two haplotypes are otherwise a close match, so that multiple single-step mutations at a single locus would be improbable. In all my projects, I have only one case of a recLOH, and I don't see how it "muddies the water" at all:

http://dgmweb.net/DNA/Corbin/CorbinDNA-results-HgR1b-Sherman.html#data

I agree that everything gets more difficult as you go further back, but when the paper records run out, my job is finished. It has never been my intention to use DNA testing to do more than support good paper genealogy, debunk bad paper genealogy, and break through brick walls. I have no intention whatsoever of taking my genealogy or my members' genealogies back beyond surname adoption.

> I agree with Ralph and James that the FTDNA TiP is a much
> more useful and reliable tool than any rule of thumb,
> especially as it also copes admirably well with RecLohs
> and null markers.

Who is using a "rule of thumb"? Not me. My method is dictated by logic, which is a much higher form of proof than statistical probability. I don't know how the TiP calculator handles null markers, but it certainly seems to me that recLOHs are a stumbling block for the TiP calculator, not its strength. In any case, as I said, a statistical probability is not better than a logical yes-no.
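The recLOH point above (that a single recLOH event can make the genetic distance look larger than it really is) can be illustrated with a small Python sketch. It computes a simple stepwise genetic distance, the summed repeat differences per marker, and lets the caller flag multi-copy markers such as DYS464 whose difference is assumed to come from one recLOH event, so each flagged marker counts as a single step. This is a hypothetical illustration under those assumptions, not how the FTDNA TiP calculator or FTDNA's own genetic-distance scoring works; the functions `marker_steps` and `compare` and all allele values are made up. It also counts each named marker once, whereas commercial panels may count each copy of a multi-copy marker separately when reporting a score such as 65/67.

```python
from typing import AbstractSet, Dict, Sequence, Tuple

# Hypothetical haplotype layout: marker name -> allele values.
# Single-copy markers hold one value; multi-copy markers (e.g. DYS464)
# hold one value per copy. Both men are assumed to have the same number
# of copies at each shared marker.
Haplotype = Dict[str, Tuple[int, ...]]


def marker_steps(a: Sequence[int], b: Sequence[int]) -> int:
    """Stepwise distance for one marker: the summed repeat differences
    after sorting the copies, so copy order does not matter."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def compare(h1: Haplotype, h2: Haplotype,
            recloh_markers: AbstractSet[str] = frozenset()) -> Tuple[int, int, int]:
    """Return (markers_matched, markers_compared, genetic_distance).

    Markers named in recloh_markers are assumed to differ because of a
    single recLOH event, so each adds at most 1 to the distance instead
    of the full number of repeat steps.
    """
    shared = sorted(set(h1) & set(h2))
    matched = 0
    distance = 0
    for name in shared:
        steps = marker_steps(h1[name], h2[name])
        if steps == 0:
            matched += 1
        elif name in recloh_markers:
            distance += 1  # assumption: one recLOH event, not several mutations
        else:
            distance += steps
    return matched, len(shared), distance


if __name__ == "__main__":
    # Hypothetical allele values, for illustration only.
    man_a = {"DYS393": (13,), "DYS390": (24,), "DYS19": (14,),
             "DYS464": (15, 15, 17, 17)}
    man_b = {"DYS393": (13,), "DYS390": (24,), "DYS19": (14,),
             "DYS464": (15, 15, 15, 15)}
    print(compare(man_a, man_b))              # (3, 4, 4)
    print(compare(man_a, man_b, {"DYS464"}))  # (3, 4, 1)
```

With these made-up haplotypes the unadjusted comparison reports a distance of 4 at DYS464, while flagging that marker as a recLOH reduces the distance to 1. Deciding when a marker should be flagged is exactly the judgment call debated in this thread; the sketch simply takes it as an input.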
> Groupings will also depend on the composition of the
> project and the definition of a genealogical timeframe.
> I would define a genealogical timeframe as the time from
> which surnames were adopted and records became available.
> In the south of England this is from the 1100s onwards
> for some surnames.

"Genealogical time" is not something you can arbitrarily define in terms of years or generations. It's defined for each family by when the paper records run out, which happens earlier for some families than it does for others.

> Even if a tree cannot be constructed it is usually still
> possible to get an idea of the distribution of a surname
> from early tax records.

Yes, but at that point you're doing history, not genealogy.

> Surnames were adopted later in the north of England.
> Some parts of Wales were still using patronymic surnames
> as late as the nineteenth century. Jewish surnames only
> became established in the last few hundred years. Each
> project will therefore be different.

Yes, I agree, and not just for each project, but for each family, as I mentioned above.

> 2. "Making connections on too little evidence; a bad
> connection is worse than no connection; grouping too
> aggressively; boasting thereon."

Please note that this is not a quote of what I said; it's a quote of what James said.

> I think it is up to the individual project manager
> to decide how to group his or her results. Surname
> projects are still in the very early days. No study,
> other than the early Sykes study on just four markers,
> has yet published its results.

Forgive me, but I doubt that very many project admins have plans to publish books on their projects. I certainly don't. Our project web sites constitute "publication," and they have a huge advantage over paper publication in that they can be constantly updated.

> What is important is that the project administrator
> provides a rationale for the groupings to his project
> members. For the benefit of other project admins it
> helps too if their methodology is at least outline
> somewhere on their project website.

I agree, and I do, but I would venture to say that makes me an exception, because I see very few sites (can't say I've run across any, actually) explaining the criteria used to group members.

> What works for a small project might not be manageable
> for a large-scale study.

I see no fundamental difference in how members should be grouped in small projects or large projects. A match is or is not a match. Large projects are simply a great deal more work, and I would say too much for one person to manage well. I've been meaning to suggest to FTDNA that they start allowing projects for common surnames to spin off each haplogroup into a separate project. People in different haplogroups have zero possibility of being related in genealogical time, so there's no logical reason to keep everyone in the same project, where one or even two admins cannot possibly do justice to everyone.

> It would also help if other project administrators
> followed James Irvine's excellent example and
> published papers outlining their own methodologies.

I can't see any advantage to publishing in hard copy, though I can see some disadvantages. Put the explanation on the project web site. That will make it more easily available to project members and more widely available to everyone else, and it can be updated at will.

> One would expect to find the match rate varying
> considerably between surnames with low-frequency
> surnames having a higher match rate than more
> common surnames, but there will no doubt be some
> exceptions.

I see no inherent reason for the match rate to be correlated with the frequency of the surname. The number of descendants per progenitor is probably random. What makes a surname common is almost certainly correlated with the number of times it's been adopted, rather than with an increased number of descendants per progenitor.
In the case of "displaced populations" (as in the U.S., Australia, etc.), there is likely to be a correlation with how early the progenitor arrived and how many progenitors of that surname arrived.

> It will be interesting to see what emerges as more
> studies start to publish results.

I really can't see many surname project admins publishing their studies. Why? The projects are ongoing, so even if you published, the publication would soon be out of date. The beauty of the web is that your "publication" (your web site) can be current.

> A high match rate could just be a result of
> sampling bias. If one person emigrated to the US
> in the 1600s and produced huge numbers of descendants
> and only people with this surname in the US were
> tested then the project would have a very high match
> rate. A completely different picture might emerge if
> all the lines were tested in the country of origin
> instead. I've not done the stats but I know that in
> my own project my American project members have an
> exceedingly high match rate with most of them falling
> into one large group. The rate is much lower for
> my UK participants, and I have lots of singletons
> waiting for matches.

The match rate can vary based on *many* factors, which is the reason "match rate" isn't a statistic worth gathering, IMO. Even if you knew the match rate for every name in every project, what use is knowing it? Just because you can run a statistic on a set of numbers doesn't mean it tells you anything worth knowing.

> If my Americans don't match their own surname they
> usually have a 67-marker match with another surname
> instead.

When someone has a 65/67 or better match in another surname, it almost certainly means there has been an NPE (non-paternity event) in one of the lines. But I have many more members who match no one than members with NPEs.

> As Ralph has pointed out, paper trails can also
> be wrong, and this can sometimes lead to incorrect
> groupings within a project.

I don't see how that mistake could be made.
If the paper trail is wrong, the DNA test results will tell you, loud and clear. Probably the greatest strength of DNA testing is its ability to reveal bad paper genealogy. I would certainly never allow paper genealogy to trump DNA evidence when it came to grouping members.

> 3. The major goal is crossing the pond.
> This statement only applies to American projects.
> UK participants, for example, do not tend to be
> interested in crossing the pond and finding
> matches in the US. For the origin of my surname
> I'm more interested in finding matches across
> the English Channel in France and Belgium.

I'm afraid that statement is taken out of context. If my meaning wasn't clear, I apologize; in context, what I said was: Once an American has connected to their immigrant, both on paper and via DNA test results, *then* the major goal is crossing the pond.

> Surely the major goal of any surname project should
> be to collect as many DNA samples as possible to
> represent all the major lineages for the surname?

Yes, of course. That goes without saying.

> This means recruiting in all the countries where
> the surname is to be found with the priority to test
> documented lineages from the country of origin to
> serve as a baseline. More importantly it means
> presenting the DNA project in such a way as to appeal
> to potential project members from around the world.
> There are far too many projects which seem to
> concentrate only on the surname in the US on their
> website and then wonder why they can't attract
> testees from other countries. The emigrant lines
> to the US are just a small subset of the surname,
> even if some of the trees are exceptionally large.

I don't know a project admin who isn't desperate to have Europeans tested for their project.
Please look at the subsidies offered for my projects:

http://dgmweb.net/DNA/Carrico/CarricoDNA.html#subsidies
http://dgmweb.net/DNA/Corbin/CorbinDNA.html#subsidies
http://dgmweb.net/DNA/Rasey/RaseyDNA.html#subsidies
http://dgmweb.net/DNA/Straub/StraubDNA.html#subsidies

I have spent hundreds of dollars subsidizing the tests of Europeans for my projects, and I'm offering hundreds more, so I think it's an unfair criticism of "American" project admins that they don't care or aren't trying to recruit Europeans.

Diana
Diana

I previously cited on this list the RecLoh in the Aldous DNA project:

http://www.familytreedna.com/public/Aldous/default.aspx?section=yresults

At first glance the two results don't match at all, and presumably you would not count this as a match. Neither man shows up in the other's list of matches. This is an example where the TiP is particularly useful. The TiP gives the two men a 74.32% chance of sharing a common ancestor within 24 generations. On paper the two men supposedly share a common ancestor in the 1400s. This result is therefore within the bounds of probability, but it is not clear-cut. Interestingly, despite this result being somewhat extreme, it's not too far off the 80% TiP probability that James Irvine uses as a cut-off in the Irvine project.

>I agree that everything gets more difficult as you go further back,
>but when the paper records run out, my job is finished. It has never
>been my intention to use DNA testing to do more than support good
>paper genealogy, debunk bad paper genealogy, and break through brick
>walls. I have no intention whatsoever of taking my genealogy or my
>members' genealogies back beyond surname adoption.

I'm not talking about taking genealogies back before the adoption of surnames; I'm talking about deciding whether or not results are related *since* the adoption of surnames. The records often run out or are incomplete for the first five hundred or so years after the adoption of surnames. Are you therefore discounting matches where the surnames could potentially be related in the 1100-1600 period, when very few people have paper trails?

>Who is using a "rule of thumb"? Not me. My method is dictated by
>logic, which is a much higher form of proof than statistical
>probability. I don't know how the TiP calculator handles null
>markers, but it certainly seems to me that recLOHs are a stumbling
>block for the TiP calculator, not its strength. In any case, as I
>said, a statistical probability is not better than a logical yes-no.

How are you deciding the criteria for a match? You must have a cut-off point somewhere, which is effectively a rule of thumb. Mutations occur at random and don't necessarily follow logical patterns. The best we can do is study large bodies of data, see the range of possibilities that might occur, and then make decisions based on all the available evidence. Results do not always give a straightforward yes-no answer. The RecLoh result cited above in the Aldous project is a case in point.

>"Genealogical time" is not something you can arbitrarily define in
>terms of years or generations. It's defined for each family by when
>the paper records run out, which happens earlier for some families
>than it does for others.

We obviously have different definitions of genealogical time. I have some lines where the genealogical records can't be traced back before the 1800s. However, their DNA matches clearly place them within a specific tree which has been well researched, even if the link in the paper trail cannot be found. I have no hesitation in adding them to their respective genetic families despite the lack of a paper trail. I regard genealogical time as the time when genealogical records containing surnames start to become available. This varies from one culture to another. For my purposes, researching an English surname, the records begin in the 1100s. The earliest occurrence of the surname Cruwys/Crues dates from 1160.

> Even if a tree cannot be constructed it is usually still
> possible to get an idea of the distribution of a surname
> from early tax records.
>Yes, but at which point you're doing history, not genealogy.

I would regard an investigation into the origins of a surname as a valid genealogical technique. The technique is well described in George Redmonds' book "Surnames and Genealogy: A New Approach".
As far as English records are concerned, numerous records are available from the 1300s onwards. I am fortunate that with the Cruwys surname the records have been held in one family in the same location for over 900 years, and it is possible to construct a genealogical tree with a reasonable degree of confidence back to the 1200s. This won't be possible in most cases, but it's still possible to get a good idea of the frequency and distribution of a surname by looking at early records.

>Forgive me, but I doubt that very many project admins have plans to
>publish books on their projects. I certainly don't. Our project web
>sites constitute "publication," and they have a huge advantage over
>paper publication in that they can be constantly updated.

It is entirely an individual choice, but papers published in journals or written up in books can be cited by other researchers. Results published on websites can't be cited in the same way except as personal communications, and they are less reliable because of the lack of third-party review.

>I see no fundamental difference in how members should be grouped in
>small projects or large projects. A match is or is not a match.
>Large projects are simply a great deal more work -- and I would say,
>too much for one person to manage well.

The administrator of a large DNA project cannot possibly have time to be intimately involved with the individual genealogies of his or her project members, and therefore has to take any submitted pedigrees on trust. I agree that in an ideal world large projects should have multiple admins, but willing and qualified volunteers are not always available. Results are not always straightforward and do not always provide a simple yes/no answer. If the results are backed up with reliable genealogical data then there is no problem. The further back in time you go, the more difficult it gets. As an example, take a look at my Cruwys group 1, and in particular kit no. 130860 and his relationship to the other men in this group:

http://www.familytreedna.com/public/CruwysDNA/default.aspx?section=yresults

On 37 markers kit no. 130860 doesn't look as though he is related to the rest of the group. At 67 markers it looks more likely. I think in this case a Cruise line went from Devon to Ireland shortly after the Anglo-Norman invasion of Ireland in 1169. The Devon line can be traced forwards to the present day. The Cruise/Cruys surname is found in very early Irish records, but the line of the Irish project member cannot be traced back before the 1800s. The TiP result puts the match within the bounds of probability, and logically this is the only explanation I can find for the closeness of the results, but it is not a cast-iron case or a simple yes-no answer.

>I see no inherent reason for the match rate to be correlated with the
>frequency of the surname. The number of descendants per progenitor is
>probably random. What makes a surname common is almost certainly
>correlated with the number of times it's been adopted, rather than an
>increased number of descendants per progenitor. In the case of
>"displaced populations" (as in the U.S., Australia, etc.), there is
>likely to be a correlation with how early the progenitor arrived and
>how many progenitors of that surname arrived.

No one knows the answers to these questions yet. Anecdotal evidence suggests that English lines that emigrated to countries such as the USA and Australia multiplied at a much greater rate than the lines which stayed behind.

>I really can't see many surname project admins publishing their
>studies. Why? The projects are ongoing, so even if you published,
>the publication would soon be out of date. The beauty of the web is
>that your "publication" (your web site) can be current.

It is standard scientific practice to publish the results of ongoing research. Our DNA projects are effectively scientific research projects.
If our research is to be recognised, it needs to be published. I suspect few project admins will ever publish their results, but there's no harm in encouraging them to do so.

>The match rate can vary based on *many* factors, which is the reason
>"match rate" isn't a statistic worth gathering, IMO. Even if you knew
>the match rate for every name in every project, what use is knowing
>it? Just because you can run a statistic on a set of numbers doesn't
>mean it tells you anything worth knowing.

This is the sort of statistic that is well worth knowing, and it is why papers such as James Irvine's are of particular value. If comparative statistics are available, new project admins will have a baseline to work from and some idea of what to expect as their project grows.

>I don't see how that mistake could be made. If the paper trail is
>wrong, the DNA test results will tell you, loud and clear. Probably
>the greatest strength of DNA testing is its ability to reveal bad
>paper genealogy. I would certainly never allow paper genealogy trump
>DNA evidence when it came to grouping members.

It all depends on how many results you have and how reliable the paper trails are. There are certainly cases of people claiming a higher than normal mutation rate to try to fit DNA results to a dodgy pedigree. The men will no doubt all share a common ancestor, but it might not be the person they thought it was.

>I'm afraid that statement is taken out of context. If my meaning
>wasn't clear, I apologize because, in context, what if said was: Once
>an American has connected to their immigrant, both on paper and via
>DNA test results, *then* the major goal is crossing the pond.

That's fair enough, but I'm looking at this from a different perspective.
I get someone in the UK who might be interested in DNA testing, and when I look at the relevant project website there are links to the xxx surname society of America, discussions about the distribution of the surname in America, lots of baffling abbreviations for American state names, and desperate statements about how they hope to make connections across the Pond. It's very hard to motivate someone to take a test when the project doesn't even acknowledge the existence of the surname in their own country and they can't see how they would benefit from testing. This does not apply to all projects, but it does to quite a few, including some projects for the most common surnames. One of the very common surnames is split into three different projects, and the so-called worldwide project has a list of US states followed at the end by "all points abroad", with no acknowledgement that this surname existed for several hundred years in England before it even arrived in America. It does not exactly encourage anyone from outside the US to take a test with this project.

>I have spent hundreds of dollars subsidizing the tests of Europeans
>for my projects, and I'm offering hundreds more, so I think it's an
>unfair criticism of "American" project admins that they don't care or
>aren't trying to recruit Europeans.

I'm not criticising all projects. Your presentation is good and appeals to non-US testees, but sadly this is not the case for many projects, and it is a significant barrier to encouraging more non-Americans to test.

>At this early stage in the project, the most burning questions have to do
>with the relationships between the U.S. immigrants, that is, between the
>American progenitors and their origins in Europe. DNA testing is ideally
>suited to answering these questions.

DNA is indeed ideally suited to answering these questions, but I would suggest that you do not put this wording on your project websites.
This might be a burning question for people with the surname in the US, but it is irrelevant for anyone with the surname in any other country. The project presentation needs to be neutral and not written from the perspective of the people in just one emigrant-receiving country.

Debbie Kennett