If I may weigh in on James' points:

Point 1: Explicit Yardsticks

Sharing his "large-project" perspective, I agree. As the number of comparisons to make (i.e., participants) increases, matches become less obvious; one MUST rely on formal rules & procedures, rather than hunches. (And, often, on division of responsibilities among admin team members.)

Furthermore, even small projects may benefit from formally stating, in writing, the rules by which a match is declared. Admins should not shy away from commitment to their decisions. It's better that all participants know the rules (explicit yardsticks) of the particular game before paying to play than that the rules are arbitrary and changing.

The methods (tools) employed can and should change; hopefully they improve over time. While the FTDNA TiP tool is difficult to explain in plain language, it is probably the most sophisticated now available -- taking advantage of both single-step and multi-step mutations.

Point 2: Connections on too little evidence

I'm not sure whether we're speaking of DNA evidence or paper trail evidence.

If DNA, it's accepted that mutations are random events and thus the probability aspect cannot be ignored; it must be included in the drawing of conclusions. By corollary, certainty is ruled out; the best we can achieve is a level of confidence.

Match rates -- high or low -- may reflect homogeneity or diversity of the target (e.g., surname) population's DNA. It may be that common surnames of multi-point, occupational origin contain greater diversity than less-common names of single-point origin. They may have more unrelated CMAs. What it is that different match rates reflect is unknowable without comparing the bases of the matches. (See point 1.) However, James has disclosed his yardstick, and ours (below) is a little looser, so his 87% vs. our 42% probably reflects homogeneity/diversity between Clan Irwin on the one hand and Taylor on the other.
It's gratifying to see that he's been able to provide his participants with such valuable information.

{Taylor Project Y-DNA Match Rule: A high (>85%) probability between two or more participants of sharing a common male ancestor within a genealogical time frame. "Genealogical time frame" means since ~1350 or -- in DNA terms -- 55 transmission events between any two compared lines. (This rules out all 12-marker matches; none rise to the 85% cumulative probability level at 55 TE.) Matches are not dependent on paper trails; few paper trails reach back to 1350.}

If paper trail, the paper simply assures that a participant has done his or her homework and had some success with written records. Being on paper does not guarantee accuracy. A recent example in my project illustrates: A new participant presented a paper trail showing descent from a historically prominent family, one in haplogroup R1b2. Then his Y-DNA results came back; he closely matched, not the family the paper trail showed, but another in haplogroup I1. He cannot share a common, direct paternal ancestor (within many thousands of years) with the family he thought. He's currently taking another look at the research.

Point 3: Crossing the pond

Tracing one's ancestors to a specific family on the other side of an ocean is a worthy goal. It may not always be an attainable one. In many cases, the records we'd need were not created; in others, they've been lost by time and events. Also, it seems that few projects (with American DNA test providers) have succeeded in gaining adequate penetration on that other side of "the pond".

-ralpht_/)

PS: Apologies for not trimming the back-quotes. I couldn't see how without doing damage to the statements.

Message: 1
Date: Mon, 24 Jan 2011 22:05:44 +0000
From: James Irvine <jamesmirvine@hotmail.co.uk>
Subject: Re: [Y-DNA-projects] Y-DNA-PROJECTS - Making matches, crossing the Pond

Diana,

Thanks for your comments.
Your message raises three inter-related points I am taking up:

1. a match is obvious; if in doubt it isn't one;
2. making connections on too little evidence; a bad connection is worse than no connection; grouping too aggressively; boasting thereon;
3. the major goal is crossing the pond.

Your first point is no doubt usually true in small projects, but when one gets a project such as the Clan Irwin project with > 200 participants, with its largest single group of >100, I found it essential to adopt some explicit yardstick to avoid the "why him and not me" questions. So I had to opt for some arbitrary "rule". For better or worse I opted for 90% TiP (using, again arbitrarily, the highest available resolution, 24 generations, nil paper trail, with minimum of 2 participants per group (unless "already" closely related, or with a pre-1600 paper trail)). The result seems to work well, except that it unfortunately includes rather than excludes a few "12/12" participants (but even here this has helped to induce many of my former 12-marker participants to upgrade).

This rule resulted in a match rate of 87% at the time of my paper "Towards improvements in y-DNA Surname Project Administration" in JOGG (www.jogg.info) Fall 2010 (Appendix A, line 36) (and unchanged since). The Dalton and Blair projects were nearly as high. I don't claim this as an "achievement", but suspect its being higher than many other surnames is more dependent on the type of surname and on the size of project than any "skill" or "manipulation" on my part. Even if you remove my "minimum of 2" and "unless" exceptions it is 82%. And with this more stringent rule and removing all results <37 markers it is still 87%. Even with a minimum of 37 markers and a simple criterion of 95% 24-generation, no-paper-trail TiP, 85% of Irwin participants could be matched/grouped!
You may of course criticise my TiP of 90%, but in fact I chose TiPs as the most sophisticated available tool (whatever its bias may be), and chose the 90% criterion on exactly the same basis as your thinking: I think it is the most obvious, leaving aside the 12-marker results. If you examine the results table at www.clanirwin.org > DNA Study I will be interested if, on the basis of the above, you come to some other conclusion. Low resolution results aside (and perhaps some of my very minor exceptions), I don't think this bar is "too low".

Your other criticism will be my using probabilities alone, and not taking into account paper trails (except to the limited extent outlined above). Here I suspect you have a stronger point, for of the 87% of my participants resident in the "new world", all but one of those resident in USA are unable to trace a paper trail back to the British Isles (the relatively few Australasian Irwins, who generally migrated later, are more fortunate). And as, I suspect, most had ancestors who migrated from the Scottish Borders to Ulster in the 17th century and then from Ulster to USA in the 18th century, I fear they never will be able to complete a paper trail back to their geographic origins. But despite the instincts of Chris Pomeroy and his like, I do not feel that lack of a paper trail should prevent grouping.

Another factor relevant here is the questionable reliability of many paper trails: I am sure you find a fair number include dubious assumptions of relationships between like-named individuals, what I call "IGI pedigrees". DNA of course can "see through" some, but not all, of these false assumptions.

So turning to the "crossing the pond" dimension, of the 21 groups I have now identified in the Clan Irwin project, 4 presently include no "new world" migrants, but I have been able to "cross the pond" for 15 of the remaining 17. The "pond" here includes the Pacific as well as the Atlantic.
With the single USA and a few Australian exceptions, none of the participants in these groups have a completed paper trail back to the 16th century, but having worked with the surname for half a century I have, like you, a pretty intimate knowledge of the name and its various branches. Perhaps I still do not appreciate how lucky I am to be working with this particular name. Certainly I have been able to please the high proportion of participants for whom I have been able to use DNA to identify their geographic origins in the middle ages, and I have no doubt this "success rate" has been an important contribution to the growth of our project.

Perhaps my greatest sin is my suggesting my methods are applicable to other projects. However in mitigation I simply observe that (a) I have the benefit of a much larger data base to work from than many others, with its accompanying constraints (e.g. an explicit rule was needed) and opportunities (e.g. picking an "obvious" cut off level), and (b) it is only a suggestion: if one or more of my ideas are no help, then they can be discarded!

All this said, I do warmly concur with your statements I summarise in point 2. above, and crave your indulgence if I occasionally exhibit the human trait of erring a little! I make these points not in boasting, but hopefully to help widen readers' perspectives of what some yDNA studies can achieve.

James.

-----Original Message-----
From: Diana Gale Matthiesen
Sent: Friday, January 21, 2011 11:52 AM
Subject: Re: [Y-DNA-projects] Y-DNA-PROJECTS Intersting Situation - Thanks

> From: James Irvine
> Sent: Friday, January 21, 2011 4:28 AM
>
> Thanks, Diana - there at last, and I've found some lurking in my
> project!
>
> However for calculating Genetic Distance I see FTDNA's former
> composite model took account of recLOHs, and of course recLOHs
> are irrelevant in the Infinite Allele model they are now using
> in GAP 2.0. Right?
I'm afraid I've never used the Genetic Distance calculator on my GAP, so I have no idea how it works. If I need a GD, I will calculate it myself.

One of the biggest problems in paper genealogy is people making connections on too little evidence, to which I say: a bad connection is *not* better than no connection. I carry the same attitude into my DNA projects. If a match is good it's generally so obvious it hits you in the face, and I don't need a GD calculator to see it. If it isn't obvious, I don't call it a match because then you're making the same mistake novice genealogists make.

It seems to me the urge to group as many project members as possible leads some project admins to group too aggressively. IMO, it's better to leave people unassigned than to group them incorrectly.

If you like, I can give you an example of a project that groups overzealously, but in a private email because I'd rather not criticize a project on the list. [The admin is not a subscriber to this list, so y'all can relax.] The admin even brags on the project's home page that, "...in the [_____] project, 86% of the [______]s tested have at least one meaningful match. Truly an amazing percentage." And it would be, if it didn't actually mean he groups people too readily. It's a large project, and it appears the admin is using percent probabilities *alone* in his decisions to place people in groups -- with the bar set too low.

I'll admit to having an advantage because I've standardized on 67 markers for my projects, and with 67 markers, it's *very* much easier to tell when you've got a match, or not. It is also the case that I do not admit a member without them supplying the paper genealogy of their patrilineal line. I may know, from the moment they purchase their kit, whom they should match. When their results return, they either will or won't match an existing group and, if not, they remain unassigned until someone new matches them.
My other advantage is that I am not working on common surnames: three are rare and two are relatively uncommon. I know these surnames well as I've been studying them for years, and we have most of the major American progenitors tested. The major goal, at this point, is crossing the pond.

Still, when it comes to using data on "my" families extracted from other, larger projects, I find myself using the same m.o. I use in my own projects. When it's a match, it's obvious, provided they've tested enough markers.

Diana
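
For readers who want to see what an "explicit yardstick" looks like when written down, the rules quoted in this thread (Taylor: >85% at 55 transmission events; Clan Irwin: 90% TiP at 24 generations; and the stricter Irwin variant of 95% with at least 37 markers) can be sketched roughly as below. This is only an illustration, not either project's actual procedure: the class name `Yardstick`, its fields, and the inclusive threshold comparison are assumptions, and the probability is taken as an input already computed at each rule's own time horizon, since the sketch makes no attempt to reproduce the FTDNA TiP calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Yardstick:
    """A written-down match rule (hypothetical structure, for illustration)."""
    name: str
    threshold: float      # required probability of a shared ancestor, 0..1
    horizon: str          # time frame the probability is quoted for (label only)
    min_markers: int = 0  # optional floor on the number of markers compared

    def is_match(self, probability: float, markers: int) -> bool:
        # 'probability' must already be computed at this rule's horizon.
        # Treating the threshold as inclusive is an assumption of this sketch.
        return markers >= self.min_markers and probability >= self.threshold


# Thresholds and horizons as quoted in the thread.
TAYLOR = Yardstick("Taylor", 0.85, "55 transmission events")
IRWIN = Yardstick("Clan Irwin", 0.90, "24 generations")
IRWIN_STRICT = Yardstick("Clan Irwin (strict)", 0.95, "24 generations",
                         min_markers=37)

if __name__ == "__main__":
    # Hypothetical pair: 92% probability at 24 generations, 37 markers compared.
    for rule in (IRWIN, IRWIN_STRICT):
        verdict = "match" if rule.is_match(0.92, 37) else "no match"
        print(f"{rule.name} ({rule.horizon}): {verdict}")
```

The point of writing the rule as data is the one made under Point 1: every participant is judged by the same stated threshold, and changing the yardstick is a visible, deliberate act rather than a hunch.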