RootsWeb.com Mailing Lists
    1. Re: [Y-DNA-projects] Y-DNA-PROJECTS Making matches
    2. James Irvine
    3. The most important point in this discussion and in my paper (www.ISOGG.info > Fall 2010), and a challenge shared by the administrators of all surname projects, has been identifying the most appropriate criterion to define what constitutes a “close match” between the participants of a project. Identifying and defining this criterion is necessary both (1) to decide when to establish a new cluster/group/genetic family, and (2) to determine whether a new participant qualifies for membership thereof, or has to be assigned as a singleton. It seems timely to summarise this aspect of the discussion, together with relevant points in Appendix C of my paper and some developments that have arisen since:

1. Administrators can make their own choices on matching participants, based on personal preferences and skills (some are primarily geneticists, some only genealogists), project circumstances and goals, and emerging evidence. At this formative stage of administering surname projects any “recommended” standardisation for defining matches is clearly premature, for while it is easy to criticise some practices, there is not yet any consensus on what “best practice” should replace them.

2. Nevertheless some tips, if not guidance, are desirable for new administrators, for administrators considering change, and for administrators seeking to compare their projects with others (for which purpose some consensus will, hopefully, eventually emerge).

3. Should the matching criteria be private to the administrator (and hence implicitly flexible), or published (whether in the classic sense or simply on project web pages, although on this latter point I don’t think Debbie’s distinction is substantive) and hence more permanent? Here project size may be relevant, for when a project is small a fixed “rule” may be difficult to justify and, as Diana pointed out, the administrator’s discretion may be desirable.
But as Ralph noted, administrators are not immortal, small projects hopefully grow into big ones, and administrators can learn lessons from other projects. Certainly an explicit and “published” criterion/yardstick/rule-of-thumb is desirable for large projects: it avoids “why him and not me” questions, suspicions of subjectivity, and inconsistencies (e.g. I use 80% TiP, but to my shame was distracted in a senior moment when I replied to Diana on the 24th and inadvertently substituted 90%!). It is also desirable when comparing projects, as in my paper, in order to avoid errors arising from “mixing apples and pears”.

4. Another issue is whether to adopt the published usage of a testing company, or to develop one’s own criteria. The former reduces arguments and may appeal to some non-geneticists, though the published usage may change from time to time and may be poorly explained (see 6 below). The latter is more flexible, and may appeal to geneticists (e.g. Diana) and to those familiar with data management problems (e.g. Ralph and myself).

5. The biggest issue is whether to use some Genetic Distance parameter or some TiP parameter. The former is more transparent and more easily explained to newbies. The latter takes account of individual marker mutation rates and (perhaps rather arbitrarily) of technicalities such as recLOHs and null markers, and a single threshold value may be used regardless of resolution differences. TiP is more difficult to explain to newbies and, being opaque, to justify, but once understood it is simple in concept: what-you-see-is-what-you-get, a measure of probability. That its numeric value may be biased to some unknown extent is irrelevant in this context (see 7 below). It also has the incidental advantage of simultaneously providing a convenient administrative tool for ranking participants within their genetic families.
A compromise, which I have adopted, is to display both Genetic Distances and TiPs (see the Table at www.clanirwin.org > DNA Study) but to use TiP as the determining criterion.

6. The Genetic Distance option needs definitions. It is not enough to choose some arbitrary matching criteria such as 1/12, 2/25, 3/37, 7/67 or whatever (see page 8 of my paper) and to clarify whether such a “3/37”, for example, is deemed a “close match” or a mismatch; one must also state whether such criteria are based on the Step-wise or the Infinite-alleles protocol. This latter point has been complicated by FTDNA changing from the hybrid protocol they formerly used (without identifying it as such) in their old GAP to the Infinite-alleles protocol they have apparently adopted (unannounced) in their GAP 2.0.

7. The TiP option also needs definitions. I have chosen, quite arbitrarily, the 24-generation, “no paper-trail” probability: the former parameter is the oldest conveniently available and takes the possible common ancestry back to, very roughly, the beginning of the surname era in the British Isles, rather than to some more recent (but usually unknown) “earliest common ancestor” (which of course is what the TiP facility was primarily designed for); the latter avoids invoking subjective and complicated considerations of paper trails of dissimilar length and reliability. One also has to choose some threshold “pass/fail” probabilities: logically these might be, for example, 95%, 90%, 85% and 80% for 12-, 25-, 37- and 67-marker comparisons respectively (plus, perhaps, 5% when assessing possible NPEs with different surnames). But with the unusual luxury of a surname project with a single genetic family with over 100 participants I have found such refinements unnecessary: in practice a single, arbitrary threshold of 80%, using the highest compatible resolution available, seems to neatly avoid any marginal assignments and at the same time exclude a few clearly dubious possibilities.
Some may argue this 80% criterion is too low, and that an unassigned participant is better than a wrongly assigned participant, but participants naturally dislike being assigned as singletons, and an assignment can always be revised if subsequent developments so require (such as when greater resolution becomes available or when the DNA signature of a modal haplotype changes marginally as a small project grows).

8. Critical as the establishment, publication and application of an explicit criterion/yardstick/rule-of-thumb may be, it is only one step in administering a Surname DNA project. The DNA or paper trail of some individual participants may warrant exceptions; the geographic origin of each group/genetic family needs to be identified; sub-groups need to be identified and defined and their origins determined (the matter that Clovis LaFleur has just introduced, where TiPs for < 24 generations will obviously be relevant); and, as Chris Pomery keeps reminding us, paper trails need to be developed and related to each genetic family.

For administrators contemplating changing from some Genetic Distance criterion to some TiP criterion, the workload involved is much reduced with GAP 2.0 by the availability of multiple TiP Reports comparing individual participants on the personal Genetic Distance page of the relevant modal haplotype, accessed by using the relevant “Select” arrow on the left-hand side of the project “Y-DNA Genetic Distance” page. It is unlikely such a change will lead to many individual participants being re-assigned, but I submit the exercise will make your project more transparent and robust.

James Irvine
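
As an illustration of why point 6 asks administrators to state which protocol they use, here is a minimal Python sketch of the two ways of counting Genetic Distance. It is not FTDNA's implementation and ignores their special handling of multi-copy and null markers. The function names, the `inclusive` flag and the marker values are assumptions made up for the example.

```python
def genetic_distance(a, b, model="stepwise"):
    """Return (distance, markers compared) for two haplotypes.

    Each haplotype is a dict of {marker name: repeat count}. Only the
    markers present in both are compared, i.e. the highest common resolution.
    """
    shared = sorted(set(a) & set(b))
    if model == "stepwise":
        # Step-wise: every one-repeat difference counts as one step.
        distance = sum(abs(a[m] - b[m]) for m in shared)
    elif model == "infinite":
        # Infinite-alleles: any difference at a marker counts as one step.
        distance = sum(1 for m in shared if a[m] != b[m])
    else:
        raise ValueError("model must be 'stepwise' or 'infinite'")
    return distance, len(shared)


def is_close_match(distance, limit, inclusive=True):
    """Apply a criterion such as 3/37; 'inclusive' states whether a
    distance equal to the limit still counts as a close match."""
    return distance <= limit if inclusive else distance < limit


# Invented values, for illustration only (not real project data).
modal = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
participant = {"DYS393": 13, "DYS390": 26, "DYS19": 14, "DYS391": 10}

print(genetic_distance(modal, participant, "stepwise"))  # (3, 4)
print(genetic_distance(modal, participant, "infinite"))  # (2, 4)
```

The same pair of haplotypes is three steps apart under one protocol and two under the other, so a criterion such as “3/37” can pass or fail the same participant depending on which protocol, and which reading of the limit, is in force.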

    01/28/2011 01:18:15