Thanks, Diana - there at last, and I've found some lurking in my project!

However, for calculating Genetic Distance I see FTDNA's former composite model took account of recLOHs, and of course recLOHs are irrelevant in the Infinite Allele model they are now using in GAP 2.0. Right?

James

-----Original Message-----
From: Diana Gale Matthiesen
Sent: Friday, January 21, 2011 1:13 AM
To: y-dna-projects@rootsweb.com
Subject: Re: [Y-DNA-projects] Y-DNA-PROJECTS Intersting Situation - Thanks

> From: James Irvine
> Sent: Thursday, January 20, 2011 2:58 PM
>
> Sorry, much preoccupied with other things at present. Yes,
> I believe TiP takes recLOHs into account (though I wouldn't
> swear my memory is correct). Alas, despite Diana's
> relatively clear explanation I still couldn't recognize
> a recLOH - I just don't know if my project has any!

I have an example of a recLOH that's pretty clear:

http://dgmweb.net/DNA/Corbin/CorbinDNA-results-HgR1b-Sherman.html#data

This is my CORBIN with an NPE who is really a SHERMAN. Most of the individuals in the table are the modal 19,23 at YCAII, but note the bottom two rows in the table (red table cells). These clearly represent a single recLOH in a common ancestor (as opposed to two independent recLOH events) as they both share another mutation at DYS481.

Diana

______________________________
PLEASE trim the amount you backquote in your replies to the minimum, especially if you subscribe in DIGEST mode.
-------------------------------
To unsubscribe from the list, please send an email to Y-DNA-PROJECTS-request@rootsweb.com with the word 'unsubscribe' without the quotes in the subject and the body of the message
> From: James Irvine
> Sent: Friday, January 21, 2011 4:28 AM
>
> Thanks, Diana - there at last, and I've found some lurking in my
> project!
>
> However for calculating Genetic Distance I see FTDNA's former
> composite model took account of recLOHs, and of course recLOHs
> are irrelevant in the Infinite Allele model they are now using
> in GAP 2.0. Right?

I'm afraid I've never used the Genetic Distance calculator on my GAP, so I have no idea how it works. If I need a GD, I will calculate it myself.

One of the biggest problems in paper genealogy is people making connections on too little evidence, to which I say: a bad connection is *not* better than no connection. I carry the same attitude into my DNA projects. If a match is good, it's generally so obvious it hits you in the face, and I don't need a GD calculator to see it. If it isn't obvious, I don't call it a match, because then you're making the same mistake novice genealogists make.

It seems to me the urge to group as many project members as possible leads some project admins to group too aggressively. IMO, it's better to leave people unassigned than to group them incorrectly.

If you like, I can give you an example of a project that groups overzealously, but in a private email because I'd rather not criticize a project on the list. [The admin is not a subscriber to this list, so y'all can relax.] The admin even brags on the project's home page that, "...in the [_____] project, 86% of the [______]s tested have at least one meaningful match. Truly an amazing percentage." And it would be, if it didn't actually mean he groups people too readily. It's a large project, and it appears the admin is using percent probabilities *alone* in his decisions to place people in groups -- with the bar set too low.

I'll admit to having an advantage because I've standardized on 67 markers for my projects, and with 67 markers, it's *very* much easier to tell when you've got a match, or not. It is also the case that I do not admit a member without them supplying the paper genealogy of their patrilineal line. I may know, from the moment they purchase their kit, whom they should match. When their results return, they either will or won't match an existing group and, if not, they remain unassigned until someone new matches them.

My other advantage is that I am not working on common surnames: three are rare and two are relatively uncommon. I know these surnames well as I've been studying them for years, and we have most of the major American progenitors tested. The major goal, at this point, is crossing the pond.

Still, when it comes to using data on "my" families extracted from other, larger projects, I find myself using the same m.o. I use in my own projects. When it's a match, it's obvious, provided they've tested enough markers.

Diana
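For readers following the Genetic Distance question above, here is a minimal sketch of why the choice of model matters when a recLOH is present. It is not FTDNA's calculator: the function names and the marker values are made up for illustration, and the handling of multi-copy markers in the real TiP and GAP tools is not modelled. The stepwise count adds up every repeat difference, while the infinite allele count charges one step per marker that differs at all.

```python
def stepwise_distance(a, b):
    """Sum of absolute repeat-count differences over the markers both men tested."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def infinite_allele_distance(a, b):
    """One step for every shared marker whose values differ, whatever the size of the gap."""
    return sum(1 for m in a.keys() & b.keys() if a[m] != b[m])


# Hypothetical haplotypes (values invented for illustration only).
# The second man shows 23,23 at YCAII where the group modal is 19,23,
# plus a one-step change at DYS481.
modal = {"DYS393": 13, "DYS390": 24, "DYS481": 22, "YCAIIa": 19, "YCAIIb": 23}
variant = {"DYS393": 13, "DYS390": 24, "DYS481": 23, "YCAIIa": 23, "YCAIIb": 23}

print(stepwise_distance(modal, variant))         # 5: four steps at YCAIIa, one at DYS481
print(infinite_allele_distance(modal, variant))  # 2: one per differing marker
```

In this toy case the single recLOH event accounts for four of the five stepwise steps, which is why a model that counts it once gives a much smaller distance.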
Diana,

Thanks for your comments. Your message raises three inter-related points I am taking up:

1. a match is obvious; if in doubt it isn't one;
2. making connections on too little evidence; a bad connection is worse than no connection; grouping too aggressively; boasting thereon;
3. the major goal is crossing the pond.

Your first point is no doubt usually true in small projects, but when one gets a project such as the Clan Irwin project with >200 participants, with its largest single group of >100, I found it essential to adopt some explicit yardstick to avoid the "why him and not me" questions. So I had to opt for some arbitrary "rule". For better or worse I opted for 90% TiP (using, again arbitrarily, the highest available resolution, 24 generations, nil paper trail, with a minimum of 2 participants per group (unless "already" closely related, or with a pre-1600 paper trail)). The result seems to work well, except that it unfortunately includes rather than excludes a few "12/12" participants (but even here this has helped to induce many of my former 12-marker participants to upgrade).

This rule resulted in a match rate of 87% at the time of my paper "Towards improvements in y-DNA Surname Project Administration" in JOGG (www.jogg.info) Fall 2010 (Appendix A, line 36) (and unchanged since). The Dalton and Blair projects were nearly as high. I don't claim this as an "achievement", but suspect its being higher than many other surnames is more dependent on the type of surname and on the size of project than any "skill" or "manipulation" on my part. Even if you remove my "minimum of 2" and "unless" exceptions it is 82%. And with this more stringent rule and removing all results <37 markers it is still 87%. Even with a minimum of 37 markers and a simple criterion of 95% 24-generation, no-paper-trail TiP, 85% of Irwin participants could be matched/grouped!

You may of course criticise my TiP of 90%, but in fact I chose TiPs as the most sophisticated available tool (whatever its bias may be), and chose the 90% criterion on exactly the same basis as your thinking: I think it is the most obvious, leaving aside the 12-marker results. If you examine the results table at www.clanirwin.org > DNA Study I will be interested if, on the basis of the above, you come to some other conclusion. Low resolution results aside (and perhaps some of my very minor exceptions), I don't think this bar is "too low".

Your other criticism will be my using probabilities alone, and not taking into account paper trails (except to the limited extent outlined above). Here I suspect you have a stronger point, for of the 87% of my participants resident in the "new world", all but one of those resident in USA are unable to trace a paper trail back to the British Isles (the relatively few Australasian Irwins, who generally migrated later, are more fortunate). And as, I suspect, most had ancestors who migrated from the Scottish Borders to Ulster in the 17th century and then from Ulster to USA in the 18th century, I fear they never will be able to complete a paper trail back to their geographic origins. But despite the instincts of Chris Pomeroy and his like, I do not feel that lack of a paper trail should prevent grouping.

Another factor relevant here is the questionable reliability of many paper trails: I am sure you find a fair number include dubious assumptions of relationships between like-named individuals, what I call “IGI pedigrees”. DNA of course can “see through” some, but not all, of these false assumptions.

So turning to the "crossing the pond" dimension, of the 21 groups I have now identified in the Clan Irwin project, 4 presently include no "new world" migrants, but I have been able to "cross the pond" for 15 of the remaining 17. The "pond" here includes the Pacific as well as the Atlantic.
With the single USA and a few Australian exceptions, none of the participants in these groups have a completed paper trail back to the 16th century, but having worked with the surname for half a century I have, like you, a pretty intimate knowledge of the name and its various branches.

Perhaps I still do not appreciate how lucky I am to be working with this particular name. Certainly I have been able to please the high proportion of participants for whom I have been able to use DNA to identify their geographic origins in the middle ages, and I have no doubt this "success rate" has been an important contribution to the growth of our project.

Perhaps my greatest sin is my suggesting my methods are applicable to other projects. However in mitigation I simply observe that (a) I have the benefit of a much larger database to work from than many others, with its accompanying constraints (e.g. an explicit rule was needed) and opportunities (e.g. picking an "obvious" cut-off level), and (b) it is only a suggestion: if one or more of my ideas are no help, then they can be discarded!

All this said, I do warmly concur with your statements I summarise in point 2. above, and crave your indulgence if I occasionally exhibit the human trait of erring a little! I make these points not in boasting, but hopefully to help widen readers' perspectives of what some yDNA studies can achieve.

James.

-----Original Message-----
From: Diana Gale Matthiesen
Sent: Friday, January 21, 2011 11:52 AM
To: y-dna-projects@rootsweb.com
Subject: Re: [Y-DNA-projects] Y-DNA-PROJECTS Intersting Situation - Thanks

> From: James Irvine
> Sent: Friday, January 21, 2011 4:28 AM
>
> Thanks, Diana - there at last, and I've found some lurking in my
> project!
>
> However for calculating Genetic Distance I see FTDNA's former
> composite model took account of recLOHs, and of course recLOHs
> are irrelevant in the Infinite Allele model they are now using
> in GAP 2.0. Right?
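
The grouping rule James describes (a TiP probability threshold plus a minimum group size) and the resulting match rate can be sketched roughly as follows. This is only an illustration under stated assumptions: the pairwise probabilities are invented, the function names are hypothetical, and chaining pairs into groups via connected components is an assumption of this sketch, not something the thread says FTDNA or the Clan Irwin project does.

```python
def group_by_threshold(participants, pair_prob, threshold=0.90, min_size=2):
    """Chain together participants whose pairwise probability meets the threshold.

    pair_prob maps (a, b) tuples to a probability between 0 and 1.
    Groups smaller than min_size are dropped, i.e. those people stay unassigned.
    """
    parent = {p: p for p in participants}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), prob in pair_prob.items():
        if prob >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in participants:
        groups.setdefault(find(p), []).append(p)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


def match_rate(participants, groups):
    """Share of participants who ended up in some group."""
    return sum(len(g) for g in groups) / len(participants)


# Invented example: five kits and four pairwise probabilities.
people = ["A", "B", "C", "D", "E"]
probs = {("A", "B"): 0.97, ("B", "C"): 0.91, ("C", "D"): 0.60, ("D", "E"): 0.89}

strict = group_by_threshold(people, probs, threshold=0.90)
loose = group_by_threshold(people, probs, threshold=0.85)
print(strict, match_rate(people, strict))
print(loose, match_rate(people, loose))
```

With these made-up numbers, lowering the bar from 90% to 85% moves two more people into a group, which is the sensitivity both Diana and James are arguing about.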
James

For recLOHs you need to look at the multi-copy markers, especially DYS464 and CDY a and b. On a two-copy marker a recLOH typically shows up as a matching pair, e.g. 16-16 or 39-39. If there's a recLOH on DYS464 you will have four identical values (e.g. 15-15-15-15). A number of the multi-copy markers can be affected in a single recLOH event. Presumably, as mutations happen entirely at random, not all twin values are the result of recLOHs, and some values are identical purely by chance.

I'm not sure what FTDNA are doing with the genetic distance reports now. I prefer their old system. With my two Aldous men, where one has a recLOH, I can't even produce a genetic distance report any more. I get a message saying "no data", presumably because the genetic distance is huge as a result of the recLOH. I am going to send FTDNA some feedback.

Debbie
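
The check Debbie describes can be sketched as a small screening function. It is a rough aid, not a diagnostic: the marker list is limited to the multi-copy markers named in this thread, the function name and the sample values are hypothetical, and comparing against a group modal is an assumption borrowed from Diana's Corbin/Sherman example. As Debbie notes, identical copies can also arise by chance, so the result is only a list of candidates to inspect by eye.

```python
# Multi-copy markers mentioned in this thread; other multi-copy markers exist.
MULTI_COPY_MARKERS = ("DYS464", "CDY", "YCAII")


def recloh_candidates(haplotype, modal):
    """Return markers where all copies are identical in this man but not in the group modal.

    Both arguments map a marker name to a tuple of copy values,
    e.g. {"YCAII": (19, 23)}.
    """
    flagged = []
    for marker in MULTI_COPY_MARKERS:
        copies = haplotype.get(marker)
        modal_copies = modal.get(marker)
        if not copies or not modal_copies:
            continue  # marker not tested in one of the two results
        if len(set(copies)) == 1 and len(set(modal_copies)) > 1:
            flagged.append(marker)
    return flagged


# Invented values for illustration only.
group_modal = {"YCAII": (19, 23), "DYS464": (15, 15, 17, 17), "CDY": (37, 39)}
tested_man = {"YCAII": (23, 23), "DYS464": (15, 15, 17, 17), "CDY": (37, 39)}

print(recloh_candidates(tested_man, group_modal))
```

Requiring the modal to show differing copies avoids flagging men whose whole group already carries twin values at that marker.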