Dave (and the masses),

I've got it, by George, I think I've got it!!

So, does this mean you are now in a good position to extract those double-keyed records that sort of match up, in as much as they are probably the same record but with differences, and to put them into the stage where someone decides the real data and arbitrates? If so, that would reduce the number of unique records closer to the number of distinct records, yes?

So, George, have I got it??

John Fairlie
Mail us at ..... john@fairlie.plus.com
                 john.fairlie@blueyonder.co.uk
Home page... http://www.fairlie.plus.com

-----Original Message-----
From: Dave Mayall [mailto:dave@research-group.co.uk]
Sent: Tuesday, February 03, 2004 8:35 AM
To: john.fairlie@blueyonder.co.uk; FREEBMD-DISCUSS-L@rootsweb.com
Subject: Re: Latest update

----- Original Message -----
From: "John Fairlie" <john.fairlie@blueyonder.co.uk>
To: <FREEBMD-DISCUSS-L@rootsweb.com>
Sent: Monday, February 02, 2004 5:50 PM
Subject: RE: Latest update

> OK, I give up. Please explain "distinct" records as opposed to "unique"
> records.

:-)

We implemented a solution to the overcounting that you identified!

Consider a page of 40 entries, double keyed, with 3 entries transcribed differently by the transcribers. That would be 80 total records; it would also be 43 unique records, giving an overcount of 3 records in the total and messing up the stats.

We now analyse the alignment of unmatched records and do an additional count of records which don't actually match but which (because of their sequence) are obviously different transcriptions of the same entry, and in the distinct-records count we count them only once. Thus there would be 40 distinct records.

This achieves two things:
1) More accurate stats
2) Data that tells us about the degree of mismatch between double keyings (the difference between unique and distinct is the number of mismatches)
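The counting scheme Dave describes can be sketched roughly as follows. This is a toy Python illustration, not FreeBMD's actual code: `count_stats`, the entry strings, and the mismatch positions are all invented, and it assumes both keyings are the same length and already aligned row-for-row with the source page.

```python
# Hypothetical sketch of the unique vs distinct counting described above.

def count_stats(keying_a, keying_b):
    """Return (total, unique, distinct, mismatches) for a double keying.

    Assumes both keyings have the same length and that row i in each
    list is a transcription of the same source entry (the 'alignment').
    """
    total = len(keying_a) + len(keying_b)
    # Unique records: transcriptions that differ in any way count separately.
    unique = len(set(keying_a) | set(keying_b))
    # Distinct records: aligned rows that differ are treated as two
    # transcriptions of one source entry, so each aligned pair counts once.
    distinct = len(keying_a)
    return total, unique, distinct, unique - distinct

# The worked example from the mail: a 40-entry page, 3 keyed differently.
page_a = [f"entry-{i}" for i in range(40)]
page_b = list(page_a)
for i in (5, 17, 30):               # invented positions of the 3 mismatches
    page_b[i] = f"entry-{i}-variant"

print(count_stats(page_a, page_b))  # (80, 43, 40, 3)
```

The unique-minus-distinct difference (3 here) is exactly the mismatch count in point 2, and those 3 aligned-but-unequal pairs are the candidates for the arbitration stage John asks about.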