RootsWeb.com Mailing Lists
    1. Latest Update
    2. John Parker
    3. The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions.
Take two event years which have apparently been fully transcribed.
We are told that there are 990,848 unique records of births in 1898. For that year, there were 2,474 pages of births to be transcribed. Assuming an average of 374 births per page (and an assumption of 375 or 376 would not affect the issue), there were 925,276 births in that year. So the number of "unique" records exceeds the actual number of births by about 65,000, i.e. about 7 percent.
To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, i.e. about 14 percent.
One possible cause of the discrepancy is inconsistency in the keying of individual index entries. I think we have previously been told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress.
I have previously raised this problem with Peter Dauncey off list, so he will have been warned.
J S Parker
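The arithmetic in this post can be checked with a short script. This is only a sketch: the function name `overrun` is made up for illustration, and 374 entries per page is the poster's assumed average, not a measured value.

```python
# Figures quoted in the post; 374 entries per page is the poster's assumption.
ENTRIES_PER_PAGE = 374


def overrun(unique_records, pages, per_page=ENTRIES_PER_PAGE):
    """Return (estimated entries, excess, excess as a percentage of the estimate)."""
    estimated = pages * per_page
    excess = unique_records - estimated
    return estimated, excess, 100 * excess / estimated


births_1898 = overrun(990_848, 2_474)
marriages_1890 = overrun(514_581, 1_206)

for label, (estimated, excess, pct) in [
    ("Births 1898", births_1898),
    ("Marriages 1890", marriages_1890),
]:
    print(f"{label}: estimated {estimated:,}, excess {excess:,} ({pct:.1f}%)")
# Births 1898: estimated 925,276, excess 65,572 (7.1%)
# Marriages 1890: estimated 451,044, excess 63,537 (14.1%)
```

Raising the per-page figure to 376 only reduces the births excess to 60,624 (about 6.5 percent), which is consistent with the remark that the exact average does not affect the issue.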

    11/30/2003 12:23:53
    1. Re: Latest Update
    2. Dave Mayall
    3. On Sun, 30 Nov 2003 19:23:53 -0000, you wrote:
>The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions.
>
>Take two event years which have apparently been fully transcribed.
>
>We are told that there are 990,848 unique records of births in 1898. For that year, there were 2,474 pages of births to be transcribed. Assuming an average of 374 births per page (and an assumption of 375 or 376 would not affect the issue), there were 925,276 births in that year. So the number of "unique" records exceeds the actual number of births by about 65,000, i.e. about 7 percent.
>
>To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, i.e. about 14 percent.
>
>One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress.
The question of the statistics has been raised previously, and there is no easy answer!
The ONS figures for 1890 show 223,000 marriages (446,000 index entries). These are the figures on which we base our completeness. We do know, however, that various factors tend to make the actual total to be transcribed higher.
The overrun does seem rather excessive and is worth investigating, which I will do.
-- Dave Mayall
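As a rough check on how much the choice of baseline matters for 1890 marriages, the two baselines mentioned in this thread can be compared directly. This is a sketch using only figures quoted in the thread; the helper name `excess_pct` is made up for illustration.

```python
# Figures quoted in the thread for 1890 marriages.
unique_records = 514_581
ons_marriages = 223_000          # ONS figure quoted in the reply
ons_entries = 2 * ons_marriages  # each marriage yields two index entries
page_estimate = 1_206 * 374      # page-count estimate from the original post


def excess_pct(observed, baseline):
    """Excess of observed over baseline, as a percentage of the baseline."""
    return 100 * (observed - baseline) / baseline


gap_between_baselines = page_estimate - ons_entries
ons_excess = unique_records - ons_entries

print(f"Page estimate exceeds ONS entries by {gap_between_baselines:,}")
print(f"Unique records exceed ONS entries by {ons_excess:,} "
      f"({excess_pct(unique_records, ons_entries):.1f}%)")
# Page estimate exceeds ONS entries by 5,044
# Unique records exceed ONS entries by 68,581 (15.4%)
```

The two baselines differ by only about 5,000 entries, so on either measure the bulk of the overrun remains unexplained.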

    12/01/2003 04:57:39
    1. Re: Latest Update
    2. Martin Cope
    3. > From: "John Parker" <johns@parkerj46.fsnet.co.uk>
> The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions.
>
> [snip]
>
> One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress.
>
You're right. Mismatched double-keyed entries count as two unique entries: the system can't distinguish them from single-keyed entries.
If you look back in the list archives you'll find previous discussion of this, and you'll see that there's little prospect of any improvement in the accuracy of these statistics.
Maybe some day some text will be added to the statistics web pages to explain that they are a potentially useful guide but that their accuracy cannot be determined.
Martin Cope
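The effect of mismatched double keying on the count can be illustrated with a toy model. Everything here is hypothetical: the record layout, the sample names, and the assumption that uniqueness is judged by exact comparison of the transcribed fields. The actual system may compare records differently.

```python
# Toy model: each transcription is (page, surname, forename, district).
# The same page is keyed twice; the second transcriber misreads one surname.
first_keying = [
    (1, "SMITH", "John", "Leeds"),
    (1, "JONES", "Mary", "Oldham"),
    (1, "BROWN", "Ann", "Derby"),
]
second_keying = [
    (1, "SMITH", "John", "Leeds"),
    (1, "JONES", "Mary", "Oldham"),
    (1, "BROWNE", "Ann", "Derby"),  # mismatch: BROWN keyed as BROWNE
]

true_entries = len(first_keying)

# Exact-match deduplication: identical transcriptions collapse into one,
# but the mismatched pair survives as two apparently distinct records.
unique_records = set(first_keying) | set(second_keying)
inflation = len(unique_records) - true_entries

print(f"True entries: {true_entries}, counted as unique: {len(unique_records)}")
# True entries: 3, counted as unique: 4
```

In this model every mismatched pair adds one to the unique count, and nothing in the data marks it as a duplicate rather than a genuine single-keyed entry.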

    12/01/2003 06:17:14