When a searcher fails to find an expected record in FreeBMD, I suspect that the graphs on the Progress Page - http://freebmd.rootsweb.com/progress.shtml - are frequently used to judge whether that record might still be waiting to be transcribed. To make such judgements more accurate, I'd like to suggest the following enhancements:

1. Make the graphs continue above the 100% line - maybe to 110% or so. This would make excessive overruns, such as in the 1890 marriages (see below), show up more obviously.

2. Include - ideally on the same graphs, perhaps in a darker shade of the same colour - the proportion of records double-keyed. This would be the "official" count, i.e. those entries where the two keyings are sufficiently similar that the system recognises them as the same entry. Since there are now nearly 10 million of these, and they are a crucial part of the overall project plan, it would be good to be able to see at a glance where they are.

3. Include a footnote explaining how overruns arise, in particular pointing out that 100% on the graph could mean just that, but that it could also mean e.g. 5% completely untranscribed and 5% inconsistently double-keyed.

Combining this knowledge with the information about the proportion of double-keyings, the searcher could then make his/her own judgement about the likelihood of a particular record still remaining untranscribed.

I would hope that these enhancements would be relatively straightforward to implement, and I believe that they would make the Progress Page a more reliable indicator of progress, both for users and organisers of FreeBMD.
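To illustrate the footnote suggested in point 3, here is a minimal sketch of how a graph could read 100% while records are still missing. It assumes a deliberately simple model that is not taken from the FreeBMD software: every expected entry is either untranscribed, transcribed consistently, or double-keyed inconsistently (and so counted as two unique records). The function name and parameters are hypothetical.

```python
def apparent_progress(untranscribed_pct, inconsistent_double_keyed_pct):
    """Percentage a progress graph would show under the simple model.

    Assumption (not FreeBMD's actual calculation): an untranscribed entry
    contributes no unique record, a consistently keyed entry contributes
    one, and an inconsistently double-keyed entry contributes two, i.e.
    one extra unique record.
    """
    transcribed_pct = 100 - untranscribed_pct
    return transcribed_pct + inconsistent_double_keyed_pct


# 5% still missing, hidden by 5% inconsistent double-keying
print(apparent_progress(5, 5))
# nothing missing, but 14% inconsistent double-keying
print(apparent_progress(0, 14))
```

Under this model the graph alone cannot distinguish the two effects, which is why the double-keyed proportion from point 2 would be needed alongside it.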
Andrew Gough

************************************************************

Date: Mon, 01 Dec 2003 11:57:39 +0000
From: Dave Mayall <david.mayall@ukonline.co.uk>
To: FREEBMD-DISCUSS-L@rootsweb.com
Message-ID: <km9msvs2cr8buo7i5oqk7mv20bauo268un@smtp.ukonline.co.uk>
Subject: Re: Latest Update

On Sun, 30 Nov 2003 19:23:53 -0000, you wrote:

>The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions.
>
>Take two event years which have apparently been fully transcribed....
>
>To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, i.e. about 14 percent.
>
>One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress.

The question of the statistics has been raised previously, and there is no easy answer!

For 1890 marriages, the figures from ONS show 223,000 marriages (446,000 entries). These are the figures we base our completeness on. We do know, however, that there are various factors which tend to make the actual total to be transcribed higher.

The overrun does seem rather excessive and is worth investigating; I will do so.
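As a rough check of the arithmetic, the 1890 marriage figures quoted in this thread can be compared against both baselines: the page-based estimate and the ONS-based total. This is only a sketch using the numbers given above; the variable names are illustrative and nothing here reflects how the FreeBMD statistics are actually computed.

```python
# Figures as quoted in this thread for 1890 marriages
PAGES = 1206                # index pages to be transcribed
ENTRIES_PER_PAGE = 374      # entries assumed per page
UNIQUE_RECORDS = 514_581    # unique records reported in the update
ONS_ENTRIES = 446_000       # 223,000 marriages x 2 entries each

# Page-based estimate of the total number of index entries
page_based_entries = PAGES * ENTRIES_PER_PAGE

# Overrun of unique records against the page-based estimate
overrun = UNIQUE_RECORDS - page_based_entries
overrun_pct = 100 * overrun / page_based_entries

# Completeness as it would appear against the ONS-based total
ons_completeness_pct = 100 * UNIQUE_RECORDS / ONS_ENTRIES

print(f"page-based entries: {page_based_entries:,}")
print(f"overrun: {overrun:,} ({overrun_pct:.1f}%)")
print(f"against ONS total: {ons_completeness_pct:.1f}%")
```

The overrun is about 14% against the page count and about 15% against the ONS-based total, so the choice of baseline changes the size of the overrun only slightly.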
--
Dave Mayall