There may not be an easy answer, but I believe there has to be AN answer, as confusion rarely clears itself, and the subject will have to be addressed sooner or later.

There seem to be three types of upload to FreeBMD: blocks of first keyings, blocks of second keyings, and random/ad hoc entries. Ignoring the latter, why can't files uploaded be marked as either first keying or second keying, in the same way as they are marked as births, deaths or marriages? Surely the syndicate leader must know what he/she has given his/her members, and as stated on the list before, transcribers must be told whether they are doing first or second keying anyway.

We would obviously have to go back and classify all the files already uploaded, but this should be easy for syndicate leaders, and would be well worth the effort involved.

When the database is compiled, it will then be known what all the first keying submissions are. Second keying files (and random/ad hoc) can then be compared to the first keying and if the records within them are identical, a bold entry will result on the search output screen. A second keying entry that does not match a first keying entry could be written to a file and referred to an arbitrator. I believe that the FreeBMD system documented on the web site already proposes arbitrators, and that an arbitrator's upload overrules all other uploads.

Considering the obvious skill of the FreeBMD programmers, I would not have thought this too difficult.

John Fairlie
Mail us at ..... john@fairlie.plus.com john.fairlie@blueyonder.co.uk
Home page... http://www.fairlie.plus.com

-----Original Message-----
From: Dave Mayall [mailto:david.mayall@ukonline.co.uk]
Sent: Monday, December 01, 2003 11:58 AM
To: FREEBMD-DISCUSS-L@rootsweb.com
Subject: Re: Latest Update

On Sun, 30 Nov 2003 19:23:53 -0000, you wrote:

>The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions.
>
>Take two event years which have apparently been fully transcribed.
>
>We are told that there are 990,848 unique records of births in 1898. For that year, there were 2,474 pages of births to be transcribed. Assuming an average of 374 births per page (and an assumption of 375 or 376 would not affect the issue), there were 925,276 births in that year. So the number of "unique" records exceeds the actual number of births by about 65,000, ie about 7 percent.
>
>To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, ie about 14 percent.
>
>One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress.

The question of the statistics has been raised previously, and there is no easy answer!

The ONS figures for 1890 show 223,000 marriages (446,000 entries). These are the figures we base our completeness on.

We do know, however, that there are various factors which tend to make the actual total to be transcribed higher.

The overrun seems rather excessive and is worthy of investigation, so I will look into it.
-- Dave Mayall
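For anyone wanting to check the arithmetic in the exchange above, here is a minimal Python sketch. It uses only the figures quoted in the two posts; the function name `overrun` and the variable names are purely illustrative and are not part of any FreeBMD tooling.

```python
# Worked check of the figures quoted in the thread.
# ENTRIES_PER_PAGE is the average assumed in the original post.
ENTRIES_PER_PAGE = 374


def overrun(unique_records, expected_entries):
    """Return (excess, excess as a percentage of expected_entries)."""
    excess = unique_records - expected_entries
    return excess, 100.0 * excess / expected_entries


# Expected index entries estimated from the page counts in the post.
births_expected = 2474 * ENTRIES_PER_PAGE
marriages_expected = 1206 * ENTRIES_PER_PAGE

# ONS-based total quoted in the reply: 223,000 marriages, two entries each.
ons_marriage_entries = 2 * 223_000

births_1898 = overrun(990_848, births_expected)
marriages_1890 = overrun(514_581, marriages_expected)
marriages_1890_ons = overrun(514_581, ons_marriage_entries)

for label, (excess, pct) in [
    ("Births 1898 (page estimate)", births_1898),
    ("Marriages 1890 (page estimate)", marriages_1890),
    ("Marriages 1890 (ONS total)", marriages_1890_ons),
]:
    print(f"{label}: {excess:,} excess entries ({pct:.1f}%)")
```

The page-based estimates reproduce the totals in the post (925,276 and 451,044). Measured against the ONS total of 446,000 entries, the 1890 marriage excess comes out at 68,581, slightly larger than the page-based figure.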
On Mon, 1 Dec 2003 17:35:12 -0000, you wrote:

>There may not be an easy answer, but I believe there has to be AN answer, as
>confusion rarely clears itself, and the subject will have to be addressed
>sooner or later.
>
>There seem to be three types of upload to FreeBMD: blocks of first keyings,
>blocks of second keyings, and random/ad hoc entries. Ignoring the latter,
>why can't files uploaded be marked as either first keying or second keying,
>in the same way as they are marked as births, deaths or marriages? Surely
>the syndicate leader must know what he/she has given his/her members, and as
>stated on the list before, transcribers must be told whether they are doing
>first or second keying anyway.
>
>We would obviously have to go back and classify all the files already
>uploaded, but this should be easy for syndicate leaders, and would be well
>worth the effort involved.

I'm sorry, but I have to disagree.

Why do we need to identify which files are first and which are second keying? For the purpose of matching, it suffices to identify that two files are different keyings of the same page. It matters not which we regard as first and which we regard as second in the subsequent matching process, so backloading this information onto over a quarter of a million files is a whole load of effort to no purpose.

>When the database is compiled, it will then be known what all the first
>keying submissions are. Second keying files (and random/ad hoc) can then be
>compared to the first keying and if the records within them are identical, a
>bold entry will result on the search output screen. A second keying entry
>that does not match a first keying entry
>could be written to a file and referred to an arbitrator. I believe that
>the FreeBMD system documented on the web site already proposes arbitrators,
>and that an arbitrator's upload overrules all other uploads.

Yes, that is part of what will happen. However, the problem you have tried to fix isn't a problem in need of fixing.
The main problem in performing the match is aligning the transcriptions so that we know that they need to be compared with each other. This involves taking each accession for a year (a file may contain a number of accessions). Aligning works out that two (or more) accessions are sufficiently similar that they represent the same section of index and need to be compared.

>Considering the obvious skill of the FreeBMD programmers, I would not have
>thought this too difficult.

Of course we can do it, given sufficient time.

However, at present FreeBMD relies on volunteer programmers doing short stints in their spare time. That means things happen slowly, and that things that are going to stop the service working get priority.

If someone can think of a way to pay our programmers the going rate, that might change.

-- Dave Mayall
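To make the alignment idea concrete, here is a small Python sketch. It is not FreeBMD's actual matching code, which the thread does not describe in detail. It assumes that an accession can be treated as a list of entry tuples, that "sufficiently similar" can be approximated by the share of identical entries, and that a threshold of 0.5 is reasonable. The names `similarity`, `align` and `compare`, the threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def similarity(a, b):
    """Share of entries common to two accessions (Jaccard overlap)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def align(accessions, threshold=0.5):
    """Return pairs of accession ids that appear to cover the same index section."""
    return [
        (x, y)
        for x, y in combinations(sorted(accessions), 2)
        if similarity(accessions[x], accessions[y]) >= threshold
    ]


def compare(a, b):
    """Split entries into those keyed identically and those needing arbitration."""
    sa, sb = set(a), set(b)
    return sorted(sa & sb), sorted(sa ^ sb)


# Invented sample data: (surname, forename, district, volume, page).
accessions = {
    "A": [("SMITH", "John", "Leeds", "9b", "123"),
          ("SMITH", "Mary", "Leeds", "9b", "124"),
          ("SMYTH", "Ann", "York", "9d", "10")],
    "B": [("SMITH", "John", "Leeds", "9b", "123"),
          ("SMITH", "Mary", "Leeds", "9b", "124"),
          ("SMITH", "Ann", "York", "9d", "10")],
    "C": [("JONES", "Evan", "Cardiff", "11a", "5")],
}

aligned = align(accessions)
for x, y in aligned:
    matched, disputed = compare(accessions[x], accessions[y])
    print(f"{x} vs {y}: {len(matched)} matched, {len(disputed)} for arbitration")
```

Note that `compare` gives the same result whichever accession is passed first, which illustrates the point made above: the matching step does not need to know which keying was "first" and which "second".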