RootsWeb.com Mailing Lists
    1. Oh how fast one forgets
    2. The Minarzicks
    3. I am having a bit of trouble. I have not done much typing for quite a few months, and now I am having trouble uploading to FreeBMD. When I finish my typing I press "send to FreeBMD" and supposedly the file uploads successfully, but I see that the tiff is still in my "to do" list from Derek Hopkins. Can someone please tell me what I am doing wrong? Lynda

    12/06/2003 10:34:08
    1. Re: NEW SCANS
    2. Allan Raymond
    3. Scans from Derek were done by a separate organisation in Canada. Archive CD Books are based in England, which means I can deliver the films to them for scanning as soon as they arrive with me. Improvement in the quality of scans depends to a great extent on the source films/fiche. Yes, I have visited Archive CD Books with Dave Mayall and they are very professional; they do scanning for a living and we are pleased with the quality offered by them. Poor quality scans in the past were most likely due to bad source rather than the organisation that actually did the scanning on behalf of FreeBMD. My initial thoughts are that we have a "News Flash" Web Page, which can be updated on demand.
Allan Raymond
-----Original Message-----
From: L Hambling <l.hambling@ntlworld.com>
To: FREEBMD-DISCUSS-L@rootsweb.com <FREEBMD-DISCUSS-L@rootsweb.com>
Date: 06 December 2003 16:58
Subject: NEW SCANS
>Thank you Allan a few questions if you don't mind.
>
>If some of the scans have already been uploaded when do you think we will start on a more logical order of transcribing and what will that order be?
>
>You say *Derek Hopkins has also uploaded more scans within the last couple of days
>for Births 1886, 1887 and 1888* are these scans from Archive CD Books?
>
>When can transcribers expect to see an improvement in the quality of the scans?
>
>I am assuming that the Archive CD Books scans are really good and you have seen them, so are they a big improvement on some we have had so far?
>
>A regular bulletin on the progress would be appreciated, thanks.
>
>Lucille

    12/06/2003 10:27:25
    1. NEW SCANS
    2. L Hambling
    3. Thank you Allan a few questions if you don't mind. If some of the scans have already been uploaded when do you think we will start on a more logical order of transcribing and what will that order be? You say *Derek Hopkins has also uploaded more scans within the last couple of days for Births 1886, 1887 and 1888* are these scans from Archive CD Books? When can transcribers expect to see an improvement in the quality of the scans? I am assuming that the Archive CD Books scans are really good and you have seen them, so are they a big improvement on some we have had so far? A regular bulletin on the progress would be appreciated, thanks. Lucille

    12/06/2003 09:58:47
    1. Re: NEW SCANS
    2. Allan Raymond
    3. Films for all the missing years/events 1904 to 1910 were ordered and are now with Archive CD Books to scan. Some of the scans from this order have already been uploaded to the FreeBMD site. A further order for Marriages 1843, 1845, 1847, 1857, 1861, 1862 & 1863 is with the ONS, with expected delivery mid-February 2004. Derek Hopkins has also uploaded more scans within the last couple of days for Births 1886, 1887 and 1888.
Although it looks as if our ordering process is somewhat hotchpotch, we deliberately ordered the films for the years 1904 to 1910 to provide sufficient source for Syndicates. Films for Marriages 1843, 1845, 1847, 1857, 1861, 1862 & 1863 were ordered to fill in some of the gaps in our scans. We will systematically order films for any years (1837 to 1910) which are currently devoid of scans and then order new films for those event/periods which have poor quality scans on the FreeBMD site.
Dave Mayall is in the best position to report progress on our new hardware. I see no reason why we can't produce a regular bulletin to report progress.
Allan Raymond
-----Original Message-----
From: L Hambling <l.hambling@ntlworld.com>
To: FREEBMD-DISCUSS-L@rootsweb.com <FREEBMD-DISCUSS-L@rootsweb.com>
Date: 06 December 2003 13:29
Subject: NEW SCANS
>Below is just an extract of a message from Dave Mayall - Discuss List - 11/09/2003.
>
>>>We are embarking on a project *NOW* to complete scanning in a logical order. We can do this now because we are in a sufficiently stable financial position to do so. The plan is;
>1) Install additional hardware to handle the extra scans
>
>2) Complete scanning of all years all events from 1910 to 1866 working backwards<<
>
>>>Time-scales are always difficult, but current guess is new hardware on-line by end October 03, Source 1866-1910 on-hand by same time
>
>1866-1910 scans available gradually from November to January 04
>
>1837-1856 marriage source on hand by early January 04
>
>1837-1856 scans available February - June 2004
>
>Beyond that time-scales are too vague to be useful. <<
>
>I know there has been a problem with the hard ware and wondered if someone might give us an update into how things are progressing? Also are there any plans to post messages from time to time of the progress? It would be nice as I am sure there are others like me who are interested.
>
>Lucille
>
>Scan2

    12/06/2003 08:31:33
    1. RE: NEW SCANS
    2. John Fairlie
    3. Yes, and wasn't there once a mailing list called "News"? In fact it is still referenced on the FBMD web site, but seems to have had no postings since God was a boy. Any news on "News"??
John Fairlie
Mail us at ..... john@fairlie.plus.com john.fairlie@blueyonder.co.uk
Home page... http://www.fairlie.plus.com
-----Original Message-----
From: L Hambling [mailto:l.hambling@ntlworld.com]
Sent: Saturday, December 06, 2003 1:31 PM
To: FREEBMD-DISCUSS-L@rootsweb.com
Subject: NEW SCANS
Below is just an extract of a message from Dave Mayall - Discuss List - 11/09/2003.
>>We are embarking on a project *NOW* to complete scanning in a logical order. We can do this now because we are in a sufficiently stable financial position to do so. The plan is;
1) Install additional hardware to handle the extra scans
2) Complete scanning of all years all events from 1910 to 1866 working backwards<<
>>Time-scales are always difficult, but current guess is new hardware on-line by end October 03, Source 1866-1910 on-hand by same time
1866-1910 scans available gradually from November to January 04
1837-1856 marriage source on hand by early January 04
1837-1856 scans available February - June 2004
Beyond that time-scales are too vague to be useful. <<
I know there has been a problem with the hard ware and wondered if someone might give us an update into how things are progressing? Also are there any plans to post messages from time to time of the progress? It would be nice as I am sure there are others like me who are interested.
Lucille
Scan2

    12/06/2003 07:05:28
    1. NEW SCANS
    2. L Hambling
    3. Below is just an extract of a message from Dave Mayall - Discuss List - 11/09/2003.
>>We are embarking on a project *NOW* to complete scanning in a logical order. We can do this now because we are in a sufficiently stable financial position to do so. The plan is;
1) Install additional hardware to handle the extra scans
2) Complete scanning of all years all events from 1910 to 1866 working backwards<<
>>Time-scales are always difficult, but current guess is new hardware on-line by end October 03, Source 1866-1910 on-hand by same time
1866-1910 scans available gradually from November to January 04
1837-1856 marriage source on hand by early January 04
1837-1856 scans available February - June 2004
Beyond that time-scales are too vague to be useful. <<
I know there has been a problem with the hardware and wondered if someone might give us an update on how things are progressing. Also, are there any plans to post messages from time to time about the progress? It would be nice, as I am sure there are others like me who are interested.
Lucille
Scan2

    12/06/2003 06:30:59
    1. Re: Latest Update
    2. Dave Mayall
    3. On Wed, 3 Dec 2003 19:50:27 -0000, you wrote:
>Barrie,
>
>I think you have succinctly defined the problem. Dave Mayall also said that
>the main problem in performing the match is aligning the transcriptions so
>that we know that they need to be compared with each other.
>
>So in your example below, what we need to do is compare Pb1898_20 with
>1898B40020. So how does a dumb computer know this?? I wouldn't know that
>these two files are the same page so I would have to do some pretty clever
>coding to get a computer to work it out. Even then, the filename for
>Pb1898_20 does not infer the fourth quarter. This must be within the file
>header. Then the computer has to read the page number from within the file,
>get 01 for file Pb1898_20, realise it's that that is wrong and not the 20 in
>the file name, and replace it with the correct page number. To be frank, I
>cannot see all this being programmed up to happen automatically as filenames
>and headers are so inconsistent at present. If it can, I would be delighted
>to be proved wrong.

Well, be delighted! We use a whole load of data crunching to achieve it, including a chunk of code which we obtained from the human genome project.

>Without this information, the system can only open and collate the contents
>of both files. Obviously the two transcribers made differences, both as
>mistakes and as uncertain characters. So if each file was meant to contain
>375 entries, we may end up with 400 unique entries and 350 matching entries.
>This is what leads to years being apparently over 100% complete!! The 50
>records that now don't match are just 50 records, no-one knows that 25 of
>them are non matching duplicates of the other 25.

That is very likely where some of the count discrepancies originate, and it is where we are concentrating our efforts. Of course, things are complicated by the fact that a file is NOT the basic unit of work internally, because a file may contain more than one page.

>What we have to do is get back to that page being 375 entries and only 375
>entries. Hence my suggestion that one file (and I say again that it matters
>not which), is considered as first key and we adopt it's 375 entries. The
>second file may match 350 of them. OK - fine. But the other 25 that don't
>match ARE NOT NEW UNIQUE RECORDS!!! They are candidates for arbitration.

Yes, we need to correct the counting of these records, and I think we can do so. It will just take a while to achieve!

>Hence my approach is that we need software that looks at all first keyings
>and identifies all entries that are out of sequence, have uncertain
>characters etc, pages out of range for the district, and entries where there
>are too many entries per register page, and clean them. (Yes, I know this
>is not the defined FreeBMD process.) Then we need a strict file naming
>regime that allows second keyings to be seen as such, and not as a duplicate
>first key files under another file name.

Cleaning is a part of the process! However, we still don't need the filenaming regime or to identify first/second keying files, because we can already match files and identify which have identical content. I believe that we should also be able to identify the counts better. At present, we are counting two things for each chunk of data:
1) Total records into the chunk
2) Total distinct records out of the chunk (that is, 1 minus the duplicates found)
What we need to be finding is the length of the aligned data. I suspect that a diagram of what we are and would be counting would be useful, and I'll have a go at one.

>A quality end product means having a quality process that picks up errors
>(file naming or header/page number errors) early, and ensures all stick to
>the defined system. Then the software can work well to produce a clean
>database where 100% complete means 100% complete and the actual records have
>a quantifiable accuracy about them!

A quality end product is fault tolerant of things such as file naming. We have a perfectly serviceable way of doing the data alignment that doesn't rely on file names, so we stick with that. As it happens, the system is far more sensitive to missing +PAGE lines than anything else. This isn't going to be a quick fix, but I believe a fix is possible if Barrie and I can battle our way through the innards of some rather complex code. Leave it with us.
-- Dave Mayall
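
[Editorial illustration] The difference between "total distinct records" and "length of the aligned data" can be shown with a toy example. This is only a sketch and not FreeBMD's code: Python's standard `difflib.SequenceMatcher` stands in for whatever alignment the real system uses, and the entry strings and the function names `count_distinct` and `aligned_length` are invented for the illustration.

```python
from difflib import SequenceMatcher


def count_distinct(keying_a, keying_b):
    """Count distinct records across both keyings (mismatches inflate this)."""
    return len(set(keying_a) | set(keying_b))


def aligned_length(keying_a, keying_b):
    """Count positions in the alignment of the two keyings.

    A disagreeing pair of entries occupies one aligned position,
    so it is counted once rather than as two unique records.
    """
    matcher = SequenceMatcher(None, keying_a, keying_b, autojunk=False)
    total = 0
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # For every opcode the longer side gives the number of positions.
        total += max(i2 - i1, j2 - j1)
    return total


# Hypothetical page of 10 index entries, keyed twice.
first = [f"NAME{i:02d}|Leeds|9b|{100 + i}" for i in range(10)]
second = list(first)
second[3] = "NAME03|Leeds|9b|108"   # page number misread by the second keyer
second[7] = "NAMF07|Leeds|9b|107"   # surname misread by the second keyer

distinct = count_distinct(first, second)
aligned = aligned_length(first, second)
print(distinct, aligned)
```

Here the distinct count overstates the page by the two disagreeing entries, while the aligned length returns the true page size; the two mismatched positions are the candidates for arbitration.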

    12/04/2003 12:31:58
    1. RE: Latest Update
    2. John Fairlie
    3. Barrie, I think you have succinctly defined the problem. Dave Mayall also said that the main problem in performing the match is aligning the transcriptions so that we know that they need to be compared with each other. So in your example below, what we need to do is compare Pb1898_20 with 1898B40020. So how does a dumb computer know this?? I wouldn't know that these two files are the same page so I would have to do some pretty clever coding to get a computer to work it out. Even then, the filename for Pb1898_20 does not infer the fourth quarter. This must be within the file header. Then the computer has to read the page number from within the file, get 01 for file Pb1898_20, realise it's that that is wrong and not the 20 in the file name, and replace it with the correct page number. To be frank, I cannot see all this being programmed up to happen automatically as filenames and headers are so inconsistent at present. If it can, I would be delighted to be proved wrong. Consider that the files were 1898B40020 first key and 1898B40020 second key with otherwise identical headers. (Actually the first and second would be part of the header within the top of the file.) Of course it is irrelevant which is first key and which is second key - the important thing is that we have identified two files that are meant to contain identical contents. This is the important prelude to the actual comparison process, not to mention the database build process. Without this information, the system can only open and collate the contents of both files. Obviously the two transcribers made differences, both as mistakes and as uncertain characters. So if each file was meant to contain 375 entries, we may end up with 400 unique entries and 350 matching entries. This is what leads to years being apparently over 100% complete!! The 50 records that now don't match are just 50 records, no-one knows that 25 of them are non matching duplicates of the other 25. 
What we have to do is get back to that page being 375 entries and only 375 entries. Hence my suggestion that one file (and I say again that it matters not which), is considered as first key and we adopt it's 375 entries. The second file may match 350 of them. OK - fine. But the other 25 that don't match ARE NOT NEW UNIQUE RECORDS!!! They are candidates for arbitration. Hence my approach is that we need software that looks at all first keyings and identifies all entries that are out of sequence, have uncertain characters etc, pages out of range for the district, and entries where there are too many entries per register page, and clean them. (Yes, I know this is not the defined FreeBMD process.) Then we need a strict file naming regime that allows second keyings to be seen as such, and not as a duplicate first key files under another file name. Then we have a system that really is programmable. Then and only then can comparison software produce a result that is meaningful, and a compiled master database that is not apparently over 100% complete when it is actually under 100% complete. A quality end product means having a quality process that picks up errors (file naming or header/page number errors) early, and ensures all stick to the defined system. Then the software can work well to produce a clean database where 100% complete means 100% complete and the actual records have a quantifiable accuracy about them! Phew!! John Fairlie Mail us at ..... john@fairlie.plus.com john.fairlie@blueyonder.co.uk Home page... http://www.fairlie.plus.com -----Original Message----- From: Archer Barrie [mailto:Barrie.Archer@services.fujitsu.com] Sent: Tuesday, December 02, 2003 10:28 AM To: 'John Fairlie' Cc: FREEBMD-DISCUSS-L@rootsweb.com Subject: RE: Latest Update John, I am not sure why it is important to know which is the first keying. Surely if there are two or more keyings that produce different results all that is necessary is to know they are different. 
Which came first seems of little relevance. Your suggestion for comparing the entries omits the difficulty of knowing which two files to compare. If everyone had followed the filenaming standard this might have been possible but as I recall it only about 60% do. For example in 1898B4 one has to compare file Pb1898_20 with file 1898B40020. Page numbers from +PAGE are the other alternative but these are often not entered or accurate, for example Pb1898_20 contains page number 1 (wrong) and 1898B40020 contains page number 20 (right). Sorting this out would be a huge amount of work. And of course neither method works for random entries. What we actually do is compare the entries ignoring the the file that they have come from, thus enabling us to take into account random entries. Barrie > -----Original Message----- > From: John Fairlie [mailto:john.fairlie@blueyonder.co.uk] > Sent: 01 December 2003 17:35 > To: FREEBMD-DISCUSS-L@rootsweb.com > Subject: RE: Latest Update > > > There may not be an easy answer, but I believe there has to > be AN answer, as > confusion rarely clears itself, and the subject will have to > be addressed > sooner or later. > > There seems to be three types of upload to FreeBMD, blocks of > first keyings, > blocks of second keyings, and random/ad hoc entries. > Ignoring the latter, > why can't files uploaded be marked as either first keying or > second keying, > in the same way as they are marked as births, deaths or > marriages? Surely > the syndicate leader must know what he/she has given his/her > members, and as > stated on the list before, transcribers must be told whether > they are doing > first or second keying anyway. > > We would obviously have to go back and classify all the files already > uploaded, but this should be easy for syndicate leaders, and > would be well > worth the effort involved. > > When the database is compiled, it will then be known what all > the first > keying submissions are. 
Second keying files (and random/ad > hoc) can then be > compared to the first keying and if the records within them > are identical, a > bold entry will result on the search output screen. A second > keying entry > that does not match a first keying entry > could be written to a file and referred to an arbitrator. I > believe that > the FreeBMD system documented on the web site already > proposes arbitrators, > and that an arbitrators upload overrules all other uploads. > > Considering the obvious skill of the FreeBMD programmers, I > would not have > thought this too difficult. > > John Fairlie > Mail us at ..... john@fairlie.plus.com > john.fairlie@blueyonder.co.uk > Home page... http://www.fairlie.plus.com > > > -----Original Message----- > From: Dave Mayall [mailto:david.mayall@ukonline.co.uk] > Sent: Monday, December 01, 2003 11:58 AM > To: FREEBMD-DISCUSS-L@rootsweb.com > Subject: Re: Latest Update > > > On Sun, 30 Nov 2003 19:23:53 -0000, you wrote: > > >The latest update shows a gratifying increase in the number of unique > records. The figures do, however, raise some questions. > > > >Take two event years which have apparently been fully transcribed. > > > >We are told that there are 990,848 unique records of births > in 1898. For > that year, there were 2,474 pages of births to be > transcribed. Assuming an > average of 374 births per page (and an assumption of 375 or > 376 would not > affect the issue), there were 925,276 births in that year. > So the number of > "unique" records exceeds the actual number of births by about > 65,000, ie > about 7 percent. > > > >To take another example, we are told that there are 514, 581 > unique records > of marriages in 1890. In that year, there were 1,206 pages > of marriages to > be transcribed. At 374 entries per page, there were 451,044 marriage > entries in all. (Each marriage of course generates two entries in the > index.) 
So the number of "unique" records exceeds the actual > number of > marriage entries by over 63,000, ie about 14 percent. > > > >One possible cause of the discrepancy is inconsistencies in > the keying of > individual index entries. I think that we have been > previously told that, > where a page has been double keyed, different transcriptions > of the same > record will show up as two unique records in the update > statistics. If this > is the full explanation, then it raises some disturbing > questions about the > accuracy of our transcription. In any event, the question > does arise of > whether the statistics give a slightly too rosy account of progress. > > The question of the statistics has been raised previously, and there > is no easy answer! > > 1890 Marriages figures from ONS show that there are 223,000 Marriages > (446,000 entries). These are the figures we base our completeness on. > We do know however that there are various factors which tend to make > the actual total to be transcribed higher. > > The overrun seems to be rather excessive, and is worthy of > investigation, and I will do so. > > -- > Dave Mayall > > > ============================== > To join Ancestry.com and access our 1.2 billion online > genealogy records, go > to: > http://www.ancestry.com/rd/redir.asp?targetid=571&sourceid=1237 >

    12/03/2003 12:50:27
    1. Suggestion: enhancements to Graphs
    2. When a searcher fails to find an expected record in FreeBMD, I suspect that the graphs on the Progress Page - http://freebmd.rootsweb.com/progress.shtml - are frequently used to judge whether or not that record might still be waiting to be transcribed. In order to enable such judgements to be more accurate, I'd like to suggest the following enhancements:
1. Make the graphs continue above the 100% line - maybe to 110% or so - this would make excessive overruns such as in the 1890 marriages (see below) show up more obviously.
2. Include - ideally on the same graphs, perhaps in a darker shade of the same colour - the proportion of records double-keyed. This would be the "official" count, ie those entries where the two keyings are sufficiently similar that the system recognises them as the same entry. Since there are now nearly 10 million of these, and they are a crucial part of the overall project plan, it would be good to be able to see at a glance where they are.
3. Include a footnote explaining how overruns arise - in particular pointing out that 100% on the graph could mean just that, but that it could also mean eg. 5% completely untranscribed and 5% inconsistently double-keyed. Combining this knowledge with the information about the proportion of double-keyings, the searcher could then make his/her own judgement about the likelihood of a particular record still remaining untranscribed.
I would hope that these enhancements would be relatively straightforward to implement, and I believe that they would make the Progress Page into a more reliable indicator of progress, both for users and organisers of FreeBMD.
Andrew Gough ************************************************************ Date: Mon, 01 Dec 2003 11:57:39 +0000 From: Dave Mayall <david.mayall@ukonline.co.uk> To: FREEBMD-DISCUSS-L@rootsweb.com Message-ID: <km9msvs2cr8buo7i5oqk7mv20bauo268un@smtp.ukonline.co.uk> Subject: Re: Latest Update On Sun, 30 Nov 2003 19:23:53 -0000, you wrote: >The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions. > >Take two event years which have apparently been fully transcribed....> > >To take another example, we are told that there are 514, 581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, ie about 14 percent. > >One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress. The question of the statistics has been raised previously, and there is no easy answer! 1890 Marriages figures from ONS show that there are 223,000 Marriages (446,000 entries). These are the figures we base our completeness on. We do know however that there are various factors which tend to make the actual total to be transcribed higher. The overrun seems to be rather excessive, and is worthy of investigation, and I will do so. 
-- Dave Mayall
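
[Editorial illustration] The overrun figures quoted in this thread can be checked with a few lines of arithmetic. The sketch below uses only numbers given in the messages (unique record counts, page counts, the assumed 374 entries per page, and the ONS figure of 446,000 entries); the function name `overrun` is invented for the example.

```python
def overrun(unique_records, expected_entries):
    """Return (excess records, excess as a percentage of the expected total)."""
    excess = unique_records - expected_entries
    return excess, 100.0 * excess / expected_entries


ENTRIES_PER_PAGE = 374  # average assumed by the original poster

# 1898 births: 990,848 unique records against 2,474 pages.
births_1898 = overrun(990_848, 2_474 * ENTRIES_PER_PAGE)

# 1890 marriages: 514,581 unique records against 1,206 pages,
# and against the ONS figure of 446,000 entries.
marriages_1890 = overrun(514_581, 1_206 * ENTRIES_PER_PAGE)
marriages_1890_ons = overrun(514_581, 446_000)

for label, (excess, pct) in [
    ("Births 1898 (page estimate)", births_1898),
    ("Marriages 1890 (page estimate)", marriages_1890),
    ("Marriages 1890 (ONS figure)", marriages_1890_ons),
]:
    print(f"{label}: {excess:,} over, {pct:.1f}%")
```

Measured against the ONS figure, which is the one the completeness percentages are based on, the 1890 marriages overrun is somewhat larger than the page-based estimate suggests.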

    12/03/2003 07:31:37
    1. Re: Second Keying
    2. Dave Mayall
    3. On Tue, 2 Dec 2003 21:42:35 EST, you wrote:
>In a message dated 12/2/03 7:01:07 PM US Mountain Standard Time,
>FREEBMD-DISCUSS-D-request@rootsweb.com writes:
>
>> I am not sure why it is important to know which is the first keying. Surely
>> if there are two or more keyings that produce different results all that is
>> necessary is to know they are different. Which came first seems of little
>> relevance.
>
> Amen! I have been involved in second keyings and, in the end, who knows
>or cares which keying came first? We're all dedicated to this FreeBMD
>project...and second keying is part of the project. For those transcribers who
>haven't yet had the opportunity to 'do' second keying, you're missing the chance to
>compare YOUR transcriptions to those of a fellow transcriber. Believe me,
>it's a learning experience!

It is also something which you must NOT do. If you compare your transcription to the other transcription, it is no longer a valid 2nd transcription, and will have to be done again.
-- Dave Mayall

    12/03/2003 12:20:06
    1. Re: Second Keying
    2. In a message dated 12/2/03 7:01:07 PM US Mountain Standard Time, FREEBMD-DISCUSS-D-request@rootsweb.com writes:
> I am not sure why it is important to know which is the first keying. Surely
> if there are two or more keyings that produce different results all that is
> necessary is to know they are different. Which came first seems of little
> relevance.

Amen! I have been involved in second keyings and, in the end, who knows or cares which keying came first? We're all dedicated to this FreeBMD project...and second keying is part of the project. For those transcribers who haven't yet had the opportunity to 'do' second keying, you're missing the chance to compare YOUR transcriptions to those of a fellow transcriber. Believe me, it's a learning experience!
Cheers, Joane in Arizona, USA

    12/02/2003 02:42:35
    1. RE: Latest Update
    2. Archer Barrie
    3. John, I am not sure why it is important to know which is the first keying. Surely if there are two or more keyings that produce different results all that is necessary is to know they are different. Which came first seems of little relevance. Your suggestion for comparing the entries omits the difficulty of knowing which two files to compare. If everyone had followed the filenaming standard this might have been possible but as I recall it only about 60% do. For example in 1898B4 one has to compare file Pb1898_20 with file 1898B40020. Page numbers from +PAGE are the other alternative but these are often not entered or accurate, for example Pb1898_20 contains page number 1 (wrong) and 1898B40020 contains page number 20 (right). Sorting this out would be a huge amount of work. And of course neither method works for random entries. What we actually do is compare the entries ignoring the the file that they have come from, thus enabling us to take into account random entries. Barrie > -----Original Message----- > From: John Fairlie [mailto:john.fairlie@blueyonder.co.uk] > Sent: 01 December 2003 17:35 > To: FREEBMD-DISCUSS-L@rootsweb.com > Subject: RE: Latest Update > > > There may not be an easy answer, but I believe there has to > be AN answer, as > confusion rarely clears itself, and the subject will have to > be addressed > sooner or later. > > There seems to be three types of upload to FreeBMD, blocks of > first keyings, > blocks of second keyings, and random/ad hoc entries. > Ignoring the latter, > why can't files uploaded be marked as either first keying or > second keying, > in the same way as they are marked as births, deaths or > marriages? Surely > the syndicate leader must know what he/she has given his/her > members, and as > stated on the list before, transcribers must be told whether > they are doing > first or second keying anyway. 
> > We would obviously have to go back and classify all the files already > uploaded, but this should be easy for syndicate leaders, and > would be well > worth the effort involved. > > When the database is compiled, it will then be known what all > the first > keying submissions are. Second keying files (and random/ad > hoc) can then be > compared to the first keying and if the records within them > are identical, a > bold entry will result on the search output screen. A second > keying entry > that does not match a first keying entry > could be written to a file and referred to an arbitrator. I > believe that > the FreeBMD system documented on the web site already > proposes arbitrators, > and that an arbitrators upload overrules all other uploads. > > Considering the obvious skill of the FreeBMD programmers, I > would not have > thought this too difficult. > > John Fairlie > Mail us at ..... john@fairlie.plus.com > john.fairlie@blueyonder.co.uk > Home page... http://www.fairlie.plus.com > > > -----Original Message----- > From: Dave Mayall [mailto:david.mayall@ukonline.co.uk] > Sent: Monday, December 01, 2003 11:58 AM > To: FREEBMD-DISCUSS-L@rootsweb.com > Subject: Re: Latest Update > > > On Sun, 30 Nov 2003 19:23:53 -0000, you wrote: > > >The latest update shows a gratifying increase in the number of unique > records. The figures do, however, raise some questions. > > > >Take two event years which have apparently been fully transcribed. > > > >We are told that there are 990,848 unique records of births > in 1898. For > that year, there were 2,474 pages of births to be > transcribed. Assuming an > average of 374 births per page (and an assumption of 375 or > 376 would not > affect the issue), there were 925,276 births in that year. > So the number of > "unique" records exceeds the actual number of births by about > 65,000, ie > about 7 percent. > > > >To take another example, we are told that there are 514, 581 > unique records > of marriages in 1890. 
In that year, there were 1,206 pages > of marriages to > be transcribed. At 374 entries per page, there were 451,044 marriage > entries in all. (Each marriage of course generates two entries in the > index.) So the number of "unique" records exceeds the actual > number of > marriage entries by over 63,000, ie about 14 percent. > > > >One possible cause of the discrepancy is inconsistencies in > the keying of > individual index entries. I think that we have been > previously told that, > where a page has been double keyed, different transcriptions > of the same > record will show up as two unique records in the update > statistics. If this > is the full explanation, then it raises some disturbing > questions about the > accuracy of our transcription. In any event, the question > does arise of > whether the statistics give a slightly too rosy account of progress. > > The question of the statistics has been raised previously, and there > is no easy answer! > > 1890 Marriages figures from ONS show that there are 223,000 Marriages > (446,000 entries). These are the figures we base our completeness on. > We do know however that there are various factors which tend to make > the actual total to be transcribed higher. > > The overrun seems to be rather excessive, and is worthy of > investigation, and I will do so. > > -- > Dave Mayall > > > ============================== > To join Ancestry.com and access our 1.2 billion online > genealogy records, go > to: > http://www.ancestry.com/rd/redir.asp?targetid=571&sourceid=1237 >

    12/02/2003 03:27:33
    1. Re: Latest Update
    2. Dave Mayall
    3. On Mon, 1 Dec 2003 17:35:12 -0000, you wrote: >There may not be an easy answer, but I believe there has to be AN answer, as >confusion rarely clears itself, and the subject will have to be addressed >sooner or later. > >There seem to be three types of upload to FreeBMD, blocks of first keyings, >blocks of second keyings, and random/ad hoc entries. Ignoring the latter, >why can't files uploaded be marked as either first keying or second keying, >in the same way as they are marked as births, deaths or marriages? Surely >the syndicate leader must know what he/she has given his/her members, and as >stated on the list before, transcribers must be told whether they are doing >first or second keying anyway. > >We would obviously have to go back and classify all the files already >uploaded, but this should be easy for syndicate leaders, and would be well >worth the effort involved. I'm sorry, but I have to disagree. Why do we need to identify which files are first and which are second keying? For the purpose of matching, it suffices to identify that 2 files are different keyings of the same page. It matters not which we regard as first and which we regard as second in the subsequent matching process, so backloading this information onto over a quarter of a million files is a whole load of effort to no purpose. >When the database is compiled, it will then be known what all the first >keying submissions are. Second keying files (and random/ad hoc) can then be >compared to the first keying and if the records within them are identical, a >bold entry will result on the search output screen. A second keying entry >that does not match a first keying entry >could be written to a file and referred to an arbitrator. I believe that >the FreeBMD system documented on the web site already proposes arbitrators, >and that an arbitrator's upload overrules all other uploads. Yes, that is part of what will happen. 
However, the problem you have tried to fix isn't a problem in need of fixing. The main problem in performing the match is aligning the transcriptions so that we know that they need to be compared with each other. This involves taking each accession for a year (a file may contain a number of accessions). Aligning works out that 2 (or more) accessions are sufficiently similar that they represent the same section of index and need to be compared. >Considering the obvious skill of the FreeBMD programmers, I would not have >thought this too difficult. Of course we can do it, given sufficient time. However, at present FreeBMD relies on volunteer programmers doing short stints in their spare time. That means things happen slowly, and that things that are going to stop the service working get priority. If someone can think of a way to pay our programmers the going rate, that might change. -- Dave Mayall
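The alignment step described above can be pictured with a minimal Python sketch. It is an illustration only, not FreeBMD's actual matching code: the record layout, the Jaccard similarity measure, the 0.5 threshold, and the names `align` and `compare` are all assumptions made for the example. Note that neither accession is treated as "first" or "second"; the comparison is symmetric.

```python
from itertools import combinations


def similarity(a, b):
    """Jaccard similarity: shared records divided by all distinct records."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def align(accessions, threshold=0.5):
    """Return pairs of accession names that overlap enough to be compared.

    The threshold is an arbitrary illustrative value, not a FreeBMD setting.
    """
    return [
        (x, y)
        for x, y in combinations(sorted(accessions), 2)
        if similarity(accessions[x], accessions[y]) >= threshold
    ]


def compare(a, b):
    """Split records into those both keyings agree on and those in dispute."""
    a, b = set(a), set(b)
    return a & b, a ^ b


# Invented records: (surname, forename, volume, page).
accessions = {
    "acc1": [("ADAMS", "Ann", "1a", "10"), ("BAKER", "Tom", "1a", "11"),
             ("COOK", "Jane", "1a", "12")],
    "acc2": [("ADAMS", "Ann", "1a", "10"), ("BAKER", "Tom", "1a", "11"),
             ("COOKE", "Jane", "1a", "12")],
    "acc3": [("YOUNG", "Amy", "9d", "700")],
}

pairs = align(accessions)
agreed, disputed = compare(accessions["acc1"], accessions["acc2"])
print(pairs)           # accessions judged to cover the same section of index
print(len(agreed))     # records both keyings agree on
print(len(disputed))   # records that would go to an arbitrator
```

In this toy data only acc1 and acc2 are paired, two records agree, and the COOK/COOKE readings are left in dispute.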

    12/02/2003 12:29:12
    1. RE: Latest Update
    2. John Fairlie
    3. There may not be an easy answer, but I believe there has to be AN answer, as confusion rarely clears itself, and the subject will have to be addressed sooner or later. There seem to be three types of upload to FreeBMD, blocks of first keyings, blocks of second keyings, and random/ad hoc entries. Ignoring the latter, why can't files uploaded be marked as either first keying or second keying, in the same way as they are marked as births, deaths or marriages? Surely the syndicate leader must know what he/she has given his/her members, and as stated on the list before, transcribers must be told whether they are doing first or second keying anyway. We would obviously have to go back and classify all the files already uploaded, but this should be easy for syndicate leaders, and would be well worth the effort involved. When the database is compiled, it will then be known what all the first keying submissions are. Second keying files (and random/ad hoc) can then be compared to the first keying and if the records within them are identical, a bold entry will result on the search output screen. A second keying entry that does not match a first keying entry could be written to a file and referred to an arbitrator. I believe that the FreeBMD system documented on the web site already proposes arbitrators, and that an arbitrator's upload overrules all other uploads. Considering the obvious skill of the FreeBMD programmers, I would not have thought this too difficult. John Fairlie Mail us at ..... john@fairlie.plus.com john.fairlie@blueyonder.co.uk Home page... http://www.fairlie.plus.com -----Original Message----- From: Dave Mayall [mailto:david.mayall@ukonline.co.uk] Sent: Monday, December 01, 2003 11:58 AM To: FREEBMD-DISCUSS-L@rootsweb.com Subject: Re: Latest Update On Sun, 30 Nov 2003 19:23:53 -0000, you wrote: >The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions. 
> >Take two event years which have apparently been fully transcribed. > >We are told that there are 990,848 unique records of births in 1898. For that year, there were 2,474 pages of births to be transcribed. Assuming an average of 374 births per page (and an assumption of 375 or 376 would not affect the issue), there were 925,276 births in that year. So the number of "unique" records exceeds the actual number of births by about 65,000, ie about 7 percent. > >To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, ie about 14 percent. > >One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress. The question of the statistics has been raised previously, and there is no easy answer! 1890 Marriages figures from ONS show that there are 223,000 Marriages (446,000 entries). These are the figures we base our completeness on. We do know however that there are various factors which tend to make the actual total to be transcribed higher. The overrun seems to be rather excessive, and is worthy of investigation, and I will do so. 
-- Dave Mayall

    12/01/2003 10:35:12
    1. Re: Latest Update
    2. Martin Cope
    3. > From: "John Parker" <johns@parkerj46.fsnet.co.uk> > The latest update shows a gratifying increase in the number of unique > records. The figures do, however, raise some questions. > > [snip] > > One possible cause of the discrepancy is inconsistencies in the keying of > individual index entries. I think that we have been previously told that, > where a page has been double keyed, different transcriptions of the same > record will show up as two unique records in the update statistics. If this > is the full explanation, then it raises some disturbing questions about the > accuracy of our transcription. In any event, the question does arise of > whether the statistics give a slightly too rosy account of progress. > You're right. Mismatched double keyed entries count as two unique entries - the system can't distinguish them from single keyed entries. If you look back in the list archives you'll find previous discussion on this and see that there's little prospect of any improvement in the accuracy of these statistics. Maybe some day some text will be added to the statistics web pages to explain that they are a potentially useful guide but their accuracy is not determinable. Martin Cope
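A minimal Python sketch of why mismatched double keying inflates the count. The records and their layout are invented for illustration, and the real statistics are of course not computed with a Python set; the point is only that two different transcriptions of one index entry are indistinguishable from two genuine entries.

```python
# Invented records: (surname, forename, volume, page).
first_keying = [
    ("ADAMS", "Ann", "1a", "10"),
    ("SMITH", "Mary", "1a", "11"),
]
second_keying = [
    ("ADAMS", "Ann", "1a", "10"),   # agrees with the first keying
    ("SMYTH", "Mary", "1a", "11"),  # same index entry, surname read differently
]

# Counting distinct transcriptions: the SMITH/SMYTH pair is counted twice.
unique_records = set(first_keying) | set(second_keying)
print(len(unique_records))  # 3, although the page holds only two entries
```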

    12/01/2003 06:17:14
    1. Re: Latest Update
    2. Dave Mayall
    3. On Sun, 30 Nov 2003 19:23:53 -0000, you wrote: >The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions. > >Take two event years which have apparently been fully transcribed. > >We are told that there are 990,848 unique records of births in 1898. For that year, there were 2,474 pages of births to be transcribed. Assuming an average of 374 births per page (and an assumption of 375 or 376 would not affect the issue), there were 925,276 births in that year. So the number of "unique" records exceeds the actual number of births by about 65,000, ie about 7 percent. > >To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, ie about 14 percent. > >One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress. The question of the statistics has been raised previously, and there is no easy answer! 1890 Marriages figures from ONS show that there are 223,000 Marriages (446,000 entries). These are the figures we base our completeness on. We do know however that there are various factors which tend to make the actual total to be transcribed higher. 
The overrun seems to be rather excessive, and is worthy of investigation, and I will do so. -- Dave Mayall

    12/01/2003 04:57:39
    1. Re: Latest Update
    2. Mark Hattam
    3. Are you also factoring in the multiple attempts by the GRO indexer(s) for some records contributing to the "excess"? I have several instances in my own research where the indexer has indexed someone as both HATTAM and HALLAM presumably because (s)he couldn't be sure about the crossing of the T's. But there is only one real register entry. For instance look at: vol 5c, page 418, Marriages, Mar 1892 (Posting a link doesn't work) Mark At 7:23 pm +0000 30/11/03, John Parker wrote: >The latest update shows a gratifying increase in the number of >unique records. The figures do, however, raise some questions. > >Take two event years which have apparently been fully transcribed. > >We are told that there are 990,848 unique records of births in 1898. >For that year, there were 2,474 pages of births to be transcribed. >Assuming an average of 374 births per page (and an assumption of 375 >or 376 would not affect the issue), there were 925,276 births in >that year. So the number of "unique" records exceeds the actual >number of births by about 65,000, ie about 7 percent. > >[snip] > >I have previously raised this problem with Peter Dauncey off list, >so he will have been warned. > >J S Parker
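One hypothetical way to look for such double indexing is to group entries by volume and page and flag surnames of equal length that differ in only a letter or two. The Python sketch below is an assumption-laden illustration, not a FreeBMD feature: the entries are invented (loosely modelled on the HATTAM/HALLAM example), and the two-letter cut-off and function names are arbitrary choices.

```python
from collections import defaultdict
from itertools import combinations


def differing_letters(a, b):
    """Count positions where two equal-length surnames differ (None if lengths differ)."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def possible_double_indexing(entries, max_diff=2):
    """Flag near-identical surnames that share the same volume and page."""
    by_ref = defaultdict(list)
    for surname, volume, page in entries:
        by_ref[(volume, page)].append(surname)

    flagged = []
    for ref, surnames in by_ref.items():
        for a, b in combinations(sorted(set(surnames)), 2):
            diff = differing_letters(a, b)
            if diff is not None and diff <= max_diff:
                flagged.append((ref, a, b))
    return flagged


# Invented entries: (surname, volume, page).
entries = [
    ("HATTAM", "5c", "418"),
    ("HALLAM", "5c", "418"),
    ("BROWN", "5c", "418"),
    ("HALLAM", "5c", "500"),
]

flagged = possible_double_indexing(entries)
print(flagged)
```

Any pair flagged this way would still need checking against the register, since two genuinely different people with similar surnames can share a page.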

    11/30/2003 08:10:09
    1. Latest Update
    2. John Parker
    3. The latest update shows a gratifying increase in the number of unique records. The figures do, however, raise some questions. Take two event years which have apparently been fully transcribed. We are told that there are 990,848 unique records of births in 1898. For that year, there were 2,474 pages of births to be transcribed. Assuming an average of 374 births per page (and an assumption of 375 or 376 would not affect the issue), there were 925,276 births in that year. So the number of "unique" records exceeds the actual number of births by about 65,000, ie about 7 percent. To take another example, we are told that there are 514,581 unique records of marriages in 1890. In that year, there were 1,206 pages of marriages to be transcribed. At 374 entries per page, there were 451,044 marriage entries in all. (Each marriage of course generates two entries in the index.) So the number of "unique" records exceeds the actual number of marriage entries by over 63,000, ie about 14 percent. One possible cause of the discrepancy is inconsistencies in the keying of individual index entries. I think that we have been previously told that, where a page has been double keyed, different transcriptions of the same record will show up as two unique records in the update statistics. If this is the full explanation, then it raises some disturbing questions about the accuracy of our transcription. In any event, the question does arise of whether the statistics give a slightly too rosy account of progress. I have previously raised this problem with Peter Dauncey off list, so he will have been warned. J S Parker
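The arithmetic in this post can be reproduced with a short Python sketch. The page counts, record counts and the 374-entries-per-page average are the figures quoted above; the function name `overrun` is just an illustrative choice.

```python
ENTRIES_PER_PAGE = 374  # the average assumed in the post


def overrun(unique_records, pages, per_page=ENTRIES_PER_PAGE):
    """Return (estimated entries, excess, excess as a percentage of the estimate)."""
    estimated = pages * per_page
    excess = unique_records - estimated
    return estimated, excess, 100 * excess / estimated


births_1898 = overrun(990_848, 2_474)
marriages_1890 = overrun(514_581, 1_206)

for label, (estimated, excess, pct) in [
    ("Births 1898", births_1898),
    ("Marriages 1890", marriages_1890),
]:
    print(f"{label}: estimated {estimated:,}, excess {excess:,} ({pct:.1f}%)")
```

This gives an excess of 65,572 (7.1%) for 1898 births and 63,537 (14.1%) for 1890 marriage entries, in line with the rounded figures in the post.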

    11/30/2003 12:23:53
    1. Latest update
    2. Peter Dauncey
    3. The latest update shows an increase of 2,191,063 in the number of unique records. There are 1,241,112 more Births. The big increases are for 1890 (241,505); 1870 (237,844); 1891 (182,142); 1907 (155,137) and 1874 (97,549) but there are 5 other years with increases over 20K: 1889 (61,536); 1849 (42,947); 1865 (31,881); 1842 (30,695) and 1892 (21,854). There are 123,442 more Marriages. There are just 5 years with increases over 10K: 1890 (21,533); 1897 (20,917); 1908 (17,050); 1891 (11,391) and 1849 (10,571). There are 826,509 more Deaths. The big increases are for 1865 (120,612) and 1858 (103,521) but there are 9 other years with increases over 20K: 1852 (71,742); 1888 (69,804); 1886 (61,283); 1884 (57,569); 1882 (42,117); 1885 (38,317); 1897 (21,992); 1890 (21,506) and 1881 (20,047). Happy searching/transcribing. Peter Dauncey
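As a quick consistency check, the three per-event increases quoted above can be summed in Python and compared with the stated overall increase. The variable names are illustrative only.

```python
# Increases quoted in the update, by event type.
increases = {"Births": 1_241_112, "Marriages": 123_442, "Deaths": 826_509}
stated_total = 2_191_063

computed_total = sum(increases.values())
print(f"{computed_total:,}")           # 2,191,063
print(computed_total == stated_total)  # True
```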

    11/30/2003 10:24:18
    1. Re: submitting 1837 online data to freebmd
    2. Allan Raymond
    3. We are on the same wavelength. Hopefully the task I put on the system earlier today meets the spirit of your/my comments. Allan Raymond -----Original Message----- From: Ben Laurie <ben@algroup.co.uk> To: FREEBMD-DISCUSS-L@rootsweb.com <FREEBMD-DISCUSS-L@rootsweb.com> Date: 29 November 2003 17:42 Subject: Re: submitting 1837 online data to freebmd >Ben Laurie wrote: >> Allan Raymond wrote: >> >>> Sheelagh >>> >>> The definitive answer is given in the Terms and Conditions of 1837online, >>> accessible via their Web Page: http://www.1837online.com/Trace2web/ >>> >>> I'm not a lawyer, but the restrictions in Para 4 of the Terms and >>> Conditions >>> preclude the use of the Material accessed from 1837online, other than >>> for >>> "personal family research". It is entirely up to 1837online to set >>> their own >>> restrictions and very important that FreeBMD is seen to be adhering to >>> these >>> restrictions. >> >> FreeBMD is under no obligation to 1837online, since it makes no use of >> their materials. The obligation is on the users of 1837online to comply >> with their T&Cs. > >To be clear, my point is that our policy is not based on any obligation >to 1837online, since none exists, it is one that we have voluntarily >elected to enforce. I did not intend to suggest that our policy is not >as stated. > >Cheers, > >Ben.

    11/29/2003 12:17:22