First of all, apologies to those upset at my use of the wrong discussion group!! I was only trying to move things forward!!

Dave Mayall's reply to my concerns about TWYS has triggered thoughts about connected 'problems' previously raised in this discussion.

Firstly, let me say that I 100% support the objective of TWYS. My concern as a transcriber is more about what happens if I don't succeed on this particular point. As a rational (?) human being I would like to understand why, and not be told to 'get on with it' and just follow the rules! I also believe there may well be others in the same position (and that was why I originally posted on the Admins group).

As I (now) see it: if I type a name or reference number wrong, then it is obvious that this will cause inaccuracy in the database and also wrong data for search purposes. If, for example, I see a forename written as 'Richerd' and then change it because I decide it should really be 'Richard', then this is plainly and understandably wrong in that it does not faithfully reproduce the GRO index.

However, when it comes to District names I, as a 'fallible human' transcriber, do (or maybe did) not understand why problems might be created if the scan showed 'W.Derby' (i.e. dot but no space) and I typed 'W. Derby' (i.e. space after dot), or 'W Derby' (no dot). To me they all represent the same district. I also understood that there was an 'aliasing' process which would consolidate such variations in naming what is in fact the same district.

I had assumed that 'aliasing' occurred during the creation of the underlying database, so that any differing versions of the same district name would be 'corrected' on that database. If I understand Dave correctly, however, the underlying database will hold exactly the format of the district name that is keyed in, and 'aliasing' is only carried out by the search process.

IF my logic is correct so far .....
Then, whilst my entry is the first (only) keying, there is in fact no (serious) problem, since all versions of 'W. Derby' will be treated the same and any search will be correct.

HOWEVER, when there is a second keying of the data and the next transcriber enters (say) 'W.Derby', then this difference will cause a second, unmatched entry in the database.

.......

Thus I ask: is my analysis correct? Will a variation in just the spelling of the district result in duplicate entries? If so, then I think there may be workable solutions to remove such duplicates.

.......

Written in an attempt to be helpful within a very good project.

Dick Bond
On Sun, 21 Dec 2003 21:52:46 -0000, you wrote:

>However when it comes to District names I, as a 'fallible human' transcriber, do (or maybe did) not understand why problems might be created if the scan showed 'W.Derby' (i.e. dot but no space) and I typed 'W. Derby' (i.e. space after dot) , or, 'W Derby' (no dot). To me they all represent the same district. I also understood that there was an 'aliasing' process which would consolidate such variations in naming what is in fact the same district.

Correct.

>I had assumed that 'aliasing' occurred during the creation of the underlying database thus any differing versions of the same district name would be 'corrected' on that database.

It does occur then, and from that point onwards we hold District Name (as transcribed) and District ID (a link to the master list of canonical districts).

>If I understand Dave correctly, however, the underlying database will hold exactly the format of the district name that is keyed in - and 'Aliasing' is only carried out by the search process.

No. The database holds both items of information. It uses the aliased ID for search purposes, and the original data for display.

>IF my logic is correct so far .....
>
>Then, whilst my entry is the first (only) keying then there is in fact no (serious) problem since all versions of 'W. Derby' will be treated that same and any search will be correct.
>
>HOWEVER, when there is a second keying of the data and the next transcriber enters (say) 'W.Derby' then this difference will cause a second, unmatched entry onto the database.
>
>.......
>
>Thus I ask. Is my analysis correct? Will a variation in just the spelling of the district result in duplicate entries?

No.

--
Dave Mayall
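
For readers who find code easier to follow, the arrangement described in the reply above (keep the district name as transcribed for display, and store an aliased district ID for searching) might be sketched roughly as below. This is only an illustration under stated assumptions: the alias table, the ID value, the canonical name "West Derby", and the names `Entry`, `load_entry` and `search_by_district` are all hypothetical, and none of it is taken from the project's actual database or code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical master list of canonical districts: district ID -> name.
CANONICAL_DISTRICTS = {1: "West Derby"}

# Hypothetical alias table: spelling as keyed -> canonical district ID.
ALIASES = {
    "W.Derby": 1,
    "W. Derby": 1,
    "W Derby": 1,
    "West Derby": 1,
}


@dataclass(frozen=True)
class Entry:
    surname: str
    district_as_transcribed: str  # kept exactly as keyed; used for display
    district_id: Optional[int]    # aliased ID; used for searching


def load_entry(surname: str, district: str) -> Entry:
    """Alias the district once, when the entry is loaded into the database."""
    return Entry(surname, district, ALIASES.get(district))


def search_by_district(entries: List[Entry], district: str) -> List[Entry]:
    """Match on the aliased ID, so spelling variants find the same entries."""
    wanted = ALIASES.get(district)
    if wanted is None:
        return []
    return [e for e in entries if e.district_id == wanted]
```

In this sketch, entries keyed as 'W.Derby', 'W. Derby' and 'W Derby' all receive the same district ID, so a search under any of those spellings returns all of them, while each entry still displays the spelling its transcriber typed. How the real system reconciles a second keying is not described in the thread, so the sketch does not attempt to model it.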