----- Original Message ----- From: "Dave Mayall" <[email protected]> To: <[email protected]> Sent: Sunday, July 22, 2001 10:15 AM Subject: Re: The Complex Search Issue > On Sun, 22 Jul 2001 10:27:13 +0200, you wrote: > > >Hi Susannah > > > >If the name you are researching is very rare, then even searching the entire > >database does not return a complete page, let alone multiple pages. > > And would not give a "Search too Complex" error. > > >So to those who want to change the search engine, please consider the whole > >range of people who will want to use FreeBMD, including the one-name study > >style or researcher, and those researchers whose family names are extinct in > >20th century England and Wales. > > "Those who want to change the search engine" reads as if the changes > that have been implemented have been implemented because of some > strange desire to make life difficult. > > Let me state again that the changes were NECESSARY. Cross database > searching makes HUGE demands on the server, and if hugely complex > searches are allowed EVERYBODY else who is searching at the same time > gets thrown out. > > It is not acceptable that a single search should prevent 40 others > from working, so we have blocked the searches that were causing the > problem. It is *still* possible to get the same information by > searching a limited timescale (say in 10 year chunks). I realise that > this is less convenient, but it can still be done. Dave You have explained why it was necessary to do this. But it might be helpful to change the terminology employed since the issue is not strictly one of complexity but of the number of potential hits. A complex search implies, at least to me, that it is one that seeks to match multiple criteria. Such a search might be anticipated to require more cpu time. Whereas searching for a single but common surname is not complex. But the number of hits also takes up too much cpu time. 
Yet the current default explanation implies that it is searches lacking a surname that will cause problems. Peter Norman
Hi, I have the same problem with trying to find out what my syndicate has uploaded onto the server but I think I may have a compromise on server usage. Instead of having what Brian suggested (check range but for all files that have been uploaded) My idea would be more like the show file page (http://freebmd.rootsweb.com/cgi/show-file.pl) except you don't need to give a file name! Currently you have to be a mind reader and know how your syndicate member names the file:) (which isn't always totally logical) When you type in the user, it would list out the files they have uploaded. Ultimately it would be nice if you could view them so you can check when they have put more than one scan together. If that could work, an even better show file would be if you type in your syndicate name and all individuals that are assigned as your syndicate have their files listed out ( individual name - file name). How do you think that could work? Bye for now Mary Muir [email protected] Beautiful BC Syndicate ----- Original Message ----- From: "Dave Mayall" <[email protected]> > The fact that information is in the database doesn't mean that the > query to extract it will be quick and easy. Carrying out an intensive > search on the files could completely lock out ordinary searches. > > I will look at doing it, BUT... > > 1) It will need changes to the indexing, and that can't happen without > a database update. > 2) It almost certainly won't be done before the next database update. > 3) It will have to take a low priority because our current efforts are > focussed on some critical re-tuning of the database. > 4) Identifying files that have been modified since being uploaded will > not be possible without significant effort. > Dave Mayall
In message <[email protected]>, Bob Phillips <[email protected]> writes >Hi > I don't have a problem with deciding what to put as regards the >transcribing rules and I wasn't intending to query one scan. As usual I >don't make myself clear. > I don't know how many entries there will be for this quarter but I would >estimate that 50% (very rough) of the entries will have uncertain or unknown >page numbers. This means we are building up an enormous quantity of entries >that will require checking at the source. Re-entering via the original scan >will not cure it. > The number of entries that require checking must be already large but it >seems silly to purposely add to it. By checking I mean someone going to the >original fiche which is what I believe is intended, not the double entering >which I believe is also intended. > The scrapping of the scans for this entire quarter would be a management >decision depending on the cost of rescanning, the likelihood that re-scans >would be any better which they would be if in 16 grey scale and the >availability of other quarters for transcribing. > This is not a moan, I'll transcribe anything. I am just worrying that >we are making an almost insoluble problem. We are up to scan 600+, surname >Collins, at 40 entries a scan; that equals 24,000. We are, almost >deliberately, putting 12,000 20,000 or 30,000 extra entries in that >require later checking. I think Bob makes a number of valid points. What is the position wrt "rejected" scans? I've had to highlight 5 scans as being completely blank. Does my syndicate leader feed that back up the chain and what is the [theoretical] next step? What is the position wrt patchy scans - scans that are otherwise ok but have a very poor [illegible] section? Should these be mentioned to syndicate leaders? PP
Hi Susannah If the name you are researching is very rare, then even searching the entire database does not return a complete page, let alone multiple pages. And if your research interest is as a 'one-name studier' then even if you did get several pages of information you would be interested in every person on those several pages. So in my opinion the 'sensible way of working' depends upon how common the name you are searching is, and also your research interest. For example, one of the variants of my surname had died out in Cornwall by 1891, although it is the most common contemporary variant in Australia. When I searched the entire >14 million entries last week the search returned only 1 entry, excluding the death entries that I have uploaded myself from my one-name study information. Excluding the information I have uploaded as one-name study file, for all the variants of my name there are about 35 entries in total. I did the search as 1 search per surname variant spelling and received the results for each variant in a few seconds. So to those who want to change the search engine, please consider the whole range of people who will want to use FreeBMD, including the one-name study style or researcher, and those researchers whose family names are extinct in 20th century England and Wales. Mary Trevan Guild of One-Name Studies: http://www.one-name.org/ Membership no: 3530 - Treveighen / Trevan / Trevean / Trevains / Trevehen -----Original Message----- From: Ian Miles <[email protected]> To: [email protected] <[email protected]> Date: Saturday, 21 July, 2001 7:45 PM Subject: Re: The Complex Search Issue >Since I spend most of my time transcribing & uploading names I only >occasionally get around to searching for my elusive ancestors. > >Last night I spent a few hours investigating my SHAW family from Boston, >Lincolnshire. Since I know names in many cases but not dates I find it easy >& much more logical to do "small" searches. 
It makes more sense to me to >search each event, B,M or D separately & in time frames of 10 - 15 years. > >Why would anyone want to set up a search that is going to provide multiple >pages of names that then need printing & careful checking? Aren't you just >making a rod for your own back? > >Regards >Susannah Miles
On Sun, 22 Jul 2001 10:27:13 +0200, you wrote: >Hi Susannah > >If the name you are researching is very rare, then even searching the entire >database does not return a complete page, let alone multiple pages. And would not give a "Search too Complex" error. >So to those who want to change the search engine, please consider the whole >range of people who will want to use FreeBMD, including the one-name study >style or researcher, and those researchers whose family names are extinct in >20th century England and Wales. "Those who want to change the search engine" reads as if the changes that have been implemented have been implemented because of some strange desire to make life difficult. Let me state again that the changes were NECESSARY. Cross database searching makes HUGE demands on the server, and if hugely complex searches are allowed EVERYBODY else who is searching at the same time gets thrown out. It is not acceptable that a single search should prevent 40 others from working, so we have blocked the searches that were causing the problem. It is *still* possible to get the same information by searching a limited timescale (say in 10 year chunks). I realise that this is less convenient, but it can still be done. -- Dave Mayall
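The workaround Dave describes, searching a limited timescale in 10 year chunks, can be sketched as below. This is only an illustration of splitting a year range; the function name `year_chunks` and the 1837-1870 range are assumptions for the example and are not part of FreeBMD.

```python
def year_chunks(first_year, last_year, size=10):
    """Split an inclusive year range into consecutive chunks of at most `size` years."""
    if size < 1:
        raise ValueError("size must be at least 1")
    chunks = []
    start = first_year
    while start <= last_year:
        # Each chunk covers `size` years, except possibly the last one.
        end = min(start + size - 1, last_year)
        chunks.append((start, end))
        start = end + 1
    return chunks


# Hypothetical range, chosen only for illustration.
for start, end in year_chunks(1837, 1870):
    print(f"search {start}-{end}")
```

Each printed range would then be entered as a separate search, which is less convenient than one search but keeps each query small.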
On Sat, 21 Jul 2001 14:21:07 +0100, you wrote:

>If the information is already held in a database as Graham Hart has said
>would it really be a problem.

Yes.

The fact that information is in the database doesn't mean that the query to extract it will be quick and easy. Carrying out an intensive search on the files could completely lock out ordinary searches.

>I find the delays between updates a real
>problem, and I assume it will only get worse as the size of the database
>continues to grow.

I will look at doing it, BUT...

1) It will need changes to the indexing, and that can't happen without a database update.
2) It almost certainly won't be done before the next database update.
3) It will have to take a low priority because our current efforts are focussed on some critical re-tuning of the database.
4) Identifying files that have been modified since being uploaded will not be possible without significant effort.

--
Dave Mayall
Good day all. Thank you to all who offered words of encouragement to still my fears about upgrading SpeedBMD. I upgraded this evening and it appears all is working fine - but oh dear. I know you can't please everybody and I know the transcribers asked for the new modifications - but is it going to be too much of a disaster if I go back to the previous version? I really cannot cope with those numbers on the "End of Batch" bar in the surname column when in Block Input mode. Every time I look up from the keyboard to check my page number input, my eyes which are looking for a set of numbers are automatically drawn to the gaudy yellow set instead of the plain black ones at the other side of the page. Is it just me? I can only hope that I will get used to it eventually. Best wishes Anne Cruise
2p worth ... If they do need checking, then at least the scale of the problem is better known. So transcribe and upload. If I'm searching for a person, I'd prefer a full 100% record obviously. But even a "check needed" result is better than nothing. At least I know which books on the shelf to go and look for. So transcribe and upload. Mark - - >Hi > I don't have a problem with deciding what to put as regards the >transcribing rules and I wasn't intending to query one scan. As usual I >don't make myself clear. > I don't know how many entries there will be for this quarter but I would >estimate that 50% (very rough) of the entries will have uncertain or unknown >page numbers. This means we are building up an enormous quantity of entries >that will require checking at the source. Re-entering via the original scan >will not cure it. > The number of entries that require checking must be already large but it >seems silly to purposely add to it. By checking I mean someone going to the >original fiche which is what I believe is intended, not the double entering >which I believe is also intended. > The scrapping of the scans for this entire quarter would be a management >decision depending on the cost of rescanning, the likelihood that re-scans >would be any better which they would be if in 16 grey scale and the >availability of other quarters for transcribing. > This is not a moan, I'll transcribe anything. I am just worrying that >we are making an almost insoluble problem. We are up to scan 600+, surname >Collins, at 40 entries a scan; that equals 24,000. We are, almost >deliberately, putting 12,000 20,000 or 30,000 extra entries in that >require later checking. >Bob Phillips > > >----- Original Message ----- >From: "Graham Hart" <[email protected]> >To: <[email protected]> >Sent: Saturday, July 21, 2001 10:30 AM >Subject: Re: Hand written scans > > >> Hi, >> >> Bob Phillips wrote: >> > >> > Hi >> > The syndicate I am in are doing 1846B1. 
They are handwritten and >the scans are very variable in quality. >> > As an example I have Brown, Mary, a whole sheet of them, very >easy, only have to check first and last and that's fine. Although I am >certainly not entering what I see. For the Districts and Volumes they can >be "worked out" with a high degree of accuracy, probably 100%. The page >numbers are 90% unreadable. >> >> As long as what you are interpreting is what you think is written rather >> than what you think should have been written :) >> >> You shouldn't be correcting districts and volumes that happen not to >> match what you think they should be according to SpeedBMD and the like. >> But if, after looking at the district or the volume you can understand >> the writing clearer then that is ok. >> >> > Should this scan be rejected or transcribed. >> >> That's a difficult one. If the scan is transcribed then it clearly needs >> to be transcribed with the correct unreadable character terminology in >> the page numbers. >> >> As to whether the scan should be rejected .. I'll be interested in >> Dave's view cos he has been managing a scan syndicate for a while and >> probably has come across this more than I have. >> >> Cheers >> >> Graham >> >> > > > Bob Phillips
Hi I don't have a problem with deciding what to put as regards the transcribing rules and I wasn't intending to query one scan. As usual I don't make myself clear. I don't know how many entries there will be for this quarter but I would estimate that 50% (very rough) of the entries will have uncertain or unknown page numbers. This means we are building up an enormous quantity of entries that will require checking at the source. Re-entering via the original scan will not cure it. The number of entries that require checking must be already large but it seems silly to purposely add to it. By checking I mean someone going to the original fiche which is what I believe is intended, not the double entering which I believe is also intended. The scrapping of the scans for this entire quarter would be a management decision depending on the cost of rescanning, the likelihood that re-scans would be any better which they would be if in 16 grey scale and the availability of other quarters for transcribing. This is not a moan, I'll transcribe anything. I am just worrying that we are making an almost insoluble problem. We are up to scan 600+, surname Collins, at 40 entries a scan; that equals 24,000. We are, almost deliberately, putting 12,000 20,000 or 30,000 extra entries in that require later checking. Bob Phillips ----- Original Message ----- From: "Graham Hart" <[email protected]> To: <FREEBMD-DISCU[email protected]> Sent: Saturday, July 21, 2001 10:30 AM Subject: Re: Hand written scans > Hi, > > Bob Phillips wrote: > > > > Hi > > The syndicate I am in are doing 1846B1. They are handwritten and the scans are very variable in quality. > > As an example I have Brown, Mary, a whole sheet of them, very easy, only have to check first and last and that's fine. Although I am certainly not entering what I see. For the Districts and Volumes they can be "worked out" with a high degree of accuracy, probably 100%. The page numbers are 90% unreadable. 
> > As long as what you are interpreting is what you think is written rather > than what you think should have been written :) > > You shouldn't be correcting districts and volumes that happen not to > match what you think they should be according to SpeedBMD and the like. > But if, after looking at the district or the volume you can understand > the writing clearer then that is ok. > > > Should this scan be rejected or transcribed. > > That's a difficult one. If the scan is transcribed then it clearly needs > to be transcribed with the correct unreadable character terminology in > the page numbers. > > As to whether the scan should be rejected .. I'll be interested in > Dave's view cos he has been managing a scan syndicate for a while and > probably has come across this more than I have. > > Cheers > > Graham > > > > Bob Phillips
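Bob's estimate in the message above is simple arithmetic, and it can be checked with a few lines. This is a rough sketch using only the figures he quotes (600 scans so far, 40 entries a scan, roughly 50% with uncertain page numbers); the function name is made up for the example, and his larger figures of 20,000 or 30,000 presumably allow for the scans still to come.

```python
def entries_needing_check(scans, entries_per_scan, uncertain_share):
    """Return (total entries, entries expected to need checking at the source)."""
    total = scans * entries_per_scan
    return total, round(total * uncertain_share)


# Figures taken from the message: scan 600+, 40 entries a scan, about 50% uncertain.
total, uncertain = entries_needing_check(600, 40, 0.50)
print(f"{total} entries so far, about {uncertain} needing a later check")
```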
Since I spend most of my time transcribing & uploading names I only occasionally get around to searching for my elusive ancestors. Last night I spent a few hours investigating my SHAW family from Boston, Lincolnshire. Since I know names in many cases but not dates I find it easy & much more logical to do "small" searches. It makes more sense to me to search each event, B,M or D separately & in time frames of 10 - 15 years. Why would anyone want to set up a search that is going to provide multiple pages of names that then need printing & careful checking? Aren't you just making a rod for your own back? Regards Susannah Miles
Hi,

Brian Smart wrote:
>
> If the information is already held in a database as Graham Hart has said
> would it really be a problem. I find the delays between updates a real
> problem, and I assume it will only get worse as the size of the database
> continues to grow.

The problem isn't whether we have the information, it's the size of the query to retrieve it, which I also mentioned in my response. If it takes too long then it will hold up all other queries and kill searches again .. we do have that problem already with some of the syndicate stuff and it would be silly to add to it ... We'd need to think carefully about how to implement it ...

Cheers

Graham

>
> Regards
>
> Brian Smart
>
> -----Original Message-----
> From: Dave Mayall [mailto:[email protected]]
> Sent: 21 July 2001 11:45
> To: [email protected]
> Subject: Re: Upload Checking for Syndicate Leaders
>
> On Sat, 21 Jul 2001 10:21:24 +0100, you wrote:
>
> >Would it be possible to provide a list showing any files that have been
> >uploaded, but not yet incorporated in the main database.
> >
> >This would work the same way as the "Analyse the data submitted by your
> >syndicate" but would only include files as above.
> >
> >The reason for this request is that the time delay between updates is now so
> >long that I find it very difficult to keep track of progress.
>
> Possible, but probably *very* intensive on the database.
>
> --
> Dave Mayall
If the information is already held in a database as Graham Hart has said would it really be a problem. I find the delays between updates a real problem, and I assume it will only get worse as the size of the database continues to grow. Regards Brian Smart -----Original Message----- From: Dave Mayall [mailto:[email protected]] Sent: 21 July 2001 11:45 To: [email protected] Subject: Re: Upload Checking for Syndicate Leaders On Sat, 21 Jul 2001 10:21:24 +0100, you wrote: >Would it be possible to provide a list showing any files that have been >uploaded, but not yet incorporated in the main database. > >This would work the same way as the "Analyse the data submitted by your >syndicate" but would only include files as above. > >The reason for this request is that the time delay between updates is now so >long that I find it very difficult to keep track of progress. Possible, but probably *very* intensive on the database. -- Dave Mayall
On Sat, 21 Jul 2001 10:21:24 +0100, you wrote: >Would it be possible to provide a list showing any files that have been >uploaded, but not yet incorporated in the main database. > >This would work the same way as the "Analyse the data submitted by your >syndicate" but would only include files as above. > >The reason for this request is that the time delay between updates is now so >long that I find it very difficult to keep track of progress. Possible, but probably *very* intensive on the database. -- Dave Mayall
Hi Brian,

Brian Smart wrote:
>
> Would it be possible to provide a list showing any files that have been
> uploaded, but not yet incorporated in the main database.
>
> This would work the same way as the "Analyse the data submitted by your
> syndicate" but would only include files as above.
>
> The reason for this request is that the time delay between updates is now so
> long that I find it very difficult to keep track of progress.

I think what would be possible would be to identify new files, but not necessarily corrected files. We do hold the file names in the database so we know which have been incorporated.

I think it sounds a possibility .. we would have to be careful that it doesn't create very large queries as some of the syndicate maintenance code does.

Cheers

Graham

>
> Regards
>
> Brian Smart
> Brian's Scan Syndicate
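Graham's point, that new files can be identified by file name but corrected files cannot, amounts to a set difference on names. The sketch below only illustrates that idea; the function and the file names are hypothetical and say nothing about how the FreeBMD database or its upload area is actually organised.

```python
def pending_files(uploaded, incorporated):
    """Return uploaded file names that are not yet in the main database, sorted."""
    return sorted(set(uploaded) - set(incorporated))


# Hypothetical file names, for illustration only.
uploaded = ["1846B1-0001.txt", "1846B1-0002.txt", "1846B1-0003.txt"]
incorporated = ["1846B1-0001.txt"]

print(pending_files(uploaded, incorporated))
```

A file that was corrected and uploaded again under the same name appears in both lists, so a comparison by name alone will not report it. That is the limitation Graham mentions.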
Hi,

Bob Phillips wrote:
>
> Hi
> The syndicate I am in are doing 1846B1. They are handwritten and the scans are very variable in quality.
> As an example I have Brown, Mary, a whole sheet of them, very easy, only have to check first and last and that's fine. Although I am certainly not entering what I see. For the Districts and Volumes they can be "worked out" with a high degree of accuracy, probably 100%. The page numbers are 90% unreadable.

As long as what you are interpreting is what you think is written rather than what you think should have been written :)

You shouldn't be correcting districts and volumes that happen not to match what you think they should be according to SpeedBMD and the like. But if, after looking at the district or the volume you can understand the writing clearer then that is ok.

> Should this scan be rejected or transcribed.

That's a difficult one. If the scan is transcribed then it clearly needs to be transcribed with the correct unreadable character terminology in the page numbers.

As to whether the scan should be rejected .. I'll be interested in Dave's view cos he has been managing a scan syndicate for a while and probably has come across this more than I have.

Cheers

Graham

> Bob Phillips
Would it be possible to provide a list showing any files that have been uploaded, but not yet incorporated in the main database. This would work the same way as the "Analyse the data submitted by your syndicate" but would only include files as above. The reason for this request is that the time delay between updates is now so long that I find it very difficult to keep track of progress. Regards Brian Smart Brian's Scan Syndicate
Hi The syndicate I am in are doing 1846B1. They are handwritten and the scans are very variable in quality. As an example I have Brown, Mary, a whole sheet of them, very easy, only have to check first and last and that's fine. Although I am certainly not entering what I see. For the Districts and Volumes they can be "worked out" with a high degree of accuracy, probably 100%. The page numbers are 90% unreadable. Should this scan be rejected or transcribed. Bob Phillips
Graham Hart wrote:
>
> Forwarded from listowner.
>
> -------- Original Message --------
> Subject: Search Too Complex
> Date: Wed, 18 Jul 2001 12:49:56 -0600
> From: "Max & Carol Scott" <[email protected]>
> Reply-To: "Max & Carol Scott" <[email protected]>
> To: "FREEBMD- DISCUSS" <[email protected]>
>
> When I looked at some of the data I had uploaded to Freebmd, I found that
> unless I entered exactly as, for example, this:
>
> [R_]adford
>
> it was unable to carry out my search, saying too complex. I had imagined
> that I would be able to search as: *adford
> but it appears not to be the case.
>
> Can I ask, therefore, if there is any merit in entering names with [_] as
> the initial letter. I had taken advice on this when I started as a newbie,
> but now I am wondering if I should be transcribing in some other way.
>
> Any help would be appreciated.

If that is what can be seen, then yes you transcribed it correctly.

As the search *currently* operates, entries that include uncertain characters are difficult to search on, particularly where that uncertainty occurs at the start.

What we must NEVER do is compromise the integrity of the data by ignoring the correct way of transcribing data in order to evade the *present* limitations of the search (we can improve the search, we can't fix data nearly as easily).

--
Dave Mayall
On Wed, 18 Jul 2001 16:34:04 +0100, you wrote:

>This too complex search problem does seem to throw up some inconsistent
>objections. And prevents searches that previously used to work.

Whilst the outcome may seem inconsistent, the score is an accurate representation of the "effort" required to carry out the search (as opposed to some arbitrary rules as to what had to be specified as is the case on familysearch.org)

>Example surname = White and district= Eastry gives
>"The maximum limit for search complexity is 2500000 your search has a
>complexity of 3278070 "
>Only two criteria.
>
>The poorer alternative of specifying the county (Kent) gives a complexity of
>3541140.

Indeed, specifying District/County will help little (if at all) with the score, because the two aren't part of the same index (and there are very good reasons why they aren't part of the same index)

>This suggests that there are problems with searching the commonest Surnames
>in this way. Clearly an undesirable problem to arise.

We can only cater for so many search paths (due to file size limits), and we have to cater for the Name searches first, ahead of speculative "fishing trips"

>I just checked two other themes
>Pearce + Wiltshire is OK, although Pearce is a very common Wiltshire surname
>But Moore + Wilts just fails (complexity 2630430)

The complexity isn't governed by the local rarity of the surname, but by the total occurrences nationwide.

We *are* constantly refining the indexes (I spent 4 hours today developing a new index structure) to allow more searches to complete, but there *is* a limit to what we can achieve.

--
Dave Mayall
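The scores quoted in this exchange can be set against the limit to see how far each search overshoots. The sketch below uses only the numbers reported in the message. How FreeBMD computes the score is not described in the thread, and treating a score exactly equal to the limit as allowed is an assumption made here.

```python
MAX_COMPLEXITY = 2_500_000  # limit quoted in the error message

# Complexity scores reported in the message above.
reported = {
    ("White", "Eastry district"): 3_278_070,
    ("White", "Kent"): 3_541_140,
    ("Moore", "Wiltshire"): 2_630_430,
}


def is_allowed(score, limit=MAX_COMPLEXITY):
    """Assumption: a search runs only if its score does not exceed the limit."""
    return score <= limit


def overshoot_percent(score, limit=MAX_COMPLEXITY):
    """How far a score is above the limit, as a percentage of the limit."""
    return round(100 * (score - limit) / limit, 1)


for (surname, area), score in reported.items():
    status = "ok" if is_allowed(score) else f"over by {overshoot_percent(score)}%"
    print(f"{surname} + {area}: {score} ({status})")
```

On these figures Moore + Wiltshire misses the limit narrowly, while both White searches are well over it, which fits Dave's explanation that the score follows the nationwide frequency of the surname.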
Hi Carol

You can get a limited result by typing in *alford (surname) June 1838 Marriages District 10 (I just happen to know this is one of the records that you uploaded).

This only proves that your entry [R_] was not done in vain but hasn't totally answered your question.

As an aside I was wondering why you used [R_] instead of _ ?

To quote from our "Hints and Help For Beginners" guide.

Put an underscore ( _ ) for each unreadable character if you are sure you can see how many there are (e.g put ___ if there are 3 unreadable characters.)

Put an asterisk ( * ) for more than one unreadable character if you can't see how many there are.

If you can't decide between 2 or more possible characters in a single position, put them in square brackets e.g. [79] would mean it's a 7 or a 9 (the seven is more likely so goes first).

Regards

Allan Raymond
[email protected]
http://www.btinternet.com/~allan_raymond/Monarchies_of_Europe.htm

FreeBMD - putting birth marriages and deaths on the Internet
http://FreeBMD.rootsweb.com/

----- Original Message -----
From: "Graham Hart" <[email protected]>
To: <[email protected]>
Sent: 18 July 2001 20:08
Subject: [Fwd: Search Too Complex]

Forwarded from listowner.

-------- Original Message --------
Subject: Search Too Complex
Date: Wed, 18 Jul 2001 12:49:56 -0600
From: "Max & Carol Scott" <[email protected]>
Reply-To: "Max & Carol Scott" <[email protected]>
To: "FREEBMD- DISCUSS" <[email protected]>

When I looked at some of the data I had uploaded to Freebmd, I found that unless I entered exactly as, for example, this:

[R_]adford

it was unable to carry out my search, saying too complex. I had imagined that I would be able to search as: *adford but it appears not to be the case.

Can I ask, therefore, if there is any merit in entering names with [_] as the initial letter. I had taken advice on this when I started as a newbie, but now I am wondering if I should be transcribing in some other way.

Any help would be appreciated.

Carol Scott
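The three conventions quoted from the beginners' guide can be read as a small pattern language. The sketch below translates such an entry into a Python regular expression to show which names a transcribed entry could stand for. It is not how the FreeBMD search works. The function names are made up, an underscore inside square brackets is treated as "any character", and the asterisk is treated as "one or more characters", which is a looser reading than the guide's "more than one".

```python
import re


def pattern_to_regex(pattern):
    """Translate a transcription entry with uncertain characters into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "_":
            parts.append(".")    # exactly one unreadable character
        elif ch == "*":
            parts.append(".+")   # assumption: one or more unreadable characters
        elif ch == "[":
            close = pattern.find("]", i)
            if close == -1:
                raise ValueError("unclosed '[' in pattern")
            options = pattern[i + 1:close]
            if not options:
                raise ValueError("empty '[]' in pattern")
            if "_" in options:
                parts.append(".")  # e.g. [R_]: an R, or something unreadable
            else:
                parts.append("[" + re.escape(options) + "]")
            i = close              # skip to the closing bracket
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def could_be(pattern, name):
    """True if `name` is one of the readings the transcribed entry allows."""
    return pattern_to_regex(pattern).fullmatch(name) is not None


print(could_be("[R_]adford", "Radford"))  # an R or any single character, then "adford"
print(could_be("[79]", "8"))              # only 7 or 9 are allowed
```

Read this way, [R_]adford and _adford accept the same names, which bears on Allan's aside. The bracketed form only adds the transcriber's best guess that the first letter is an R.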