I have also used the TextBridge engine at home for a while, as part of the
Pagis Pro software, and have been reasonably satisfied with the cost/benefit
(less than $100 after rebate). Text pages from old genealogy book sources
that I photographed digitally (JPEG) converted to text file format (editable
and searchable) OK. Be aware of copyrights, etc.

The discussion below is right on. The condition (clarity, column
arrangement, etc.) of the source/original is very important. File format
conversion and searching are difficult even for the experts.

My office recently purchased a $500 expanded scanning office suite (I forget
now who owns/licenses the underlying technology, perhaps
PaperPort/ScanSoft/Xerox merging) for my needs of converting PDF documents,
etc. to MS Word, and other file conversions. It works pretty well for
scanning numbers into an Excel spreadsheet for analysis. I'm happy to
provide particulars off-list if asked, as the quality of each component in
the process plays a part.

----- Original Message -----
From: "Daniel H. Weiskotten" <[email protected]>
To: <[email protected]>
Sent: Friday, March 21, 2003 8:18 PM
Subject: Re: [NYFL] Software

> At 3/21/03 06:01 PM, Joan wrote:
> > Does anyone know of any software out there that REALLY converts digital
> > text images into searchable document format???? In particular, photocopies
> > of old obits???
> > I've had software that CLAIMS it will convert digital to text, but I end up
> > with two or three words per page that actually convert, and the rest is
> > funky little symbols!! I'd LOVE to no longer have to transcribe ALL of my
> > obits!!
>
> Joan:
>
> I do text scanning all the time, but the main problem is not with the
> software, but with what is being scanned.
>
> It is called OCR, or Optical Character Recognition, and there are literally
> scores of programs and scanners that can do this for you.
> Almost any scanner nowadays comes with decent OCR software, although they
> tend to make you go an extra step to set it up. Once it is set up and
> running, it all depends upon the quality of what you are scanning.
>
> The software does not work like your eyes and brain. It looks for
> patterns, and when all it sees is smudges and grey, you get
> gibberish. The copy needs to be exceptionally clean (any dots will get
> interpreted as characters: letters, numbers or punctuation), and if it is
> fancy lettering, or even regular text with tails, it will not know what to
> do with it. The letter M will scan as "ni" and S will be an 8. That's
> typical even in a good scan.
>
> I have not worked with scanning of digital images of text, but know that
> the same problems will result if things are not sharp and clear. I have
> not found a way to scan newspaper clippings, as the color and texture of the
> paper will knock out any chance you have of scanning just the letters. I
> have had success in photocopying and enlarging the clipping (lighter and
> 200%), which allows for clearer reading of the letters, but often there is
> little that can be done to clean it up enough to OCR scan it. The time
> that you spend cleaning it up would be as well spent just retyping it.
>
> For many years I have been using TextBridge by ScanSoft, but it came with
> my scanner 6 or 7 years ago and I can't find a decent modern version that
> is not expensive and buried in a package of other programs that I
> don't need. I have another program at work, but it is also buried in the
> scanning software and I'm not sure what it is. Our HP printer has OCR
> scanning capabilities, but you have to load it all separately, and it is
> not a flatbed scanner, so when it feeds the document it skews (or shreds)
> the original and can't read it.
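
For anyone comfortable with a little scripting, the "ni"-for-M and 8-for-S
confusions described above can sometimes be patched after the fact. What
follows is only a minimal sketch in Python, not something any of the
programs mentioned here does: the confusion table, the function names and
the tiny word list are all made up for illustration, it assumes the
lowercase "m" is the intended letter, and it only undoes one misread per
word.

```python
# Hypothetical post-OCR cleanup: the table and names below are illustrative only.
# Each pair is (what the OCR produced, what was probably printed).
CONFUSIONS = [("ni", "m"), ("8", "S")]


def candidates(token):
    """Yield the token itself, then every variant made by undoing
    a single confusion at a single position."""
    yield token
    for wrong, right in CONFUSIONS:
        start = token.find(wrong)
        while start != -1:
            yield token[:start] + right + token[start + len(wrong):]
            start = token.find(wrong, start + 1)


def correct(token, known_words):
    """Return the first candidate whose lowercase form is a known word;
    if none matches, return the token unchanged."""
    for cand in candidates(token):
        if cand.lower() in known_words:
            return cand
    return token


def correct_line(line, known_words):
    """Correct each whitespace-separated token of a line independently."""
    return " ".join(correct(tok, known_words) for tok in line.split())


if __name__ == "__main__":
    # Assumed word list; a real one would come from a dictionary file
    # plus the surnames and place names you expect in the obits.
    words = {"smith", "family", "anniversary"}
    print(correct_line("8mith faniily anniversary", words))
```

Because the untouched word is always tried first, real words that happen to
contain "ni" (such as "anniversary") are left alone as long as they are in
the word list. Words with attached punctuation or with two misreads are not
handled, so this is a helper for proofreading, not a replacement for it.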
>
> One of our volunteers just paid a small fortune for a small hand-held
> scanner that was touted as being great for scanning newsprint, but all it
> does is make a fuzzy .JPG image that is useless. It wasn't even good as a
> glorified copier.
>
> If you can, make a photocopy of the original, clean it up and enlarge it,
> set the dots per inch (dpi) high (300 or more) and make sure everything is
> perfectly square on the scanning bed. Even then there is still no guarantee
> that it will do what you want.
>
> One of my recent projects is to scan a 1901 local history. I tried again
> and again to scan a reprint of it, as I did not have an original and I
> didn't want to do it to the local library's original copy. Although the
> reprint was quite clean, the lettering had darkened and blurred in the copy
> process, so I got nothing but garbage each time. I did luck out recently
> when the very book I needed came up on eBay and I was able to get it for a
> mere $7.50 (dealer value of over $100.00!), and I have been carefully
> scanning it with incredible success for the past few weeks.
>
> I am also having some of my volunteers retype a lot of the old newspaper
> and newsletter articles in our vertical files, so that we can reprint them
> and also do searches and use them in research. OCR scanning just didn't
> work, so they are having to type them manually. I am lucky to have
> recruited a number of people who love to type and learn history! Bless
> their hearts and fingers!
>
> Dan W.
> http://www.rootsweb.com/~nyccazen/
>
> Also, the Assistant Director of the Chesterfield Historical Society of Virginia
> (where I get to do history for a living and have lots of great
> Volunteers to work with!)
> (If I could just sucker them into doing my Cazenovia
> research for me ...)
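
A footnote on the numbers in Dan's advice (enlarge to 200%, scan at 300 dpi
or more): the arithmetic can be sketched in a few lines of Python. This is
only a rough illustration with hypothetical function names. The 8-point type
size is an assumption about typical newsprint, not a figure from this
thread, and a blurry photocopy will not carry as much real detail as the
numbers suggest.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def effective_dpi(scan_dpi, enlargement_percent=100):
    """Scanner samples per inch of the ORIGINAL page when an enlarged
    photocopy is scanned at scan_dpi."""
    return scan_dpi * enlargement_percent / 100


def glyph_height_px(type_size_pt, scan_dpi, enlargement_percent=100):
    """Approximate height in pixels of the type body for the given
    type size, scan resolution and photocopy enlargement."""
    inches = type_size_pt / POINTS_PER_INCH
    return inches * effective_dpi(scan_dpi, enlargement_percent)


if __name__ == "__main__":
    # Assumed 8-point newsprint, scanned directly and from a 200% photocopy.
    print(round(glyph_height_px(8, 300)))        # direct scan at 300 dpi
    print(round(glyph_height_px(8, 300, 200)))   # 200% copy scanned at 300 dpi
```

Doubling the size of the copy doubles the number of pixels the software gets
per letter, which is one plausible reason the enlarged photocopies read
better, provided the copier does not smear the letters in the process.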