RootsWeb.com Mailing Lists
    1. Better Search of PDF Files?
    2. Chris Shearer Cooper
    3. I have several PDF files, from various companies, which are scans of historical books that have had some OCR (optical character recognition) done on them. The result is a semi-searchable book, but of course OCR is an imperfect science, so the resulting text is kind of flaky, and the search capabilities provided by the standard Adobe Reader aren't really up to the task of figuring out what the text is "supposed" to be.

       For example, I'm searching for Daniel Cooper, and where that text appears in the original document, the OCR gives me:

       Danjek Cooqer
       Daniel Coo er
       Danjel Coopes

       Most of these PDF files are "locked" (meaning you can't easily extract the images to run them through another OCR program), and in any case the scanned images aren't of high enough resolution that another OCR program would do any better. To be fair, the original books are often not in great shape, so it's not the fault of the scanner or the company providing the PDFs that the converted text isn't perfect.

       Is there a program that can do searches on PDF files and that (1) knows a little about the mistakes OCR software commonly makes, or (2) lets you specify the text you're searching for with a "fuzziness" factor, so it catches things similar to the searched-for text?

       Thanks and Happy Holidays,
       Chris

    12/24/2007 03:09:55
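
The "fuzziness" factor asked about in option (2) can be illustrated with edit distance: count how many single-character insertions, deletions, or substitutions separate a candidate phrase from the search text, and accept anything under a chosen limit. The following is only a minimal sketch, not a recommendation of any particular program. It assumes the OCR text of a page is already available as a plain string (which locked PDFs may not allow), and the names `edit_distance`, `fuzzy_find`, and `max_errors` are hypothetical, chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_find(text: str, query: str, max_errors: int = 2):
    """Return (phrase, distance) for word runs within max_errors of query.

    Hypothetical helper: compares case-insensitively and ignores punctuation
    handling entirely. For each starting word it tries a window as long as
    the query and one word longer, because OCR sometimes turns a character
    into a space and splits a word in two.
    """
    words = text.split()
    n = len(query.split())
    target = query.lower()
    hits = []
    for start in range(len(words)):
        best = None
        for size in (n, n + 1):
            if start + size > len(words):
                continue
            phrase = " ".join(words[start:start + size])
            dist = edit_distance(phrase.lower(), target)
            if best is None or dist < best[1]:
                best = (phrase, dist)
        if best is not None and best[1] <= max_errors:
            hits.append(best)
    return hits


ocr_text = "Danjek Cooqer Daniel Coo er Danjel Coopes"
for phrase, dist in fuzzy_find(ocr_text, "Daniel Cooper", max_errors=3):
    print(f"{phrase!r} is {dist} edit(s) away")
```

With a limit of 3 edits, all three garbled variants from the message are caught. A higher limit catches more OCR damage but also more false matches, so the limit would likely need tuning per book. Option (1), knowing which mistakes OCR commonly makes, could in principle be approximated by charging less for likely confusions (such as "i" read as "j"), but that weighting is not shown here.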