PDFs - image or searchable?

A "searchable" PDF is one where the text on the pages is stored as actual text that your computer can read, not just as a picture of the page. A PDF reader such as Adobe Reader can search it for specific words, and you can copy and paste sections of text from it into other documents. Most scanning software makes an "image" PDF by default, not a searchable one.

To make a searchable PDF, you need to use a machine or software with an "OCR" (optical character recognition) option.
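Before running OCR, it can help to check whether a PDF is already searchable. The following is a minimal sketch, not a definitive test: it assumes the third-party Python package pypdf is installed, and the function names, the file name "scan.pdf" and the 20-character threshold are all illustrative choices, not part of any library service.

```python
def looks_searchable(page_texts, min_chars=20):
    """Return True if at least one page has min_chars of non-space text.

    The threshold is an arbitrary assumption: an image PDF usually
    yields no extractable text at all.
    """
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def pdf_page_texts(path):
    """Extract the text of each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Example use with a hypothetical file name:
# print(looks_searchable(pdf_page_texts("scan.pdf")))
```

If the result is False, the file is most likely an image PDF and needs OCR before it can be searched.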

In the Library:

  • The colour Xerox photocopier has an OCR option, but the black-and-white photocopiers do not.
  • The Zeta book scanner has an OCR option.

The free Adobe Reader program that most people use to read PDFs cannot create an OCR'd searchable document. The commercial Adobe Acrobat Pro program can convert an already-saved image PDF to a searchable PDF.

Google Docs (free) can convert an image PDF to a searchable PDF: upload the file to your Google Docs account with the OCR (text conversion) option selected.

No OCR program gets every single word perfectly correct. The newer and clearer your original is, and the straighter the scanned image is (not askew on the scanning glass), the fewer errors there will be in the OCR'd version. Black-and-white scans often OCR better than grayscale scans.

Some OCR programs do a better job than others with poorer-quality originals, so if you are not happy with one, try another.
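One rough way to compare OCR programs is to type out a short passage from the original by hand, then measure how many of its words each program's output reproduces. The sketch below uses only Python's standard difflib module; the function name and the sample strings are made up for illustration, and the score is a simple word-match ratio rather than a formal OCR accuracy metric.

```python
import difflib


def word_accuracy(reference, ocr_text):
    """Fraction of reference words that also appear, in order, in ocr_text."""
    ref_words = reference.split()
    ocr_words = ocr_text.split()
    if not ref_words:
        return 1.0
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# Hand-typed passage and two hypothetical OCR results for the same page.
reference = "the quick brown fox"
outputs = {
    "program A": "the qu1ck brown fox",
    "program B": "the quick brown fox",
}

for name, text in outputs.items():
    print(name, word_accuracy(reference, text))

best = max(outputs, key=lambda name: word_accuracy(reference, outputs[name]))
print("Best match:", best)
```

A short sample from a typical page is usually enough to show which program copes better with a particular original.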