I work for an academic library, and we are in the process of digitizing past issues of our alumni magazine, which are then uploaded to our Digital Commons repository. We want these to be searchable PDFs, but we are running into some issues with the OCR in Acrobat. We've gone through all sorts of forums and Google searches hoping to fix these issues, but so far we haven't had much luck, so I'm posting here in the hope that someone has some tips.
We're scanning and running OCR on high-resolution images, so resolution is most likely not the problem. I've tried all three OCR settings (Searchable Image, Searchable Image (Exact), ClearScan) and get the same results with each.
Anyone have any suggestions?
Running Acrobat X Pro on Windows 7
Process: scan to TIFF, edit the TIFFs in Photoshop, compile the TIFFs into one PDF in Acrobat, then run OCR and touch up the reading order.
Do you have OCR turned off under Edit > Preferences > Convert to PDF > TIFF when you click the Edit Settings... button? Also, what compression settings do you have enabled under this same preference?
We do have OCR turned off under the preferences; we've been applying OCR through Tools > Recognize Text instead.
Our compression settings under Convert to PDF > TIFF are:
Scan Compression: No
Monochrome Compression: JBIG2 (Lossless)
Grayscale Compression: JPEG (Quality: medium)
Color Compression: JPEG (Quality: medium)
When I've gone in, tinkered with the compression settings, and turned on OCR there, the resulting PDF loses quality and/or strange black spots that were not in the original image appear.
I've been battling this exact problem for two years. I typically print HTML to PDF, and I'm lucky if 50% of those files get OCR applied. Adobe has never offered anything more than robotic, completely unhelpful suggestions. It's just poorly written, buggy software. I'm considering moving to Nuance's OCR and PDF software; if you value your time, you'll do the same. Trust me, I've wasted countless hours on the phone and in chat with Adobe support.
At the end of the tutorial titled "Scanning and OCR: Beyond the basics with Acrobat 9" there is a suggestion for adding a new text layer to a scanned PDF, if you're interested. It applies to both Acrobat 9 Pro and Acrobat X Pro.