OCR Image in Print-to-PDF Word Document

Report · Jan 04, 2017

So, I'm a lawyer. I drafted a brief in MS Word. In various parts, I included images of the transcript of a hearing --- i.e., images of words. When you convert to PDF, the parts that I typed are OCR'd properly, but the images of the transcript are not. Can somebody tell me how to force Adobe to recognize not just the MS Word-typed words, but also to OCR the images contained in the document?

It's driving me crazy. (Or, I should say, the Ninth Circuit's ridiculous form rules are driving me crazy. But one way or another, I need to fix it.)

Report · Jan 04, 2017

When you convert from Word to PDF, the document does not have to be OCRed: the text in your document should be accessible right away, so you can search or highlight it. If a document contains both such "real" text and images of text, Acrobat will complain about "renderable text" when you start the OCR process. This means that you cannot OCR a document that contains both real text and text in images, at least not in Adobe Acrobat.

If you can split the document so that the scans are always on a separate page, you may be able to OCR these pages if you delete any other text that might be on them (e.g. page numbers or headers/footers).

For more challenging OCR tasks like this, I keep a copy of Abbyy's FineReader around. It is a dedicated OCR application that can actually OCR such a mixed-content document.
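
For readers who prefer a scriptable route, here is a minimal sketch of the same "use a dedicated OCR tool" idea. It assumes the open-source OCRmyPDF command-line tool is installed, which nobody in this thread mentions; the file names and the helper functions are hypothetical. The `--force-ocr` option tells OCRmyPDF to rasterize every page and OCR it, so pages that mix typed text and images of text are processed instead of being rejected. The trade-off is that the typed text is re-recognized from an image too, so check the output carefully before filing it.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path) -> list[str]:
    """Build an OCRmyPDF command line for a mixed text-and-image PDF.

    --force-ocr rasterizes each page and runs OCR on it, even when the
    page already contains real (typed) text.
    """
    return ["ocrmypdf", "--force-ocr", str(src), str(dst)]


def ocr_mixed_pdf(src: Path, dst: Path) -> bool:
    """Run OCRmyPDF if it is installed; return False if it is not found."""
    if shutil.which("ocrmypdf") is None:
        # Assumption: the tool is on PATH. If it is not, do nothing.
        return False
    # check=True raises CalledProcessError if OCRmyPDF reports a failure.
    subprocess.run(build_ocr_command(src, dst), check=True)
    return True
```

A call such as `ocr_mixed_pdf(Path("brief.pdf"), Path("brief_searchable.pdf"))` would then write a searchable copy next to the original, leaving the Word-generated PDF untouched.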

Report · Jan 04, 2017

Karl,

Thanks for the reply. So, basically, Adobe cannot OCR an image that is surrounded by renderable text? (When I said "OCR" in my post, I gather that the proper term is "renderable" as it applies to MS Word text.)

The point is that the brief should look much like a magazine article: there is text, text, text, then an image, followed by text, text, text, in a steady, even flow. And according to court rules, even the words in the image of a transcript must be OCR'd and searchable.

Well, it appears you've reached the same conclusion I did: PDF misses this basic function.

On Wed, Jan 4, 2017 at 3:55 PM, Karl Heinz Kremer <forums_noreply@adobe.com>

Report · Jan 05, 2017

johnd31108412 wrote:
...
Well, it appears you've reached the same conclusion I did: PDF misses this
basic function.

PDF is a file format. You mean Adobe Acrobat?

Report · Jan 05, 2017

What I mean is this: Is there any way to get an image of a transcript to be searchable, OCR'd, rendered, or whatever you want to call it --- so that a computer recognizes there are words --- when that transcript image is in the middle of a document? As I said, I want to drop an image of a transcript into the middle of an MS Word legal brief, then I want to convert it to PDF and have some stupid program actually OCR the image of the transcript in addition to the usual text.

The court rules require all the words in a legal brief, both the argument and words contained in an image, to be searchable and in PDF format. It's shocking to me that this is so difficult.

On Thu, Jan 5, 2017 at 2:25 AM, Bernd Alheit <forums_noreply@adobe.com>

Report · Jan 05, 2017

You may want to switch to a 3rd party dedicated OCR application. As I mentioned before, Abbyy FineReader can do this.

Report · Jan 05, 2017

It is not difficult when you use another tool.
