OCR Image in Print-to-PDF Word Document

Guest
Jan 04, 2017

So, I'm a lawyer. I drafted a brief in MS Word. In various parts, I included images of the transcript of a hearing, i.e., images of words. When I convert to PDF, the parts that I typed are OCR'd properly, but the images of the transcript are not. Can somebody tell me how to force Adobe to recognize not just the words typed in MS Word, but also to OCR the images contained in the document?

It's driving me crazy. (Or, I should say, the Ninth Circuit's ridiculous form rules are driving me crazy. But one way or another, I need to fix it.)

TOPICS
Edit and convert PDFs

Community Expert
Jan 04, 2017

When you convert from Word to PDF, the document does not have to be OCRed: the text in your document should be accessible right away, so that you can search or highlight it. When you start the OCR process on a document that contains both such "real" text and images of text, Acrobat will complain about "renderable text". This means that you cannot OCR a document that contains both real text and text in images, at least not in Adobe Acrobat.

If you can split the document so that the scans are always on a separate page, you may be able to OCR these pages if you delete any other text that might be on them (e.g. page numbers or headers/footers).
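As a rough illustration of the splitting idea above, here is a minimal Python sketch that sorts pages into three groups: pages that could be OCRed as they are, pages that would first need the typed text and the scan separated, and pages that need nothing. The `Page` class, its two flags, and the `plan_ocr` function are hypothetical names made up for this example. The sketch does not read a PDF or call Acrobat; it only models the decision and assumes you already know what each page contains.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    has_real_text: bool   # text typed in Word (including headers, footers, page numbers)
    has_text_image: bool  # a picture of words, e.g. a transcript excerpt


def plan_ocr(pages):
    """Group page numbers by what would have to happen before OCR."""
    plan = {"ocr": [], "split_first": [], "skip": []}
    for page in pages:
        if page.has_text_image and page.has_real_text:
            # Mixed page: the real text triggers the "renderable text" complaint
            plan["split_first"].append(page.number)
        elif page.has_text_image:
            # Scan-only page: a candidate for OCR
            plan["ocr"].append(page.number)
        else:
            # Typed text only (or an empty page): already searchable
            plan["skip"].append(page.number)
    return plan


if __name__ == "__main__":
    brief = [
        Page(1, has_real_text=True, has_text_image=False),
        Page(2, has_real_text=True, has_text_image=True),
        Page(3, has_real_text=False, has_text_image=True),
    ]
    print(plan_ocr(brief))
```

In this model, a page with a transcript image and a page number counts as mixed, which is why headers and footers would have to be removed from the scan pages before OCR has a chance of working.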

For more challenging OCR tasks like this, I keep a copy of Abbyy's FineReader around. This is a dedicated OCR application that can actually OCR such a mixed-content document.

Guest
Jan 04, 2017

Karl,

Thanks for the reply. So, basically, Adobe cannot OCR an image that is surrounded by renderable text? (When I said "OCR" in my post, I gather that the proper term is "renderable" as it applies to MS Word text.)

The point is that the brief should look much like a magazine article: there is text, text, text, then an image, followed by text, text, text, in a steady, even flow. And according to court rules, even the words in the image of a transcript must be OCR'd and searchable.

Well, it appears you've reached the same conclusion I did: PDF misses this basic function.

On Wed, Jan 4, 2017 at 3:55 PM, Karl Heinz Kremer <forums_noreply@adobe.com>

Community Expert
Jan 05, 2017

johnd31108412 wrote:

...

Well, it appears you've reached the same conclusion I did: PDF misses this basic function.

PDF is a file format. You mean Adobe Acrobat?

Guest
Jan 05, 2017

What I mean is this: Is there any way to get an image of a transcript to be searchable, OCR'd, rendered, or whatever you want to call it, so that a computer recognizes there are words, when that transcript image is in the middle of a document? As I said, I want to drop an image of a transcript into the middle of an MS Word legal brief, then I want to convert it to PDF and have some stupid program actually OCR the image of the transcript in addition to the usual text.

The court rules require all the words in a legal brief, both the argument and words contained in an image, to be searchable and in PDF format. It's shocking to me that this is so difficult.

On Thu, Jan 5, 2017 at 2:25 AM, Bernd Alheit <forums_noreply@adobe.com>

Community Expert
Jan 05, 2017

You may want to switch to a dedicated third-party OCR application. As I mentioned before, Abbyy FineReader can do this.

Community Expert
Jan 05, 2017

It is not difficult when you use another tool.
