6 Replies Latest reply on Jan 5, 2017 9:58 AM by Bernd Alheit

    OCR Image in Print-to-PDF Word Document

    johnd31108412

      So, I'm a lawyer. I drafted a brief in MS Word. In various parts, I included images of the transcript of a hearing --- i.e., images of words. When I convert to PDF, the parts that I typed are OCR'd properly, but the images of the transcript are not. Can somebody tell me how to force Adobe to recognize not just the MS Word-typed words, but to also OCR the images contained in the document?

       

      It's driving me crazy. (Or, I should say, the Ninth Circuit's ridiculous form rules are driving me crazy. But one way or another, I need to fix it.)

        • 1. Re: OCR Image in Print-to-PDF Word Document
          Karl Heinz Kremer Adobe Community Professional

          When you convert from Word to PDF, the document does not have to be OCRed; the text in your document should be accessible right away so that you can search or highlight it. When you start the OCR process on a document that contains both such "real" text and images of text, Acrobat will complain about "renderable text". This means that you cannot OCR a document that contains both real text and text in images - at least not in Adobe Acrobat.

           

          If you can split the document so that the scans are always on a separate page, you may be able to OCR these pages if you delete any other text that might be on them (e.g. page numbers or headers/footers).

           

          For more challenging OCR tasks like this, I keep a copy of Abbyy's FineReader around - this is a dedicated OCR application that can actually OCR such a mixed content document.
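
          A minimal sketch of the page-splitting idea above, in Python. It does not read a PDF or call Acrobat; the `Page` record and the `plan_ocr` function are hypothetical names invented for illustration, and the two flags are assumed to be filled in by hand after looking at each page. It only sorts pages into the three cases described: already searchable, safe to OCR as-is, and needing the real text removed (or the image moved to its own page) first.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    has_real_text: bool   # text typed in Word, already searchable in the PDF
    has_image_text: bool  # picture of words, e.g. a transcript excerpt


def plan_ocr(pages):
    """Group page numbers by what has to happen before they are searchable."""
    plan = {"no_ocr_needed": [], "ocr_directly": [], "strip_text_first": []}
    for page in pages:
        if not page.has_image_text:
            # Nothing to recognize: any text on the page is already real text.
            plan["no_ocr_needed"].append(page.number)
        elif page.has_real_text:
            # Mixed page: this is the case that triggers the
            # "renderable text" complaint.
            plan["strip_text_first"].append(page.number)
        else:
            # Image-only page: a candidate for OCR as-is.
            plan["ocr_directly"].append(page.number)
    return plan


# Hypothetical three-page brief: typed text, a mixed page, a transcript-only page.
brief = [Page(1, True, False), Page(2, True, True), Page(3, False, True)]
print(plan_ocr(brief))
```

          In this made-up example, page 2 is the problem case: it would need its page number, header, or body text removed, or the transcript image moved to a page of its own, before it could be OCRed.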

          • 2. Re: OCR Image in Print-to-PDF Word Document
            johnd31108412 Level 1

            Karl,

             

            Thanks for the reply. So, basically, Adobe cannot OCR an image that is surrounded by renderable text? (When I said "OCR" in my post, I gather that the proper term is "renderable" as it applies to MS Word text.)

            The point is that the brief should look much like a magazine article: there is text, text, text, then an image, followed by text, text, text, in a steady, even flow. And according to court rules, even the words in the image of a transcript must be OCR'd and searchable.

            Well, it appears you've reached the same conclusion I did: PDF misses this basic function.

             


            • 3. Re: OCR Image in Print-to-PDF Word Document
              Bernd Alheit Adobe Community Professional & MVP

              johnd31108412 wrote:

              ...

               

              Well, it appears you've reached the same conclusion I did: PDF misses this basic function.

              PDF is a file format. You mean Adobe Acrobat?

              • 4. Re: OCR Image in Print-to-PDF Word Document
                johnd31108412 Level 1

                What I mean is this: Is there any way to get an image of a transcript to be searchable, OCR'd, rendered, or whatever you want to call it --- so that a computer recognizes there are words --- when that transcript image is in the middle of a document? As I said, I want to drop an image of a transcript into the middle of an MS Word legal brief, then I want to convert it to PDF and have some stupid program actually OCR the image of the transcript in addition to the usual text.

                The court rules require all the words in a legal brief, both the argument and the words contained in an image, to be searchable and in PDF format. It's shocking to me that this is so difficult.

                 


                • 5. Re: OCR Image in Print-to-PDF Word Document
                  Karl Heinz Kremer Adobe Community Professional

                  You may want to switch to a 3rd party dedicated OCR application. As I mentioned before, Abbyy FineReader can do this.

                  • 6. Re: OCR Image in Print-to-PDF Word Document
                    Bernd Alheit Adobe Community Professional & MVP

                    It is not difficult when you use another tool.