So, I'm a lawyer. I drafted a brief in MS Word. In various parts, I included images of the transcript of a hearing --- i.e., images of words. When you convert to PDF, the parts that I typed are OCR'd properly, but the images of the transcript are not. Can somebody tell me how to force Adobe to recognize not just the MS Word-typed words, but to also OCR the images contained in the document?
It's driving me crazy. (Or, I should say, the Ninth Circuit's ridiculous form rules are driving me crazy. But one way or another, I need to fix it.)
When you convert from Word to PDF, the document does not have to be OCRed: the text in your document should be accessible right away, so that you can search or highlight it. If a document contains both such "real" text and images of text, Acrobat will complain about "renderable text" when you start the OCR process. This means that you cannot OCR a document that contains both real text and text in images - at least not in Adobe Acrobat.
If you can split the document so that the scans are always on a separate page, you may be able to OCR these pages if you delete any other text that might be on them (e.g. page numbers or headers/footers).
For more challenging OCR tasks like this one, I keep a copy of Abbyy's FineReader around - this is a dedicated OCR application that can actually OCR such a mixed-content document.
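To make the rule above concrete, here is a minimal Python sketch. It does not call Acrobat or any OCR engine; the `Page` class, its two flags, and the `plan_ocr` function are hypothetical names made up for illustration. It only sorts the pages of a brief into three groups: pages that need no OCR, pages that are pure images and can be OCR'd as they are, and "mixed" pages that would trigger the "renderable text" complaint.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    has_real_text: bool    # text typed in Word, already searchable in the PDF
    has_text_image: bool   # a picture of words, e.g. a transcript excerpt


def plan_ocr(pages):
    """Group page numbers by what they need, following the rule described above."""
    plan = {"no_ocr_needed": [], "ocr_directly": [], "mixed": []}
    for page in pages:
        if not page.has_text_image:
            # Nothing to recognize: any text on the page is already searchable.
            plan["no_ocr_needed"].append(page.number)
        elif page.has_real_text:
            # Real text plus an image of text: the case that gets rejected.
            plan["mixed"].append(page.number)
        else:
            # Image only: this page can be OCR'd on its own.
            plan["ocr_directly"].append(page.number)
    return plan


# Hypothetical three-page brief, used only as sample input.
brief = [
    Page(1, has_real_text=True, has_text_image=False),
    Page(2, has_real_text=True, has_text_image=True),
    Page(3, has_real_text=False, has_text_image=True),
]
print(plan_ocr(brief))
```

Pages that end up in the `mixed` group are the ones that would have to be split, stripped of headers and page numbers, or handed to a dedicated OCR application.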
Karl,
Thanks for the reply. So, basically, Adobe cannot OCR an image that is
surrounded by renderable text? (When I said "OCR" in my post, I gather that
the proper term is "renderable" as it applies to MS Word text.)
The point is that the brief should look much like a magazine article: there
is text, text, text, then an image, followed by text, text, text, in a
steady, even flow. And according to court rules, even the words in the
image of a transcript must be OCR'd and searchable.
Well, it appears you've reached the same conclusion I did: PDF misses this
basic function.
On Wed, Jan 4, 2017 at 3:55 PM, Karl Heinz Kremer <forums_noreply@adobe.com>
johnd31108412 wrote:
...
Well, it appears you've reached the same conclusion I did: PDF misses this
basic function.
PDF is a file format. You mean Adobe Acrobat?
What I mean is this: Is there any way to get an image of a transcript to be
searchable, OCR'd, rendered, or whatever you want to call it --- so that a
computer recognizes there are words --- when that transcript image is in the
middle of a document? As I said, I want to drop an image of a transcript
into the middle of an MS Word legal brief, then I want to convert it to PDF
and have some stupid program actually OCR the image of the transcript in
addition to the usual text.
The court rules require all the words in a legal brief, both the argument
and words contained in an image, to be searchable and in PDF format. It's
shocking to me that this is so difficult.
On Thu, Jan 5, 2017 at 2:25 AM, Bernd Alheit <forums_noreply@adobe.com>
You may want to switch to a 3rd party dedicated OCR application. As I mentioned before, Abbyy FineReader can do this.
It is not difficult when you use another tool.