I wish to do OCR on a 345pp scanned document. Edit PDF not doing anything.

Engaged, Jan 15, 2017

It's a while since I've done any OCR in Acrobat, and the (still lamentable) UI has changed a bit since then. I watched an Adobe tutorial on Acrobat which said to use the "Edit PDF" tool, and though it's in a different place in my Acrobat Pro, I found it easily enough.

The problem is when I click on that tool all that happens is a blue box outlining the perimeter of each scanned page appears, unlike in the tutorial where the text areas are surrounded with boxes and you can interact with the text as strings.

This is the document I wish to attempt OCR on. It's not a great scan, with the reverse pages showing through the image, but the text is important so I want to attempt it.

The Great Chronicle Of Buddhas - download it here  (each chapter is about 100MB DL)

I'd appreciate any help getting up to speed with OCR in Acrobat these days.

TOPICS
Edit and convert PDFs

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

1 Correct answer

Engaged, Jan 15, 2017

I did draft a complete answer to this, but it seems an application freeze wiped it out before I hit Add Reply.

Even Adobe telephone support said you can't edit a scanned document. I pointed out that this information was possibly over ten years old. He checked his notes, and it is.

In Tools, select the "Enhance Scan" tool. Then choose any item in the "Recognise Text" dropdown menu on the second-level toolbar ("This File", for example), and click the "Recognise Text" button on the third-level toolbar that appears.

It did a pretty good job on that document, but the last glyphs of some words were occasionally left off. I imagine about 99% of words scan correctly. As this is a 345pp document, and only one of eight such, completing it by hand would be very burdensome. A facility to upload a Pali dictionary, or an English + Pali one, to handle all the words with diacritics (unusual to English speakers) like “Āloka” might also help. I'm not sure that's possible, even by hacking the dictionary files for Acrobat. Will ask separately.
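As a starting point for such a word list, here is a minimal Python sketch that pulls every word containing a diacritic out of text exported from the OCR'd PDF. It does not touch Acrobat's dictionaries (whether those can be extended is still the open question above). The function names and the sample sentence are made up for illustration.

```python
import re
import unicodedata
from collections import Counter


def has_diacritic(word: str) -> bool:
    """Return True if any character decomposes into a base letter plus a combining mark."""
    decomposed = unicodedata.normalize("NFD", word)
    return any(unicodedata.combining(ch) for ch in decomposed)


def diacritic_words(text: str) -> Counter:
    """Count the words in `text` that contain at least one diacritic."""
    # Compose first so that a letter and its mark form one matchable character.
    text = unicodedata.normalize("NFC", text)
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters only
    return Counter(w for w in words if has_diacritic(w))


# Made-up sample text, not taken from the scanned book.
sample = "The Āloka Sutta mentions āloka and the Saṃgha; the Sutta is short."
for word, count in diacritic_words(sample).most_common():
    print(word, count)
```

The resulting list could be reviewed by hand once and then reused across all eight volumes.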

Adobe Employee, Jan 16, 2017

The steps you followed to run OCR are correct.

Could you please share two things so we can better understand the issue?

- The Acrobat version you are using

- A single page taken from any PDF you are using, together with a description of the exact issue you are facing. This will help us concentrate on the exact issue you have.

You can use https://cloud.acrobat.com/send to share the file.

Thanks.

Engaged, Jan 17, 2017

This question has been answered (by myself, sorry if it took a while for mods to post the answer).

In response to the request for a small file uploaded to Adobe cloud, here's pp1-4.


You'll see that ~99% of the glyphs on these pages are correctly recognised as text strings, yet a few (usually at the end of longer words) get omitted. It happens on both English and Pali language words, so I don't think it's a dictionary issue, though maybe it is and I'm not thinking it through carefully enough.
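One way to hunt for these dropped endings without rereading every page is to flag tokens that are not known words but would become one with a letter or two added. The Python sketch below works on text exported from the OCR'd PDF. The function name, the tiny word list, and the sample sentence are all hypothetical, and a real word list would need the Pali vocabulary too.

```python
import re


def likely_truncated(ocr_text: str, known_words: set[str], max_missing: int = 2) -> list[tuple[str, list[str]]]:
    """Flag tokens that are unknown but match a known word minus its last few letters."""
    flagged = []
    for token in re.findall(r"[^\W\d_]+", ocr_text):
        lower = token.lower()
        if lower in known_words:
            continue  # a complete, known word: nothing to flag
        # Known words that start with the token and are only slightly longer.
        candidates = sorted(
            w for w in known_words
            if w.startswith(lower) and 1 <= len(w) - len(lower) <= max_missing
        )
        if candidates:
            flagged.append((token, candidates))
    return flagged


# Hypothetical word list and OCR output, for illustration only.
known = {"the", "great", "chronicle", "of", "buddhas", "recognised"}
print(likely_truncated("The Great Chronicl of Buddhas was recognis", known))
```

Unknown words with no near match (like "was" here, which is missing from the toy list) are simply skipped, so the output is a list of candidates to check, not a list of confirmed errors.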

Community Expert, Jan 16, 2017

It does not look like I can download anything without signing up for some service. To be blunt: if you want help with your problem, don't make it hard to actually help you. Most of the people here are doing this in their spare time. If you need help with sharing a page or two so that we can just download the file, I wrote up some information about how to use Adobe's Document Cloud to share files: Share Documents via Adobe's Document Cloud - KHKonsulting LLC

You may want to take a look at these recent questions, which may answer your question as well:

Can't edit scanned pdf - doesn't work like in the tutorials

Goading Acrobat Pro DC (Mac) into OCR in Edit mode

Engaged, Jan 17, 2017

Hi Karl, not sure what you are seeing, but I and other people I know have downloaded from that page without any of the issues you describe. Just worked™. It didn't occur to me to upload the file, because it seemed to be readily available to anybody, even people on slow connections in Burma :-).

Thanks for the links. I did get it working, it wasn't perfect, but neither are the source scans.

Engaged, Jan 17, 2017

I get a pop-up block in Chrome on your website link too, Karl.  I just copied the link and pasted it into the browser.

Adobe Employee, Jan 17, 2017

Hi, I have gone through the PDF you shared. As you said, most of the words are recognized correctly, but there are issues with a few of them.

You are seeing this problem because no text recognition tool can recognize 100% of the text in every document. The problem here is that this document has low resolution, a colored background, and show-through of the text printed on the reverse of each page.

For cases like this, we have a feature (Suspect Correction) that lets you correct text that was not recognized properly.

In Tools, select the "Enhance Scan" tool. Then open the "Recognise Text" dropdown menu and click the "In This File" option. Now click Settings, select "Searchable Image Exact", and then click the "Recognise Text" button on the third-level toolbar that appears.

Once it has recognized all the text, go to "Enhance Scan" > "Recognise Text" > "Correct Recognized Text".

It will show red boxes around all the words Acrobat has doubts about. You can then correct these words in the third-level toolbar.

There is also a "Review Recognized Text" checkbox, which shows you everything Acrobat has recognized.
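As a rough gauge of how much manual correction this review step could involve, the arithmetic can be sketched in Python. The 345 pages, the eight volumes, and the ~99% figure come from earlier in this thread. The 400 words per page is purely an assumption, as is treating every volume as the same length.

```python
def words_to_review(pages: int, words_per_page: int, accuracy: float, volumes: int = 1) -> int:
    """Estimate how many words would still need manual correction after OCR."""
    total_words = pages * words_per_page * volumes
    return round(total_words * (1.0 - accuracy))


# 345 pages, eight volumes and ~99% accuracy are from the thread;
# 400 words per page is an assumed figure for illustration.
one_volume = words_to_review(pages=345, words_per_page=400, accuracy=0.99)
all_volumes = words_to_review(pages=345, words_per_page=400, accuracy=0.99, volumes=8)
print(one_volume, all_volumes)  # 1380 11040
```

Under these assumptions that is on the order of a thousand corrections per volume, so the real workload depends heavily on the actual recognition rate.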

I hope it will resolve your issue. Please feel free to ask anything you want.

Thanks.

Engaged, Feb 07, 2017

Thank you, Lovekesh. The document is too long for me to do all these corrections by hand. Maybe when I retire from climate campaigning (like never, it will always be more urgent than the year before).
