I did draft a complete answer to this, but it seems an application freeze wiped it out before I hit Add Reply.
Even Adobe telephone support said you can't edit a scanned document. I pointed out that this information was possibly over ten years old. He checked his notes, and it was.
In Tools, you need to select the "Enhance Scan" tool. Then select any item in the "Recognise Text" dropdown menu on the second-level toolbar ("This File", for example), and click the "Recognise Text" button on the third-level toolbar that appears.
It did a pretty good job on that document, but the last glyphs of words were occasionally left off. I'd guess 99% of words were recognised. As this is a 345-page document, and only one of eight such documents, it would be very burdensome to fix the rest by hand. It might also help if I could load a Pali dictionary (or a combined English + Pali one) to handle all the words with diacritics unusual to English speakers, such as "Āloka". I'm not sure that's possible, even by hacking Acrobat's dictionary files. I'll ask separately.
The steps you followed are correct for running OCR.
Could you please share two things so we can better understand the issue:
- The Acrobat version you are using
- A single page taken from one of the PDFs you are using, along with a description of the exact issue you are seeing. This will help us focus on your exact problem.
You can use https://cloud.acrobat.com/send to share the file.
It does not look like I can download anything without signing up for some service. To be blunt: if you want help with your problem, don't make it hard to actually help you. Most of the people here are doing this in their spare time. If you need help sharing a page or two in a way that lets us simply download the file, I wrote up some information about using Adobe's Document Cloud to share files: Share Documents via Adobe's Document Cloud - KHKonsulting LLC
You may want to take a look at these recent questions, which may answer your question as well:
Hi Karl, I'm not sure what you are seeing, but I and other people I know have downloaded from that page without any of the issues you describe. It just worked™. It didn't occur to me to upload the file separately, because it seemed to be readily available to anybody, even people on slow connections in Burma :-).
Thanks for the links. I did get it working, it wasn't perfect, but neither are the source scans.
This question has been answered (by me; sorry if it took a while for the mods to post the answer).
In response to the requests for a small file uploaded to Adobe's cloud, here are pp. 1-4.
You'll see that ~99% of the glyphs on these pages are correctly recognised as text strings, yet a few (usually at the end of longer words) get omitted. It happens on both English and Pali language words, so I don't think it's a dictionary issue, though maybe it is and I'm not thinking it through carefully enough.
I get a pop-up block in Chrome on your website link too, Karl. I just copied the link and pasted it into the browser.
Hi, I have gone through the PDF you shared. As you said, most of the words are recognized correctly, but there is an issue with a few of them.
You are getting this problem because no text recognition tool can recognize 100% of the text in every document. In this case, the document has low resolution, a colored background, and shadows of the text printed on the reverse of each page showing through.
For cases like this, Acrobat has a feature (Suspect Correction) that lets you correct any text that was not recognized properly.
In Tools, select the "Enhance Scan" tool. Then open the "Recognise Text" dropdown menu and click the "In This File" option. Next, click Settings, select "Searchable Image (Exact)", and then click the "Recognise Text" button on the third-level toolbar that appears.
Once it has recognized all the text, go to "Enhance Scan" > "Recognise Text" > "Correct Recognised Text".
It will show every word that Acrobat is unsure about in a red box. You can then correct these words from the third-level toolbar.
There is also a checkbox, "Review Recognised Text", which shows you everything Acrobat has recognized.
I hope this resolves your issue. Please feel free to ask if you have any other questions.
Thank you, Lovekesh. The document is too long for me to make all these corrections by hand. Maybe when I retire from climate campaigning (so, never: it will always be more urgent than the year before).