I need to re-OCR some texts that contain a mix of East Asian and Western scripts. Acrobat only allows me to choose one OCR language at a time, and too much is lost in the process. So I want to remove the Acrobat OCR from a PDF. How do I do this?
I found a web page showing how to export the PDF to Microsoft XPS and then remake the PDF from that, but the resulting PDF looks very different from the original.
When you OCR an image in Acrobat, the text is placed on an invisible layer in the PDF. To see this text, open the Content navigation pane. There you can select and delete the text that you need to re-OCR; this does not affect the image on the page. Then re-run the Recognize Text command with the primary OCR language you want selected.
That is true as long as you did not use ClearScan, which does not put the text behind the image. With ClearScan it is hard to go back (a good reason to work on copies). One thing you can do, if what you see on screen is still what you want, is save the PDF as TIFF. The save creates a TIFF file for every page of the PDF. Those files can be loaded back into Acrobat as images, and you can start your OCR over. This time, save a copy first.
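If you would rather script the one-TIFF-per-page export than do it from Acrobat's menus, here is a rough sketch. It assumes Ghostscript is installed, which nobody in this thread has mentioned, so treat it as one possible substitute for Acrobat's own TIFF export rather than the same thing. The function names, the `page-NNN.tif` naming, and the 300 dpi default are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_gs_tiff_command(pdf_path, out_dir, dpi=300):
    """Return a Ghostscript argument list that renders each PDF page to its own TIFF."""
    # Ghostscript expands %03d to the page number: page-001.tif, page-002.tif, ...
    out_pattern = str(Path(out_dir) / "page-%03d.tif")
    return [
        "gs",                    # assumed executable name; Windows builds usually ship gswin64c
        "-dNOPAUSE", "-dBATCH",  # run non-interactively and exit when finished
        "-sDEVICE=tiff24nc",     # 24-bit RGB TIFF output device
        f"-r{dpi}",              # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def export_pages_as_tiff(pdf_path, out_dir, dpi=300):
    """Rasterize every page of pdf_path into out_dir; the source PDF is only read."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_gs_tiff_command(pdf_path, out_dir, dpi), check=True)
    return sorted(Path(out_dir).glob("page-*.tif"))
```

Calling `export_pages_as_tiff("scan.pdf", "tiffs")` (both names are placeholders) should leave one TIFF per page in `tiffs`, which you can then combine into a new PDF in Acrobat and OCR again. Because the pages are rendered as images, any existing OCR text is dropped, the same as with Save As TIFF, and the original PDF is never written to.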