I have a book-sized collection of scanned PDF documents in which the text exists only as page images. I am trying to convert them to real text, so that anyone opening the resulting PDF in Reader gets actual, selectable text rather than a picture of it.
So far, I have found how to edit the text in the document, but that only seems to apply to the page I'm viewing. If I export the document as text to check what has been converted from image to text, I only get snippets of the document. I have also found out how to make the image text searchable, but that keeps the pages as images.
What is the best way to achieve this, i.e., to export a document of editable text with decent text flow?
In Acrobat, use the Recognize Text command and select the "Searchable Image" option. This keeps your page images as they are but, if the OCR succeeds, inserts an invisible layer of text underneath them, which you can then search and export to other formats.
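If you want to confirm that the OCR covered every page rather than only the one you were viewing, a small script can list the pages that still have no text layer. This is only a rough sketch, not anything built into Acrobat: it assumes the third-party pypdf package is installed, and the function names, the `min_chars` threshold, and the file name `book_ocr.pdf` are made up for illustration.

```python
def pages_missing_text(page_texts, min_chars=20):
    """Return 1-based numbers of pages whose extracted text is (nearly) empty.

    min_chars is an arbitrary, illustrative threshold: pages with fewer
    non-whitespace-trimmed characters are treated as having no text layer.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the text layer of each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party package, imported lazily

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import os

    # Hypothetical file name; replace it with the PDF you ran OCR on.
    path = "book_ocr.pdf"
    if os.path.exists(path):
        missing = pages_missing_text(extract_page_texts(path))
        print("Pages without a text layer:", missing or "none")
```

Any page numbers it reports are pages where the recognition step did not produce text, so those are the ones to run through Recognize Text again.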