I often get PDF files where the document was scanned in as an image, and I'm wasting time re-typing them. I found a way online to have Google do the conversion, but it is slow (link below if you're curious). What are some other options for converting this type of PDF into real text?
http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/
Well, when I first read your reply I thought... sheesh, this is in the RTM category. I found the OCR command in the drop-down menu and ran it. I watched it go from page to page and heard a lot of disk activity. Then I started looking for the output. I checked whether it had dumped the results into the Clipboard, but no. I searched the hard drive for any file created today. Again, no apparent output. The help instructions in Acrobat Pro 9 on OCR don't really say where to find the output. Where should I look?
It is in the PDF itself, not somewhere else. You can then save the PDF as a text file (or DOC) to get the text. Apparently you are using the Searchable Image output, but you could also use ClearScan, which replaces the image of the text with actual text wherever it thinks recognition succeeded. With a searchable image, the recognized text sits on a separate layer in the PDF file.
Yes, a searchable image. I think my problem has been that nothing looks different after the OCR is completed. After OCR, do I need to go into one of the drop-down menus and save from a view other than the default document view, or do I just do a Save As text or DOC from the default view once the OCR completes?
Found it! The word "layer" that you used did the trick. On impulse, I ran OCR on the page I was looking at and then did a simple test to see whether the text layer was on top of what I was seeing: I moved my mouse pointer over the PDF text, and it changed into the vertical text-edit cursor. After that it was easy to cut and paste.
Thank you! I just wasn't getting the concept until your description!