>Would anybody of you kindly help me on this subject with an easy step-by-step instruction?
write this program? This may not be the best use of your time.
>Because I am preparing exams I don't have a lot of spare time.
So you want someone to do this for you?
Thank you for your replies!
George, may I ask you to kindly provide the step by step instructions for your tip? I opened Acrobat and searched for Java in the help section, but did not find anything.
To me, Java is an Island somewhere in the Indian Ocean...
Don't search for Java: you won't be using Java. You need to use
isn't "Java Script".
Programming information is not included in the help file. Developers
need to get the Acrobat SDK, which has lots of detail. You don't need
the whole SDK, you can get started with the documents on here:
Thank you Aandi for your answer!
I hope you don't regret having helped me. Please understand that as a student I am always trying to save time and money (well, I guess everybody is, but when you don't have a salary it gets more extreme).
Contact me by email. I can help you out with this.
I have a few hundred PDFs that I need to save as text. These PDFs have been OCR'd, meaning that I can currently open each one manually, select "copy", and then "paste" the contents into a text editor. However, when I try to set up a batch sequence to do this automatically, it doesn't work.
I'm sure there is an easy way to do this, but I cannot figure it out. Any help would be much appreciated!
If you go to the Acrobat Developers site, http://www.adobe.com/devnet/acrobat/ , and download the batch sequences files, http://www.adobe.com/devnet/acrobat/pdfs/batch_sequences.pdf and http://www.adobe.com/devnet/acrobat/downloads/batchseq.zip , you will find a batch sequence that creates a PDF report of the bookmarks in one or more PDFs. That report could then be saved as a text file.
Thanks very much for your reply.
The PDF is an image of an old typed page and comes already OCR'd. If I open the PDF in Acrobat Pro and "Export" to a text file, the text file comes out empty (the same happens when exporting to Word format)... Only if I manually "select all", "copy", and "paste" the text into a text editor do I get what I want. Apparently the text that I "copy" and "paste" is stored in a hidden "layer"(?) that the export function doesn't have access to. I am trying to figure out how to access this "layer" through a command that can be part of a batch script.
If you're on a Mac, you can try pdftotext:
You can use it in Terminal, or in applescript. I use the following applescript saved as an application to batch multiple files:
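on open theFiles
	-- Runs once for the files dropped onto the saved application
	repeat with pdfFile in (theFiles as list)
		set thePath to POSIX path of (pdfFile)
		-- With no output name given, pdftotext writes a .txt file next to the PDF
		set commando to "/usr/local/bin/pdftotext -layout " & quoted form of thePath
		do shell script commando
	end repeat
end open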
It has worked with the PDFs I have used, but I don't know whether it will work with your OCR'd PDFs.
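If you would rather not use AppleScript, the same loop can be sketched in Python. This is only a rough sketch under a few assumptions: pdftotext is installed and on your PATH, all the PDFs sit in one folder, and the function names build_command and convert_folder are ones I made up for illustration. Like the script above, it is untested against OCR'd PDFs.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, exe="pdftotext"):
    """Build the pdftotext call for one PDF (hypothetical helper).

    The text file is written next to the PDF with a .txt extension.
    Passing a list to subprocess avoids shell quoting problems with
    spaces in file names.
    """
    pdf_path = Path(pdf_path)
    return [exe, "-layout", str(pdf_path), str(pdf_path.with_suffix(".txt"))]


def convert_folder(folder, exe="pdftotext"):
    """Run pdftotext on every .pdf in folder; return the PDFs that failed."""
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = subprocess.run(build_command(pdf, exe))
        if result.returncode != 0:
            # Keep going so one bad file does not stop the whole batch
            failed.append(pdf)
    return failed
```

To use it, call convert_folder("/path/to/your/pdfs") and check the returned list for any files that pdftotext could not convert. If pdftotext lives somewhere else, such as /usr/local/bin/pdftotext in the AppleScript above, pass that path as the exe argument.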