I have some large PDF files containing multiple patients' data. Each patient could have one page or many pages of report. I need to create one file for each patient, where the filename is a combination of the patient name + DOB (both items are at the top of each page). The PDF files were not created from a form, so the data has no field names.
The manual process is to browse the file, figure out how many pages belong to each patient, copy the text containing the name (first item on the 3rd line), and print to the PDF printer, selecting the from-to page numbers. When the file name box appears, paste the name after the path, then manually key in the DOB and save. There has got to be a better way! There are a total of 9000 patients...
The top of each file looks like this:
Date Printed: 10/18/11
BOWEN, JACOB 0000000523 Sex: M Age: 11 years 4 months
It is not possible in Reader (understandable), but it is possible in Acrobat Pro.
It essentially boils down to analyzing the document, finding the pages where you have the Chart Summary block, and then extracting those pages and saving them.
The very first question is whether the documents are raster graphics (scanned images). If so, they would first have to undergo OCR.
Analyzing the document means looking for words, using getPageNthWord() (for finding the "words" on the page) and getPageNthWordQuads() (for evaluating their position on the page). Depending on how the document was created, you may need to do some estimating to decide whether a found word is in a position where it would be significant.
When you have found such a block on the page, you know the starting page, and you have the information to assemble your file name.
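To make the steps above a bit more concrete, here is a rough sketch of the page scan. It assumes that doc is an Acrobat Doc object (for example "this" in the JavaScript console) and that the header looks like the sample line "BOWEN, JACOB 0000000523 Sex: M". The helper names (pageHeaderText, parsePatient, findPatientRanges) and the regular expression are made up for illustration; only numPages, getPageNumWords() and getPageNthWord() are Acrobat calls. The DOB is not visible in the sample, so the sketch keys on the name and the 10-digit number; a DOB pattern would have to be added in the same way once you know how it appears in the text.

```javascript
// Sketch only: the header pattern is an assumption taken from the sample
// line in the question and will probably need adjusting for real files.

// Join the first `maxWords` words of a page into one string.
// Passing bStrip = false keeps punctuation and trailing white space.
function pageHeaderText(doc, pageNum, maxWords) {
  var count = Math.min(doc.getPageNumWords(pageNum), maxWords);
  var text = "";
  for (var i = 0; i < count; i++) {
    text += doc.getPageNthWord(pageNum, i, false);
  }
  return text;
}

// Hypothetical parser for "LAST, FIRST <10 digits> Sex:".
function parsePatient(text) {
  var re = /([A-Z][A-Z'\-]*),\s*([A-Z][A-Z'\- ]*?)\s*(\d{10})\s*Sex:/;
  var m = re.exec(text);
  return m ? { last: m[1], first: m[2], id: m[3] } : null;
}

// Walk all pages and group consecutive pages that carry the same ID.
// Page numbers are 0-based, as in the Acrobat API.
function findPatientRanges(doc, maxWords) {
  var ranges = [];
  var current = null;
  for (var p = 0; p < doc.numPages; p++) {
    var patient = parsePatient(pageHeaderText(doc, p, maxWords || 40));
    if (patient && current && patient.id === current.id) {
      current.end = p; // same patient continues on this page
    } else if (patient) {
      current = {
        last: patient.last, first: patient.first, id: patient.id,
        start: p, end: p
      };
      ranges.push(current);
    } else if (current) {
      // Assumption: a page without a recognizable header belongs
      // to the patient of the previous page.
      current.end = p;
    }
  }
  return ranges;
}
```

In the console you could try something like findPatientRanges(this) on a small test file first and check the reported start and end pages by hand before running it over all 9000 patients.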
One more thing to keep in mind: you may have to create a Trusted Function to save the extracted pages to the specific name.
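A minimal sketch of such a trusted function follows. It would go into a folder-level script (a .js file in Acrobat's JavaScripts folder), because app.trustedFunction() can only be called from a privileged context. The names buildFileName and makeTrustedSaver and the file naming scheme are my own assumptions for illustration; app.trustedFunction(), app.beginPriv(), app.endPriv() and doc.extractPages() are the Acrobat calls involved.

```javascript
// Build a file name from header fields, replacing characters that are
// unsafe in file names. The naming scheme here is only an illustration.
function buildFileName(parts) {
  return parts.join("_")
    .replace(/[^A-Za-z0-9_\-]+/g, "_") // runs of unsafe characters -> "_"
    .replace(/_+/g, "_")               // collapse repeated underscores
    .replace(/^_|_$/g, "") + ".pdf";   // trim a leading/trailing underscore
}

// Wrap extractPages() in a trusted function so that it is allowed to
// write to a path. `app` is passed in as a parameter; in a folder-level
// script you would pass Acrobat's global `app` object.
function makeTrustedSaver(app) {
  return app.trustedFunction(function (doc, start, end, path) {
    app.beginPriv();
    // nStart and nEnd are 0-based and inclusive; cPath must be a
    // device-independent path, e.g. "/c/output/name.pdf".
    doc.extractPages({ nStart: start, nEnd: end, cPath: path });
    app.endPriv();
  });
}
```

In the folder-level script you would then create the saver once, for example var savePatientPages = makeTrustedSaver(app);, and call savePatientPages(this, startPage, endPage, folder + buildFileName([name, dob])) for each patient range, where folder, name and dob come from your own page analysis.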
Hope this can help.
Thank you, that does help. I didn't know about the getPageNthWord() function.
Also thank you to Try67, who has solved my problem.