2 Replies Latest reply on Oct 23, 2011 6:45 AM by CMiller505

    New user looking for any possible solution...


      I have some large PDF files containing data for multiple patients - each patient could have 1 page or many pages of report.  I need to create 1 file for each patient where the filename is a combination of patient name + DOB (both items are at the top of each page).  The PDF files were not created with a form, so the data has no field names.


      The manual process is to browse the file, figure out how many pages belong to each patient, copy the text containing the name (first item on 3rd line), then print to the PDF printer, selecting the from-to page numbers. When the file name box appears, paste the name after the path, then manually key in the DOB and save.  There has got to be a better way!  There are a total of 9000 patients...


      I've found JavaScript information about the saveAs and print functions.  The problem is the filename.  Is there a string or substring function that will locate and save the name and DOB so I can create the filename?  Even if it created 1 file per page - if the files were named properly, I could go back and merge the multiple files together.


      The top of each file looks like this:


      Chart Summary  

                                                                                   Date Printed: 10/18/11

      BOWEN, JACOB                     0000000523 Sex: M Age: 11 years 4 months

                                                                                           DOB: 05/26/2000


      Is this even possible within Acrobat or Adobe Reader using JavaScript?  Is there any other option?

        • 1. Re: New user looking for any possible solution...
          maxwyss Level 4

          It is not possible in Reader (understandable), but it is possible in Acrobat Pro.


          It essentially boils down to analyzing the document, finding the pages where you have the Chart Summary block, and then extracting those pages and saving them.
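The "find where each patient starts" step can be sketched as plain JavaScript, independent of Acrobat. This is only an illustration: `groupPages` and the per-page keys are hypothetical names, and it assumes you have already worked out one identifying string per page (for example name plus DOB), since both items appear at the top of every page.

```javascript
// pageKeys[i] is a hypothetical per-page identifier (e.g. "BOWEN_JACOB_2000-05-26"),
// one entry per page in page order. Pages are 0-based, as Acrobat numbers them.
function groupPages(pageKeys) {
  var ranges = [];
  for (var i = 0; i < pageKeys.length; i++) {
    var last = ranges[ranges.length - 1];
    if (last && last.key === pageKeys[i]) {
      // Same patient as the previous page: extend the current range.
      last.end = i;
    } else {
      // Different key: a new patient starts on this page.
      ranges.push({ key: pageKeys[i], start: i, end: i });
    }
  }
  return ranges;
}
```

Each resulting range gives the first and last page to extract for one patient. Note that this only merges consecutive pages; if the same patient could appear in two separate places in a file, the ranges would need to be combined afterwards.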


          The very first question is whether the documents are raster graphics. If so, they would first have to undergo OCR.


          Analyzing the document means looking for words, using getPageNthWord() (for finding the "words" on the page) and getPageNthWordQuads() (for evaluating their position on the page). Depending on how the document was created, you may need to do some estimation to decide whether a found word is in a position where it is significant.
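A minimal sketch of that word scan follows. `getPageNumWords()`, `getPageNthWord()` and `getPageNthWordQuads()` are Doc methods in Acrobat JavaScript; the helper names (`pageWords`, `wordTop`, `wordsAbove`) and the threshold are made up for illustration. It assumes each quad is an array of eight numbers (four x,y corner points) and that a larger y means higher on the page, which is worth checking against your own files.

```javascript
// Collect every word on a page together with its quads.
// `doc` is an Acrobat Doc object (e.g. `this` in a document-level script).
function pageWords(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    words.push({
      // bStrip = false keeps punctuation such as the comma after the surname.
      text: doc.getPageNthWord(pageNum, i, false),
      quads: doc.getPageNthWordQuads(pageNum, i)
    });
  }
  return words;
}

// Highest y coordinate of a word's first quad (assumed layout: x1,y1,...,x4,y4).
function wordTop(word) {
  var q = word.quads[0];
  return Math.max(q[1], q[3], q[5], q[7]);
}

// Keep only words whose top edge lies above minY, i.e. in the header area.
function wordsAbove(words, minY) {
  var kept = [];
  for (var i = 0; i < words.length; i++) {
    if (wordTop(words[i]) >= minY) {
      kept.push(words[i]);
    }
  }
  return kept;
}
```

Restricting the search to the top band of the page is one way to do the "estimation" mentioned above, so that a name or date appearing in the body of a report is not mistaken for the header.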


          When you have found such a block on the page, you know the starting page, and you have the information to assemble your file name.
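As a rough sketch of assembling the file name, the page text can be matched with regular expressions. This assumes the layout shown in the question: an upper-case "LAST, FIRST" name followed by a 10-digit number, and a "DOB: MM/DD/YYYY" entry. The function names and the output pattern (`LAST_FIRST_YYYY-MM-DD.pdf`) are arbitrary choices, not anything Acrobat prescribes.

```javascript
// Pull the name and DOB out of the text of one page.
// Returns null if the page does not match the assumed header layout.
function parseHeader(pageText) {
  // "LAST, FIRST" in upper case, followed by the 10-digit number from the sample.
  var name = /([A-Z][A-Z' -]*),\s*([A-Z][A-Z' -]*?)\s+\d{10}\b/.exec(pageText);
  // "DOB: MM/DD/YYYY"
  var dob = /DOB:\s*(\d{2})\/(\d{2})\/(\d{4})/.exec(pageText);
  if (!name || !dob) {
    return null;
  }
  return {
    last: name[1],
    first: name[2],
    // Reordered to YYYY-MM-DD so the slashes never end up in a file name.
    dob: dob[3] + "-" + dob[1] + "-" + dob[2]
  };
}

// Build a file name, replacing anything that is not a letter, digit,
// underscore or hyphen (spaces, apostrophes) with an underscore.
function buildFileName(info) {
  var raw = info.last + "_" + info.first + "_" + info.dob;
  return raw.replace(/[^A-Za-z0-9_-]+/g, "_") + ".pdf";
}
```

With the sample header from the question, `parseHeader` yields last name "BOWEN", first name "JACOB" and DOB "2000-05-26", and `buildFileName` turns that into "BOWEN_JACOB_2000-05-26.pdf". The page text itself could be built by joining the words returned by getPageNthWord().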


          One more thing to keep in mind: you may have to create a Trusted Function to save the extracted pages to the specific name.
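A trusted function along these lines might look like the sketch below. `app.trustedFunction()`, `app.beginPriv()`, `app.endPriv()` and `extractPages()` are Acrobat JavaScript calls; the wrapper name `makeTrustedSaver`, the variable `savePageRange` and the example path are hypothetical. It is meant to live in a folder-level script file, because that is where trusted functions can be defined.

```javascript
// Wraps extractPages() in a trusted function so it may write to a given path.
// appObj is Acrobat's global `app`; passing it in keeps the sketch easy to test.
function makeTrustedSaver(appObj) {
  return appObj.trustedFunction(function (doc, startPage, endPage, filePath) {
    appObj.beginPriv();
    try {
      // Pages are 0-based; cPath is a device-independent path,
      // e.g. "/c/output/BOWEN_JACOB_2000-05-26.pdf" (hypothetical).
      doc.extractPages({ nStart: startPage, nEnd: endPage, cPath: filePath });
    } finally {
      appObj.endPriv();
    }
  });
}

// Inside Acrobat, create the saver once when the folder-level script loads.
if (typeof app !== "undefined") {
  var savePageRange = makeTrustedSaver(app);
}
```

From a batch sequence or the console you would then call something like `savePageRange(this, 0, 2, "/c/output/BOWEN_JACOB_2000-05-26.pdf")` for each patient's page range.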


          For details refer to the Acrobat JavaScript documentation.


          Hope this can help.


          Max Wyss.

          • 2. Re: New user looking for any possible solution...
            CMiller505 Level 1

            Thank you, that does help.  I didn't know about the getPageNthWord() function.


            Also thank you to Try67, who has solved my problem.