13 Replies Latest reply on Nov 14, 2017 11:36 AM by Thom Parker

    Extract Pages from comma delimited list and

    john xavierp31930688

      I am working with really large (numbers of pages) pdfs.  I know of two ways to extract non-sequential pages: right-click from either the page view OR the organize pages view.

       

      The problem is that I want to extract many different non-sequential pages (could be 200 pages out of 1100).  I keep a list of all pages indexed in EXCEL (don't ask... that's just what we are doing right now).  Anyway, as part of that process I use EXCEL to identify which pages will be extracted (and sent to third parties) from the larger *.pdf file.  Because of that, I can easily use EXCEL VBA to create a comma-delimited list of which pages to extract from the *.pdf file.

       

      My problem is I am a complete novice at JavaScript.  So right now, I have to open up Adobe and hold the CTRL key while I scroll through and try to select dozens (if not hundreds) of non-sequential pages for extraction.  One slip and I have to start over again.

       

      I just want to be able to feed (could be copy-paste) a list of comma-delimited page numbers to Adobe and have it extract those pages to one *.pdf.   My thought is that a JavaScript in Adobe would have a variable assigned to the pages to be extracted, say N=1,2,4,54,23,45,198,543.  Then, using Excel, I could just copy-paste the comma-delimited string of pages that my VBA outputs right in for "N" in the Adobe JavaScript and then run the script.

       

      Can anyone help me?

        • 1. Re: Extract Pages from comma delimited list and
          try67 MVP & Adobe Community Professional

          You can do it like this:

          - Create a new file (app.newDoc)

          - Import the pages from the old files into the new one (insertPages method of the Document object)

          - Delete the first, empty page of the new file (deletePages method)

          - Save the new file under a new name (saveAs method)
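
The four steps above can be sketched roughly as follows. This is only an illustration, not try67's tool: the function name `extractPages` and its parameters are hypothetical, the page numbers are assumed to be 0-based, and `app` and the source document are passed in as arguments so the logic can be tried outside Acrobat. The Acrobat calls used are `app.newDoc`, `insertPages`, `deletePages` and `saveAs`.

```javascript
// Hypothetical helper: copies the given 0-based pages of srcDoc into a new document.
// In the Acrobat console it could be called as: extractPages(app, this, [1, 2, 9]);
function extractPages(app, srcDoc, zeroBasedPages, outPath) {
    var newDoc = app.newDoc();                      // new file, starts with one blank page
    for (var i = 0; i < zeroBasedPages.length; i++) {
        // Insert after the current last page so the listed order is preserved.
        newDoc.insertPages(newDoc.numPages - 1, srcDoc.path, zeroBasedPages[i]);
    }
    newDoc.deletePages(0);                          // drop the initial blank page
    if (outPath) {
        newDoc.saveAs(outPath);                     // optional: save under a new name
    }
    return newDoc;
}
```

If `outPath` is left out, the new document simply stays open and unsaved.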

           

          If you're interested, I've already developed a tool that allows you to do it easily by entering the page numbers into a dialog window.

          You can even use ranges, so instead of writing "2,3,4,5,6,10", you can enter "2-6,10" and it will pick up the correct pages automatically.

          Of course, the pages don't have to be in sequential order, so it can also be "10, 2-6", if you wish.

          You can find it here: Custom-made Adobe Scripts: Acrobat -- Extract Non-Sequential Pages
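
For readers who want to handle ranges such as "2-6,10" in their own script, here is a minimal sketch of how such a string could be expanded. It is an assumption about one reasonable approach, not the implementation used in the tool linked above; the name `parsePageSpec` is made up for this example, and the numbers are treated as 1-based, as a user would type them.

```javascript
// Hypothetical helper: expands "2-6,10" into [2, 3, 4, 5, 6, 10] (1-based numbers).
function parsePageSpec(spec) {
    var pages = [];
    var parts = String(spec).split(",");
    for (var i = 0; i < parts.length; i++) {
        var part = parts[i].replace(/\s+/g, "");     // ignore spaces, e.g. "10, 2-6"
        if (part === "") { continue; }               // tolerate stray commas
        var m = /^(\d+)(?:-(\d+))?$/.exec(part);     // "N" or "N-M"
        if (!m) { throw new Error("Bad page entry: " + parts[i]); }
        var first = parseInt(m[1], 10);
        var last = m[2] ? parseInt(m[2], 10) : first;
        if (first < 1 || last < first) { throw new Error("Bad page range: " + parts[i]); }
        for (var p = first; p <= last; p++) { pages.push(p); }
    }
    return pages;
}
```

The entries are kept in the order they were typed, so "10, 2-6" puts page 10 first. Subtract 1 from each number before passing it to `insertPages`.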

          • 2. Re: Extract Pages from comma delimited list and
            Thom Parker Adobe Community Professional

            This question comes up regularly. I've always thought it odd that Adobe didn't provide better page extraction options. Later tools, such as "redaction", take a page list/range as input, but they never updated the older tools that also need pages as input.

             

            So the only option is to write an automation script to do it.

             

            If you just go with the list (not page ranges), and assume the page numbers are 0-based, then this script will work:

             

            var strPgs = "2,3,4,5,6,10";
            var aPages = strPgs.split(",");
            var oNewDoc = app.newDoc();
            var oDoc = this;
            aPages.forEach(function(nPg){oNewDoc.insertPages(oNewDoc.numPages-1,oDoc.path,nPg);});

             

            Run it from the Console Window

            Also a good idea for a tool at pdfscripting.com

            • 3. Re: Extract Pages from comma delimited list and
              john xavierp31930688 Level 1

              That is what I'm looking for, Thom!  Only problem I had is that the pages extracted (when I run your code exactly as above to test) are shifted by one.

               

              So instead of extracting the 6 pages consisting of pages 2,3,4,5,6 and 10 in your example above, what actually was extracted was 7 pages consisting of one blank page + pages 3,4,5,6,7 and 11.

               

              Any thoughts?

              • 4. Re: Extract Pages from comma delimited list and
                Karl Heinz Kremer Adobe Community Professional

                Acrobat starts counting pages at 0, so the first page in a document is page 0, the second page is page 1, and so on. This is standard behavior for anybody with a software engineering background, and takes some getting used to for somebody who is not a software engineer.

                • 5. Re: Extract Pages from comma delimited list and
                  try67 MVP & Adobe Community Professional

                  The issue here is not the fact that pages in JS are zero-based, actually.

                  When you create a new document it automatically adds a single, blank page to it (as a PDF file can't have zero pages). You're then adding the extracted pages after that one page. So at the end of the process it needs to be removed.

                  To do that add this command at the end of the code Thom provided:

                  oNewDoc.deletePages();

                  • 6. Re: Extract Pages from comma delimited list and
                    try67 MVP & Adobe Community Professional

                    Sorry, it's actually a combination of both. As Thom pointed out, the numbers you specify need to be zero-based, or you need to adjust the code, like this:

                     

                    var strPgs = "2,3,4,5,6,10"; // 1-based page numbers!
                    var aPages = strPgs.split(",");
                    var oNewDoc = app.newDoc();
                    var oDoc = this;
                    aPages.forEach(function(nPg){oNewDoc.insertPages(oNewDoc.numPages-1,oDoc.path,nPg-1);});
                    oNewDoc.deletePages();
                    
                    • 7. Re: Extract Pages from comma delimited list and
                      john xavierp31930688 Level 1

                      Excellent!

                       

                      I have another tool that I use in ACTION WIZARD to create an Excel sheet of my *.pdf bookmarks, which I manipulate in Excel using VBA to identify all the various pages in the related *.pdf to extract.  Now I can easily extract pages using this little script (tried it out on a file with 100+ non-sequential pages).

                       

                      Any way to put this little script (just like you coded it above) into something that I can run from the ACTION WIZARD, so I don't have to bring up the Console and then remember to highlight the script and then press CTRL-ENTER to run?

                      • 8. Re: Extract Pages from comma delimited list and
                        try67 MVP & Adobe Community Professional

                        And how would you tell the Action which pages to extract for each of the files it is processing?

                        • 9. Re: Extract Pages from comma delimited list and
                          john xavierp31930688 Level 1

                          My Excel VBA outputs a comma-delimited list in one cell.  To test the script I just copy-pasted into the script between the quotation marks (var strPgs = "2,3,4,5,6,10"; // 1-based page numbers!)

                           

                          So, my thought would be an Adobe dialogue box would pop up and I just copy paste the comma-delimited list from Excel into the dialogue box and press enter.  The list would be inserted between the quotation marks and then execute.

                           

                          Does that not work?

                          • 10. Re: Extract Pages from comma delimited list and
                            Thom Parker Adobe Community Professional

                            The simple solution is to use the "app.response()" function. It returns a string.

                             

                            If you are using Acrobat DC, then use a command with this script.

                             

                             

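                            var strPgs = app.response("Enter list of page numbers"); // 1-based, comma-separated
                            if (strPgs !== null) { // null means the dialog was cancelled
                                var aPages = strPgs.split(",");
                                var oNewDoc = app.newDoc();
                                var oDoc = this;
                                aPages.forEach(function(nPg){oNewDoc.insertPages(oNewDoc.numPages-1,oDoc.path,nPg-1);});
                                oNewDoc.deletePages(); // remove the blank page created by app.newDoc()
                            }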
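
Since the list is pasted in by hand, it may be worth checking the string before using it. The sketch below is one possible way to do that and is not part of the script above: `pagesFromResponse` is a hypothetical helper that takes the string returned by `app.response()` (which is `null` if the user cancels) and the page count of the source document, and returns 0-based page indexes ready for `insertPages`.

```javascript
// Hypothetical helper: turns a pasted "2,3,10" (1-based) into [1, 2, 9] (0-based).
// Returns null when the dialog was cancelled; throws on entries that cannot be used.
function pagesFromResponse(response, numPages) {
    if (response === null) { return null; }               // user pressed Cancel
    var result = [];
    var items = response.split(",");
    for (var i = 0; i < items.length; i++) {
        var text = items[i].replace(/^\s+|\s+$/g, "");    // trim spaces left by copy-paste
        if (text === "") { continue; }                    // skip empty entries
        if (!/^\d+$/.test(text)) { throw new Error("Not a page number: " + text); }
        var n = parseInt(text, 10);
        if (n < 1 || n > numPages) { throw new Error("Page out of range: " + n); }
        result.push(n - 1);                               // convert to 0-based
    }
    return result;
}

// Possible use inside Acrobat (assumed, not run here):
// var aPages = pagesFromResponse(app.response("Enter list of page numbers"), this.numPages);
```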

                            • 11. Re: Extract Pages from comma delimited list and
                              try67 MVP & Adobe Community Professional

                              I have a suspicion it might not work in an Action, because of the new file being created.

                              You might need to add a command to close the new document at the end of the code.

                              • 12. Re: Extract Pages from comma delimited list and
                                Thom Parker Adobe Community Professional

                                It's not for use in an Action, but rather in a "Custom Command".  If you look on the Acrobat DC "Action Wizard" you'll see buttons for adding a "Custom Command". This is basically just an easy way to create and share Automation Scripts.

                                • 13. Re: Extract Pages from comma delimited list and
                                  Thom Parker Adobe Community Professional

                                  In the case of an Action, the new doc doesn't need to be closed, but it would be helpful to do so. For this you'd also need a file naming convention.
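
As an illustration of what a file naming convention might look like, the sketch below derives the output name from the source file's path by adding a suffix before the extension. The helper name `extractedPath` and the "_extract" suffix are assumptions made for this example only.

```javascript
// Hypothetical helper: "/c/docs/report.pdf" -> "/c/docs/report_extract.pdf"
function extractedPath(srcPath, suffix) {
    var tag = suffix || "_extract";
    var m = /^(.*)\.pdf$/i.exec(srcPath);      // split off a trailing ".pdf", any case
    return m ? m[1] + tag + ".pdf" : srcPath + tag + ".pdf";
}

// Possible use at the end of the script (assumed, not run here):
// oNewDoc.saveAs(extractedPath(oDoc.path));
// oNewDoc.closeDoc(true);                     // true = close without prompting to save
```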