We have been supplied with a number of PDFs, each more than 10,000 pages long. Each PDF is a series of individual documents of varying length (e.g. some have 4 pages, some 6, some 5). We have tried a number of different splitting techniques, none with any success. Our only guide to which pages belong to which document is a text file which sets out, on each successive line, (a) the first page of each new document and (b) the number of pages in that document. For example, line 1 of the text file reads "page 00001, 4 pages" and line 2 reads "page 00005, 6 pages".
Is anyone aware of a script we can run, either JS or SQL, that can open the main PDF, read page number (a) from the text list (imported into a reference table), go to page (a), split off (b) pages starting there into a separate PDF, and repeat through the whole PDF until all sub-documents have been extracted?
Thanks much for your help.
This is certainly possible using a script or a stand-alone application. I've developed similar tools in the past, so if you're interested in hiring someone to create it for you, feel free to contact me privately at try6767 at gmail.com.
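For anyone who wants to try it themselves, here is a rough sketch of the loop described in the question. It is written in Python rather than JS or SQL, and it assumes the third-party pypdf package is installed. It also assumes every line of the text file looks like the example given ("page 00001, 4 pages"), that page numbers in the list are 1-based, and that the function names and output file naming (parse_index, split_pdf, doc_00001.pdf) are just illustrative choices of mine. Treat it as a starting point, not a tested solution for your files.

```python
import re
from pathlib import Path

# Assumed line format, taken from the example in the question:
#   "page 00001, 4 pages"
# The first number is the starting page, the second is the page count.
LINE_RE = re.compile(r"(\d+)\D+(\d+)")


def parse_index(text):
    """Return a list of (first_page, page_count) tuples, one per non-blank line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = LINE_RE.search(line)
        if match is None:
            raise ValueError(f"Unrecognised index line: {line!r}")
        entries.append((int(match.group(1)), int(match.group(2))))
    return entries


def split_pdf(pdf_path, index_path, out_dir):
    """Write one PDF per index entry into out_dir."""
    # pypdf is a third-party package (pip install pypdf). It is imported
    # here so that parse_index can be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    total = len(reader.pages)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    for first_page, count in parse_index(Path(index_path).read_text()):
        start = first_page - 1  # list is assumed 1-based; pypdf pages are 0-based
        if start < 0 or start + count > total:
            raise ValueError(
                f"Entry ({first_page}, {count}) falls outside the {total}-page PDF"
            )
        writer = PdfWriter()
        for i in range(start, start + count):
            writer.add_page(reader.pages[i])
        # Name each output after its starting page, e.g. doc_00001.pdf
        with open(out / f"doc_{first_page:05d}.pdf", "wb") as fh:
            writer.write(fh)
```

With hypothetical file names, you would call it as split_pdf("big.pdf", "index.txt", "out"). It may be worth running parse_index on the text file first and checking that each entry starts where the previous one ends, since a gap or overlap in the list would otherwise go unnoticed.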