I need to extract groups of pages (bank statements in this case) from a single PDF with more than 2,500 pages. Apparently the statements were merged, and I need to un-merge them. I think each statement is 8, 10, or 12 pages long, and the pages are searchable (some were apparently scanned with an iX500). The varying document length is the dynamic part. The statements came from Chase bank via subpoena; I assume that the statements, which make up 99% of the document, are electronic in origin and that no one actually printed 5 reams of paper. If every statement had the same number of pages, I could process each group with Adobe Acrobat DC and name the files automatically via an existing Hazel method. Is there a way to do this? Are there other options, assuming the original pre-merge files are unavailable? For example, could the file be read to find the headers left over from the original merge, and could those lead to bookmarks that can be used to extract or un-merge the documents?
It might be possible if there is a way to identify where each document begins or ends. It would require a custom-made tool, either a script or a stand-alone application. However, it's impossible to say for sure without seeing an actual sample file. I understand this file might contain sensitive information, so if you don't want to share it publicly I can have a look at it privately and let you know whether I think it's doable, and if so how much it would cost to develop the tool.
You can contact me privately at try6767 at gmail.com to discuss it further.
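To illustrate the kind of script meant above, here is a minimal sketch, not a tested solution for this particular file. It assumes (and this has not been verified against the actual statements) that the first page of every statement contains searchable text such as "Page 1 of 8". The function names, the regular expression, and the output file names are all hypothetical, and the third-party pypdf library is used for reading and writing. Any pages before the first detected marker are skipped.

```python
import re

# Assumption: each statement's first page contains text like "Page 1 of 8".
# Adjust the pattern to whatever marker the real statements carry.
FIRST_PAGE = re.compile(r"\bPage\s+1\s+of\s+\d+\b", re.IGNORECASE)


def find_statement_ranges(page_texts):
    """Return (start, end) page-index pairs, end exclusive, one per statement."""
    starts = [i for i, text in enumerate(page_texts) if FIRST_PAGE.search(text or "")]
    # Each statement runs until the next detected first page (or the end of the file).
    ends = starts[1:] + [len(page_texts)]
    return list(zip(starts, ends))


def split_pdf(source_path, output_dir):
    """Write one PDF per detected statement into output_dir."""
    from pathlib import Path
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(source_path)
    texts = [page.extract_text() or "" for page in reader.pages]
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    for number, (start, end) in enumerate(find_statement_ranges(texts), 1):
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        with open(out / f"statement_{number:04d}.pdf", "wb") as handle:
            writer.write(handle)
```

Calling `split_pdf("merged.pdf", "statements")` (both names are placeholders) would write one file per detected statement. Because the boundary comes from the page text rather than a fixed page count, statements of 8, 10, or 12 pages are handled the same way; whether the marker is reliable can only be confirmed on a sample of the real file.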