I have been tasked with updating a Perl script that does one thing: determine the page count of thousands of PDF documents daily. We pass PDFs from our system to an external application that needs to know the page count.
The Perl script was written several years ago and works with PDF versions that have a non-encrypted page tree object, but that can't be counted on above PDF-1.4. All of the Windows desktops in our enterprise are running Adobe 9 with the "Optimize for Fast Web View" option locked on, so any PDF saved with that version of Adobe comes out linearized.
I have added a text search for the Linearized object (which is always on the second line of the PDF and always contains the page count), but many PDFs are still being scanned on a wide variety of scanners, and these files neither have a visible page tree object nor are linearized.
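For reference, here is a minimal sketch of the two text searches described above, written in Python rather than Perl so it is easy to try; the regular expressions should port to Perl almost unchanged. The function names are hypothetical. It assumes the linearization dictionary sits in the first 1024 bytes and stores the page count under /N, and that an uncompressed page tree root is the /Type /Pages object with the largest /Count. It returns None for exactly the problem files, where the page tree is not visible as plain text.

```python
import re

# Patterns work on raw bytes, since PDFs mix text and binary data.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)
PAGES_RE = re.compile(rb"/Type\s*/Pages\b")   # matches /Pages, not /Page
COUNT_RE = re.compile(rb"/Count\s+(\d+)")
N_RE = re.compile(rb"/N\s+(\d+)")


def linearized_page_count(data):
    """Return /N from the linearization dictionary, or None if absent."""
    head = data[:1024]  # assumption: the dictionary is near the file start
    pos = head.find(b"/Linearized")
    if pos < 0:
        return None
    start = head.rfind(b"<<", 0, pos)
    end = head.find(b">>", pos)
    if start < 0 or end < 0:
        return None
    match = N_RE.search(head[start:end])
    return int(match.group(1)) if match else None


def page_tree_count(data):
    """Return the largest /Count among plain-text /Type /Pages objects."""
    counts = []
    for obj in OBJ_RE.finditer(data):
        body = obj.group(1)
        if PAGES_RE.search(body):
            match = COUNT_RE.search(body)
            if match:
                counts.append(int(match.group(1)))
    return max(counts) if counts else None


def page_count(data):
    """Try the linearization dictionary first, then the page tree."""
    n = linearized_page_count(data)
    return n if n is not None else page_tree_count(data)
```

A caller would read the file in binary mode, for example `page_count(open(path, "rb").read())`, and route any file that yields None to the fallback process.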
I can write a Unix shell script to email the errant PDFs to a Windows desktop. What I don't know is whether it is possible to create an interface that will load the PDFs into Adobe for batch processing (opening and resaving each document to linearize it). This would solve the problem of PDFs with no searchable page count text.