I have a set of over 100,000 PDF documents and need to identify which of them contain a particular set of four numbers. Preferably, I would like to export the list of matches in a format from which I can link to each document, ideally an Excel (.xls) file. Any advice on exporting the results of a search in Adobe, or on other options, would be very helpful. Thanks in advance.
You will need Acrobat X or higher to accomplish this, but here is how I would perform that task:
(1) Create a full text index of the document set (and I'm assuming your 100K files all contain searchable text content)
(2) Run your search against the index via Acrobat's advanced search feature
(3) Save the results to a .CSV file, which was a long-overdue feature that arrived in Acrobat X
(4) When you open the .CSV in Excel (you may want to Save As .xls(x) for greater flexibility), you will have a list of file names, each followed by a sublist of the pages where the search hits occurred
(5) From here you'll need to manipulate the data in the Excel sheet to list complete file paths for each row, at which point you should then be able to open the corresponding file right from your Excel sheet.
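If you would rather script step (5) than do it by hand in Excel, here is a minimal Python sketch of the idea. It assumes you have already flattened the exported results into (file name, page) pairs and that all the PDFs sit under a single folder. The folder C:\cases\docs, the sample file names, and the function names are made up for illustration, and this is not the exact layout of Acrobat's .CSV, which you would need to adapt to. The only Excel feature it relies on is the HYPERLINK worksheet function.

```python
import csv
import io
from pathlib import PureWindowsPath


def build_link_rows(hits, base_dir):
    """Turn (file_name, page) pairs into rows with a clickable Excel link.

    base_dir is a hypothetical folder holding every PDF in the set.
    """
    rows = [["File", "Page", "Link"]]
    for file_name, page in hits:
        # Complete Windows path for this hit
        full_path = str(PureWindowsPath(base_dir) / file_name)
        # Inside an Excel string literal, a double quote is escaped by doubling it
        safe_path = full_path.replace('"', '""')
        safe_name = file_name.replace('"', '""')
        # HYPERLINK(link_location, friendly_name) is a built-in worksheet function
        formula = f'=HYPERLINK("{safe_path}","{safe_name}")'
        rows.append([file_name, page, formula])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up sample data standing in for the flattened search results
hits = [("smith_depo.pdf", 12), ("exhibit 4.pdf", 3)]
rows = build_link_rows(hits, r"C:\cases\docs")
print(rows_to_csv(rows))
```

Saving that output as a .csv and opening it in Excel should give one row per hit, with the third column clickable. If your files live in several folders, you would need to carry the folder along with each file name instead of using one base_dir.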
Now making the Excel sheet open to the exact PAGE is a bit more complex, but is also possible with some VB scripting. You can search for that on the web if you want some sample code.
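As one alternative to VB scripting, Adobe's documented PDF open parameters include page=N, which can be appended to a link as a URL fragment. The Python sketch below only builds such a link. It is not the VB approach mentioned above, it assumes a Windows drive-letter path, and the path and function name are hypothetical. Whether the page number is actually honored depends on how Excel hands the link off and on which PDF viewer opens it, so test it on your own setup first.

```python
from urllib.parse import quote


def page_link(full_path, page):
    """Build a file URI that asks the PDF viewer to open at a given page.

    Assumes full_path is an absolute Windows path such as C:\\folder\\file.pdf.
    """
    # File URIs use forward slashes
    posix_style = full_path.replace("\\", "/")
    # Percent-encode spaces and other unsafe characters, keeping "/" and ":" intact
    encoded = quote(posix_style, safe="/:")
    # "#page=N" is one of Adobe's PDF open parameters
    return f"file:///{encoded}#page={int(page)}"


# Hypothetical example path and page number
link = page_link(r"C:\cases\docs\exhibit 4.pdf", 3)
print(link)
```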
FYI, I approach this concept a little bit differently in my book, where I use a simple VB script in lieu of file paths, which I say only to point out that there are several different options for setting up the links once you're in Excel.
Hope that helps!
PDF Litigation Solutions, LLC