2 Replies Latest reply on May 17, 2017 10:56 AM by troelsk

    Batch OCR first pages of PDFs to a spreadsheet

    troelsk

      Hi!

       

      I've got around 400 PDFs of varying length that I'd like to batch convert to a spreadsheet. The best-case scenario would be this: take the content of the first page of each PDF and put the text into a spreadsheet, with each PDF's text in its own single cell. Do you have any ideas on how to do this?

       

      Here's my own idea so far:

       

      1) Delete all pages except the first one in a batch. I've tried the Action Wizard, but it seems to work only if all the PDFs have the same number of pages, which they don't. Is there any way to overcome this?

       

      2) Batch convert the PDFs to XML. This I can do, and it seems to do quite a good job at the OCR. However, the text is spread out over multiple cells in the spreadsheet. Is there any way to tell Acrobat to put all the text in a single cell?

       

      3) Merge the XML documents into a single spreadsheet. This should be fairly simple, I think, so no worries on that one.
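
If Acrobat itself cannot put everything in one cell, steps 2 and 3 could be handled afterwards with a small script. The following is only a minimal sketch in Python (standard library only), assuming the exported XML files all sit in one folder and that simply joining every piece of text in a file is acceptable. The function names `xml_to_single_cell` and `merge_to_csv`, and the choice of CSV as the output format, are illustrative assumptions and not anything Acrobat provides.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def xml_to_single_cell(xml_path):
    """Return all text in one XML file as a single whitespace-normalized string."""
    root = ET.parse(xml_path).getroot()
    # itertext() walks every element, so text from all cells is collected
    # regardless of how the export split it up.
    return " ".join(" ".join(root.itertext()).split())


def merge_to_csv(xml_dir, out_csv):
    """Write one row per XML file: file name in column 1, all its text in column 2."""
    xml_files = sorted(Path(xml_dir).glob("*.xml"))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "text"])
        for xml_file in xml_files:
            writer.writerow([xml_file.name, xml_to_single_cell(xml_file)])
    return len(xml_files)
```

Calling `merge_to_csv("exported_xml", "first_pages.csv")` (both names hypothetical) would produce a CSV that opens in a spreadsheet program with one PDF per row. This does not solve step 1, so the XML files would still need to contain only the first page.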

       

      Any help with the first two steps? Or other ideas on how to achieve this?

       

      Thank you :-)