
Batch OCR first pages of pdfs to spreadsheet

New Here, May 11, 2017


Hi!

I've got around 400 PDFs of varying length that I'd like to batch convert to a spreadsheet. The best-case scenario would be this: get the content of the first page of each PDF and put the text into a spreadsheet, with each entry inside its own single cell. Do you have any ideas on how to do this?

Here's my own idea so far:

1) Delete all pages except the first one in a batch. I've tried the Action Wizard, but it seems to work only if all the PDFs have the same number of pages, which they don't. Is there any way to overcome this?

2) Batch convert the PDFs to XML. This I can do, and it seems to do quite a good job at the OCR. However, the text is spread out over multiple cells in the spreadsheet. Is there any way to tell Acrobat to put all the information in a single cell?

3) Merge the XML documents into a single spreadsheet. This should be fairly simple, I think, so no worries on that one.

Any help with the first two steps? Or other ideas on how to achieve this?

Thank you 🙂

TOPICS
Edit and convert PDFs

Views

1.3K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

1 Correct answer

Community Expert, May 11, 2017

I think the way to do it is by using a script, like this:

- Use an Action to process all the files.
- Perform OCR on the first page of each file.
- Extract the first page's text and save it into a global variable using a script.
- When the Action is complete, run a separate script to export the value of that variable to a text file, which can then be opened using Excel.
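
The last step of that list (turning the collected text into a file Excel can open) can be sketched roughly as follows. This is not Acrobat JavaScript and not the expert's actual script; it is a small Python illustration that assumes the first-page text of each PDF has already been collected into a dictionary. The names `to_single_cell`, `export_rows`, and the sample file names are hypothetical. Collapsing line breaks and tabs is one way to make sure each entry stays in a single cell of a tab-delimited file.

```python
import csv
import io


def to_single_cell(text: str) -> str:
    """Collapse every run of whitespace (newlines, tabs, spaces) to one space."""
    return " ".join(text.split())


def export_rows(first_pages: dict[str, str]) -> str:
    """Return tab-delimited text: a header row, then one row per PDF."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["file", "first_page_text"])
    for name, text in first_pages.items():
        writer.writerow([name, to_single_cell(text)])
    return buffer.getvalue()


# Hypothetical sample data standing in for the text gathered during the Action.
collected = {
    "report_a.pdf": "Quarterly  Report\nPrepared by: Finance",
    "report_b.pdf": "Meeting notes\n\tAttendees: 4",
}

output = export_rows(collected)
print(output)
```

Saving `output` to a `.txt` or `.tsv` file and opening it in Excel should give one row per PDF, with the whole first page in the second column.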

New Here, May 17, 2017


Thank you, try67 - my solution ended up being to batch convert the PDFs to .txt files and then import them into Excel using VBA.
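
The VBA used for the import is not shown in the thread. As a rough stand-in, here is a Python sketch of the same idea: read every `.txt` file in a folder and write one CSV row per file, with the whole text in a single cell. The function name `txt_folder_to_csv`, the file names, and the assumption that the exported text files are UTF-8 are all hypothetical. The `csv` module quotes embedded line breaks, so multi-line text stays inside one cell.

```python
import csv
import tempfile
from pathlib import Path


def txt_folder_to_csv(folder: Path, csv_path: Path) -> int:
    """Write one CSV row per .txt file in `folder`; return the number of files."""
    txt_files = sorted(folder.glob("*.txt"))
    # utf-8-sig writes a BOM, which helps Excel detect the encoding.
    with csv_path.open("w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "text"])
        for txt_file in txt_files:
            # Assumption: the exported files are UTF-8; bad bytes are replaced.
            text = txt_file.read_text(encoding="utf-8", errors="replace").strip()
            writer.writerow([txt_file.name, text])
    return len(txt_files)


# Small demonstration with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text("First page of A\nsecond line", encoding="utf-8")
    (folder / "b.txt").write_text("First page of B", encoding="utf-8")
    count = txt_folder_to_csv(folder, folder / "first_pages.csv")
    # Read the CSV back to confirm each file ended up in its own row.
    with (folder / "first_pages.csv").open(newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.reader(handle))

print(count, rows)
```

If the text files contain more than the first page, they would need to be trimmed before or after this step; the sketch does not attempt that.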
