Action Wizard fails to convert multiple pdf to docx

Report · Jan 15, 2018

Hello,

I am using the trial version of Adobe Acrobat Pro DC (version 18.009.20050).

I want to convert a few hundred PDFs to Word documents (docx).

Although this works fine on a single pdf document (using the PDF export function), using the batch processing (Action Wizard) generates damaged and unreadable docx files. Word complains that it can't open them.

Could you please let me know how I could fix this problem?

Thanks and regards,

Jerome

Report · Jan 15, 2018

You will need Acrobat Pro. You then create a new action and you only need to set the output options to save the PDF as a Word document.

This will not improve the conversion process, so any file that cannot be converted by your current process will still fail.

Report · Jan 15, 2018

I am using the professional version of Acrobat.

The trial version seems to offer all functionalities of Acrobat Pro but only for a limited period of time.

I did create a new action in the Action Wizard.

I specified the folder containing all the pdfs and the export format (i.e. docx).

I then clicked on the start button and Acrobat processed all the pdfs.

The issue is that none of the docx files generated this way can be opened with Word.

Yet if I open any of these pdfs in Acrobat and use the export function to convert it to a docx, Acrobat creates a docx file that Word can open.

Report · Jan 15, 2018

What happens when you open the files in Word?

Report · Jan 15, 2018

When I try to open a docx file I receive 3 error messages from Word.

My version of Word is unfortunately in German so I provide below the original messages as well as a translation (via Google)

Message 1:

"Die Datei '[filename.docx]' kann nicht geöffnet werden, da ihr Inhalt Probleme verursacht."

Details: "Die Datei ist beschädigt und kann nicht geöffnet werden."

Translation:

"The file '[filename.docx]' cannot be opened because its content is causing problems."

Details: "The file is corrupted and cannot be opened."

Then I click on 'OK'

Message 2:

"Von Word wurde nicht lesbarer Inhalt in [filename.docx] gefunden. Möchten Sie den Inhalt des Dokumentes wiederherstellen? Klicken Sie auf 'Ja', wenn Sie der Dokumentquelle vertrauen."

Translation:

"Word found unreadable content in [filename.docx]. Do you want to restore the content of the document? Click 'Yes' if you trust the document source."

Then I click on 'Ja'

Message 3:

"Die Datei '[filename.docx]' kann nicht geöffnet werden, da ihr Inhalt Probleme verursacht."

Details: "Die Datei kann in Microsoft Office nicht geöffnet werden, weil Teile fehlen oder ungültig sind."

Translation:

"The file '[filename.docx]' cannot be opened because its content is causing problems."

Details: "The file cannot be opened in Microsoft Office because parts are missing or invalid."

At that point clicking on OK terminates Word.

Report · Jan 16, 2018

Can you share a sample file?

Report · Jan 16, 2018

I opened three files in a text editor: the original pdf and the two Word documents resulting from the conversion, one produced with the PDF export function and the other with Action Wizard (see below). It seems that Action Wizard converts the pdf into another pdf instead of a Word document, whereas the Word document generated with the PDF export function contains binary data only.

Any idea why?

Should you need the entire files, let me know what would be the best way to share them with you.

Original pdf:

%PDF-1.4

%âãÏÓ

2 0 obj

<</Rect[71.9 600.55 116.81 607.95]/Subtype/Link/A<</S/URI/URI(https://doi.org/10.1088/1748-9326/aa9281)>>/Border[0 0 0]/P 3 0 R>>

endobj

4 0 obj

Word document generated with Action Wizard:

%PDF-1.4

%âãÏÓ

498 0 obj

<</Metadata 23 0 R/Names 499 0 R/OpenAction[495 0 R/Fit]/PageLabels 496 0 R/PageMode/UseOutlines/Pages 42 0 R/PieceInfo<</SearchIndex<</Index1File 569 0 R/IndexFile 570 0 R/ModID(f0c5cb65103f0f41901e03b6cd3aa378)/PDXFile 571 0 R>>>>/Type/Catalog>>

endobj

23 0 obj

<</Length 3982/Subtype/XML/Type/Metadata>>stream

<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?>

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 84.159810, 2016/09/10-02:41:30 ">

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about=""

xmlns:xmp="http://ns.adobe.com/xap/1.0/"

xmlns:dc="http://purl.org/dc/elements/1.1/"

xmlns:pdf="http://ns.adobe.com/pdf/1.3/"

xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">

<xmp:CreatorTool>LaTeX with hyperref package</xmp:CreatorTool>

<xmp:ModifyDate>2018-01-11T18:12:34+01:00</xmp:ModifyDate>

<xmp:CreateDate>2017-11-03T16:11:45+05:30</xmp:CreateDate>

<xmp:MetadataDate>2018-01-11T18:12:34+01:00</xmp:MetadataDate>

<dc:format>application/pdf</dc:format>

<dc:title>

<rdf:Alt>

<rdf:li xml:lang="x-default">Synergies and trade-offs between energy-efficient urbanization and health</rdf:li>

</rdf:Alt>

</dc:title>

<dc:creator>

<rdf:Seq>

<rdf:li>S Ahmad et al</rdf:li>

</rdf:Seq>

</dc:creator>

<dc:description>

<rdf:Alt>

<rdf:li xml:lang="x-default">Environmental Research Letters, 12 (2017) 1–10. doi: 10.1088/1748-9326/aa9281</rdf:li>

</rdf:Alt>

</dc:description>

<pdf:Keywords>sustainable development, morbidity, environmental health transition, India, energy-efficient urbanization</pdf:Keywords>

<xmpMM:DocumentID>uuid:72b9ef69-0582-4019-987a-bc4dbefcfc4c</xmpMM:DocumentID>

<xmpMM:InstanceID>uuid:531b420a-1b89-42d8-a103-f1188611a735</xmpMM:InstanceID>

</rdf:Description>

</rdf:RDF>

</x:xmpmeta>

Report · Jan 16, 2018

You can share the PDF file using Tools > Send & Track

The generated file is also a PDF file.

Report · Jan 16, 2018

The 2 pdf files are available at this address:

https://files.acrobat.com/a/preview/7335736b-dbc5-414e-ad64-c5a77743fd86

Report · Jan 16, 2018

I changed the extension of the docx file to pdf, and indeed I can open and read this file properly with Acrobat.
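
For anyone who wants to run the same check on a whole output folder rather than opening each file in a text editor, here is a minimal Node.js sketch. It relies only on well-known file signatures: a PDF starts with `%PDF-`, a docx is a ZIP container starting with `PK\x03\x04`, and a Word 97-2003 doc is an OLE compound file. The function names (`sniffFormat`, `findMislabeled`) are made up for this example and are not part of Acrobat or Word.

```javascript
// Node.js sketch: detect ".docx"/".doc" files that are really PDFs.
const fs = require("fs");
const path = require("path");

// True when `bytes` begins with every byte listed in `signature`.
function startsWith(bytes, signature) {
  if (bytes.length < signature.length) return false;
  return signature.every((b, i) => bytes[i] === b);
}

// Classify a buffer by its leading bytes.
function sniffFormat(bytes) {
  if (startsWith(bytes, [0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf"; // "%PDF-"
  if (startsWith(bytes, [0x50, 0x4b, 0x03, 0x04])) return "zip"; // docx container
  if (startsWith(bytes, [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])) return "ole"; // legacy doc
  return "unknown";
}

// Read only the first 8 bytes of a file and classify them.
function sniffFile(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const head = Buffer.alloc(8);
    const n = fs.readSync(fd, head, 0, 8, 0);
    return sniffFormat(head.subarray(0, n));
  } finally {
    fs.closeSync(fd);
  }
}

// List the Word-named files in `dir` whose content is actually a PDF.
function findMislabeled(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => /\.docx?$/i.test(name))
    .filter((name) => sniffFile(path.join(dir, name)) === "pdf");
}
```

Calling `findMislabeled` on the folder the action wrote to would return the names of the files that still need a real conversion. A file reported as `zip` is only known to be a ZIP container, so this does not prove it is a valid Word document.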

Report · Jan 16, 2018

What settings do you use in the save step of the action?

Report · Jan 16, 2018

I include a "save file" action with the export format set to Word document (see image below, hope it is readable).

I also get the same problem when using the Word 97-2003 format.

Report · Jan 16, 2018

These settings are correct.

In your screenshot I can see "Convert PDF to docx and xml" on the right side. On the left side I can see only "Convert PDF to docx". Is this correct?

Report · Jan 16, 2018

Yes this is correct.

Originally I wanted to convert pdfs to both docx and xml.

That's why I named one action that way (the one in the background).

After I realised that it was not working properly, I created another action converting to docx only (the one you see in the foreground).

I also created an action that converts pdfs to doc files (as mentioned previously)

Report · Jan 16, 2018

When I use the action it creates Word files.

Report · Jan 16, 2018

Good to know that it works as it is supposed to do on your machine.

I wonder why I get this problem on mine...

Could it be a problem with the trial version?

Or is it rather a problem with my configuration?

My laptop is a HP business-notebook with 4 i5-3360M Intel cores and 8GB memory running under Windows 10 Pro

I saw in the text editor that Acrobat seems to be using LaTeX to convert pdf to docx.

<xmp:CreatorTool>LaTeX with hyperref package</xmp:CreatorTool>

I had already installed LaTeX via the MiKTeX package prior to installing Adobe Acrobat.

Could that be the problem?

Report · Jan 16, 2018

What puzzles me is that it works fine when converting a single pdf document...

Report · Jan 16, 2018

I just restarted my computer and it did not help.

Also please forget what I said about LaTeX. This information was simply taken from the original pdf.

Report · Jan 16, 2018

Something to try: Instead of using the built-in "Save As" command, try executing this JavaScript code:

this.saveAs({cPath: this.path.replace(/\.pdf$/i, ".docx"), cConvID: "com.adobe.acrobat.docx"});
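
A slightly more defensive variant of the same idea, as a sketch only: if the document path does not end in `.pdf`, the `replace` above leaves it unchanged and the export would be written over the source file, so this version skips such documents. The helper names `docxPathFor` and `exportAsDocx` are made up for illustration. The sketch assumes it runs in an Action Wizard "Execute JavaScript" step, where `this` is the current document and `saveAs` is permitted.

```javascript
// Derive the output path; returns null when the input is not a .pdf path.
function docxPathFor(pdfPath) {
  if (!/\.pdf$/i.test(pdfPath)) return null;
  return pdfPath.replace(/\.pdf$/i, ".docx");
}

// Export `doc` as a Word document next to the source file.
// Returns false (and saves nothing) if no safe target path can be derived.
function exportAsDocx(doc) {
  var target = docxPathFor(doc.path);
  if (target === null) return false;
  doc.saveAs({ cPath: target, cConvID: "com.adobe.acrobat.docx" });
  return true;
}

// Inside Acrobat, `app` exists and `this` is the document being processed.
// Outside Acrobat, nothing runs.
if (typeof app !== "undefined" && this && this.saveAs) {
  exportAsDocx(this);
}
```

Keeping the path logic in its own function makes it easy to check outside Acrobat before running the action on hundreds of files.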

Report · Jan 16, 2018

Repair the Acrobat installation.

The PDF file was created with LaTeX. Adobe Acrobat doesn't use this.

Report · Jan 16, 2018

Thanks a lot for your help Bernd.

Repairing the Acrobat installation seems to have fixed the problem.

Adobe Community
