Hello,
I am using the trial version of Adobe Acrobat Pro DC (version 18.009.20050).
I want to convert a few hundred PDFs to Word documents (docx).
Although this works fine on a single pdf document (using the PDF export function), using the batch processing (Action Wizard) generates damaged and unreadable docx files. Word complains that it can't open them.
Could you please let me know how I could fix this problem?
Thanks and regards,
Jerome
You will need Acrobat Pro. You then create a new action and you only need to set the output options to save the PDF as a Word document.
This will not improve the conversion process, so any file that cannot be converted by your current process will still fail.
I am using the professional version of Acrobat.
The trial version seems to offer all functionalities of Acrobat Pro but only for a limited period of time.
I did create a new action in the Action Wizard.
I specified the folder containing all the pdfs and the export format (i.e. docx).
I then clicked on the start button and Acrobat processed all the pdfs.
The issue is that none of the docx files generated this way can be opened with Word.
Yet if I open any of these pdfs in Acrobat and use the export function to convert it to a docx, Acrobat creates a docx file that Word can open.
What happens when you open the files in Word?
When I try to open a docx file I receive 3 error messages from Word.
My version of Word is unfortunately in German so I provide below the original messages as well as a translation (via Google)
Message 1:
"Die Datei '[filename.docx]' kann nicht geöffnet werden, da ihr Inhalt Probleme verursacht."
Details: "Die Datei ist beschädigt und kann nicht geöffnet werden."
Translation:
"The file '[filename.docx]' cannot be opened because its content is causing problems."
Details: "The file is corrupted and cannot be opened."
Then I click on 'OK'
Message 2:
"Von Word wurde nicht lesbarer Inhalt in [filename.docx] gefunden. Möchten Sie den Inhalt des Dokumentes wiederherstellen? Klicken Sie auf 'Ja', wenn Sie der Dokumentquelle vertrauen."
Translation:
"Word found unreadable content in [filename.docx]. Do you want to restore the content of the document? Click 'Yes' if you trust the document source."
Then I click on 'Ja'
Message 3:
"Die Datei '[filename.docx]' kann nicht geöffnet werden, da ihr Inhalt Probleme verursacht."
Details: "Die Datei kann in Microsoft Office nicht geöffnet werden, weil Teile fehlen oder ungültig sind."
Translation:
"The file '[filename.docx]' cannot be opened because its content is causing problems."
Details: "The file cannot be opened in Microsoft Office because parts are missing or invalid."
At that point clicking on OK terminates Word.
Can you share a sample file?
I opened three files in a text editor (see below): the original pdf and the 2 Word documents resulting from the conversion, one made with the pdf-data export function and the other with Action Wizard. It seems that Action Wizard converts the pdf into another pdf instead of a Word document (the Word document generated with the pdf-data export function contains binary information only).
Any idea why?
Should you need the entire files, let me know what would be the best way to share them with you.
Original pdf:
%PDF-1.4
%âãÏÓ
2 0 obj
<</Rect[71.9 600.55 116.81 607.95]/Subtype/Link/A<</S/URI/URI(https://doi.org/10.1088/1748-9326/aa9281)>>/Border[0 0 0]/P 3 0 R>>
endobj
4 0 obj
Word document generated with Action Wizard:
%PDF-1.4
%âãÏÓ
498 0 obj
<</Metadata 23 0 R/Names 499 0 R/OpenAction[495 0 R/Fit]/PageLabels 496 0 R/PageMode/UseOutlines/Pages 42 0 R/PieceInfo<</SearchIndex<</Index1File 569 0 R/IndexFile 570 0 R/ModID(f0c5cb65103f0f41901e03b6cd3aa378)/PDXFile 571 0 R>>>>/Type/Catalog>>
endobj
23 0 obj
<</Length 3982/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 84.159810, 2016/09/10-02:41:30 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
<xmp:CreatorTool>LaTeX with hyperref package</xmp:CreatorTool>
<xmp:ModifyDate>2018-01-11T18:12:34+01:00</xmp:ModifyDate>
<xmp:CreateDate>2017-11-03T16:11:45+05:30</xmp:CreateDate>
<xmp:MetadataDate>2018-01-11T18:12:34+01:00</xmp:MetadataDate>
<dc:format>application/pdf</dc:format>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Synergies and trade-offs between energy-efficient urbanization and health</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>S Ahmad et al</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:description>
<rdf:Alt>
<rdf:li xml:lang="x-default">Environmental Research Letters, 12 (2017) 1–10. doi: 10.1088/1748-9326/aa9281</rdf:li>
</rdf:Alt>
</dc:description>
<pdf:Producer>Acrobat Distiller 8.1.0 (Windows); modified using iText® 5.5.10 ©2000-2015 iText Group NV (AGPL-version)</pdf:Producer>
<pdf:Keywords>sustainable development, morbidity, environmental health transition, India, energy-efficient urbanization</pdf:Keywords>
<xmpMM:DocumentID>uuid:72b9ef69-0582-4019-987a-bc4dbefcfc4c</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:531b420a-1b89-42d8-a103-f1188611a735</xmpMM:InstanceID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
You can share the PDF file using Tools > Send & Track.
The generated file is also a PDF file.
The 2 pdf files are available at this address:
https://files.acrobat.com/a/preview/7335736b-dbc5-414e-ad64-c5a77743fd86
I changed the extension of the docx file to pdf, and indeed I can open and read this file properly with Acrobat.
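For anyone hitting the same symptom, the text-editor check described earlier in this thread can be automated by looking at the first few bytes of each output file. A PDF starts with `%PDF-`, whereas a real docx is a ZIP container and starts with `PK`. The following is only a minimal sketch, assuming Node.js is available (it does not run inside Acrobat); `sniffFormat` and `sniffFile` are hypothetical helper names, not part of any Adobe API.

```javascript
const fs = require("node:fs");

// Classify a file by its leading bytes (hypothetical helper, sketch only).
function sniffFormat(bytes) {
  // True when `bytes` begins with every byte of the signature `sig`.
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf"; // "%PDF-"
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";       // "PK\x03\x04", container used by .docx
  if (startsWith([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])) return "ole"; // legacy .doc
  return "unknown";
}

// Read only the first 8 bytes of a file and classify them.
function sniffFile(path) {
  const fd = fs.openSync(path, "r");
  try {
    const buf = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, buf, 0, 8, 0);
    return sniffFormat(buf.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `sniffFile` on one of the damaged docx files from the action should report `"pdf"` if the file is really an unconverted PDF, while a file exported correctly should report `"zip"`.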
What settings do you use in the save step of the action?
I include a "save file" action with the export format set to Word document (see image below, hope it is readable).
I also get the same problem when using the Word 97-2003 format.
These settings are correct.
In your screenshot I can see at the right side "Convert PDF to docx and xml". On the left side I can see only "Convert PDF to docx". Is this correct?
Yes this is correct.
Originally I wanted to convert pdfs to both docx and xml.
That's why I named one action that way (the one in the background).
After I realised that it was not working properly, I created another action that converts to docx only (the one you see in the foreground).
I also created an action that converts pdfs to doc files (as mentioned previously).
When I use the action it creates Word files.
Good to know that it works as it is supposed to on your machine.
I wonder why I get this problem on mine...
Could it be a problem with the trial version?
Or is it rather a problem with my configuration?
My laptop is a HP business-notebook with 4 i5-3360M Intel cores and 8GB memory running under Windows 10 Pro
I saw in the text editor that Acrobat seems to be using LaTeX to convert pdf to docx:
<xmp:CreatorTool>LaTeX with hyperref package</xmp:CreatorTool>
I had already installed LaTeX via the MiKTeX package prior to installing Adobe Acrobat.
Could that be the problem?
What puzzles me is that it works fine when converting a single pdf document...
I just restarted my computer and it did not help.
Also, please forget what I said about LaTeX. This information was simply taken from the original pdf.
Something to try: Instead of using the built-in "Save As" command, try executing this JavaScript code:
this.saveAs({cPath: this.path.replace(/\.pdf$/i, ".docx"), cConvID: "com.adobe.acrobat.docx"});
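One caveat with the line above: if a file's path does not end in `.pdf`, the `replace` call leaves the path unchanged, so the export would be written under the original name. A slightly more defensive way to build the target path might look like the sketch below. `toDocxPath` is a hypothetical helper name, and the commented-out call simply repeats the `saveAs` line from this post.

```javascript
// Hypothetical helper: derive the .docx target path from a PDF path.
function toDocxPath(pdfPath) {
  // Swap a trailing ".pdf" (any letter case) for ".docx".
  if (/\.pdf$/i.test(pdfPath)) {
    return pdfPath.replace(/\.pdf$/i, ".docx");
  }
  // No ".pdf" extension: append ".docx" so the source path is never reused.
  return pdfPath + ".docx";
}

// Inside an Acrobat action it would be used like this (same call as above):
// this.saveAs({cPath: toDocxPath(this.path), cConvID: "com.adobe.acrobat.docx"});
```

The helper itself is plain JavaScript, so it can be tried outside Acrobat before it goes into an action.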
Repair the Acrobat installation.
The PDF file was created with LaTeX. Adobe Acrobat does not use LaTeX for the conversion.
Thanks a lot for your help Bernd.
Repairing the Acrobat installation seems to have fixed the problem.