I have been unable to solve the following issue when converting PDF documents to Microsoft Word .doc format (Save As...), despite trying numerous methods. The fix could lie either in Acrobat Pro itself or in MS Word; I'm posting to the Adobe forums first.
PREFACE: I am trying to use the converted .doc file with translation applications/software. Google Translator Toolkit is the one I use most, but ALL the other translators I've tried have the very same issue with the .doc file. The source PDFs are product information sheets from drug manufacturers in various countries, and I need them translated into English. I do not have access to the manufacturers' source documents, as they do not provide them, for obvious reasons.
ALSO: I cannot use Google Translator Toolkit to translate the PDFs directly. If you do that, it attempts to translate the PDF and then exports an .html file, but it does not get the spacing of the sentences exactly right, which leads to translation errors in key phrases such as "can take with alcohol" vs. "do not take with alcohol". So that's out!
I am not having any problems with the resulting .doc file in MS Word itself. It looks right, the spacing matches the original PDF source perfectly, it prints correctly, etc. For reference, here is a product information sheet from Austria, in German:
The problem: this is a screenshot from Google Translator Toolkit. On the right side of the image, the spacing in the lettering from the .doc file I uploaded is not being read correctly, resulting in untranslated gibberish. (Note: this isn't a problem with the translation applications or software. All of them have this issue with .doc files converted from .pdf, and the issue isn't present with ordinary .doc files that weren't converted from a .pdf.) It definitely has something to do with some kind of embedded data in the .doc file that I cannot isolate!
My settings in Adobe Pro (convert from PDF to .doc):
Page layout: Flowing Text (this prevents the resulting .doc from containing all those text boxes, which also don't work in translators)
Include comments: True
Include images: True
Run OCR if needed: True
-I have run OCR text recognition on the source PDF files in their specific languages.
-I have edited the accessibility settings of the PDFs and have run tag recognition and the quick checks (to see if they solved the issue, which they did not; tagged or untagged, same problem!)
-I have exported the .doc BACK to PDF using MS Word's export function, which results in a great-looking tagged PDF. THEN I re-saved this new PDF as a .doc: same issue.
-I have tried saving the PDF in all of the other formats that the translators accept. Each has its own issues. The only one that works consistently is plain .txt, but the best option is a .doc to .doc translation, which keeps all the original spacing (I am not spending hours reformatting a .txt translation in Word).
I can't seem to find where this spacing data lives in the .doc file! (Changing the fonts, sizes, or margins doesn't fix it either.) I have tried so many methods...
Any thoughts on other things to try in Adobe Pro (or Word)?
EDIT: Here's an additional tidbit of info that may be the key to this: there's some kind of coding in the .doc that Adobe Pro converted from the source PDF that doesn't display in Word but is being picked up by the translation programs. I have no idea what these codes are, but I want to remove them!
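One guess worth checking (an assumption, not confirmed): the hidden "coding" may be character-level formatting, such as expanded/condensed character spacing, that splits words into many separate runs, which translation tools then treat as inline tags. Below is a minimal Python sketch for counting those runs. It assumes the file is first re-saved from Word as .docx (a zipped XML package); the older binary .doc format cannot be read this way. The function name `inspect_runs` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def inspect_runs(docx_path):
    """Count runs and run-level character-spacing overrides in a .docx.

    Assumes a .docx re-saved from Word; binary .doc is not a zip package.
    docx_path may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    stats = {"paragraphs": 0, "runs": 0, "spacing_runs": 0,
             "max_runs_in_paragraph": 0}
    # iter() also reaches paragraphs nested inside tables
    for para in root.iter(W + "p"):
        # Direct child runs only; runs inside hyperlinks etc. are skipped
        runs = para.findall(W + "r")
        stats["paragraphs"] += 1
        stats["runs"] += len(runs)
        stats["max_runs_in_paragraph"] = max(
            stats["max_runs_in_paragraph"], len(runs))
        for run in runs:
            # <w:rPr><w:spacing w:val="..."/> is character spacing;
            # paragraph spacing sits under <w:pPr> and is not counted here.
            rpr = run.find(W + "rPr")
            if rpr is not None and rpr.find(W + "spacing") is not None:
                stats["spacing_runs"] += 1
    return stats


if __name__ == "__main__":
    import sys
    print(inspect_runs(sys.argv[1]))
```

Running it on a converted file (e.g. `python inspect_runs.py converted.docx`, hypothetical names) and comparing against an ordinary Word document would show whether the converted one has far more runs per paragraph or many character-spacing overrides.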
Message was edited by: KaotikADC