I am using Adobe Reader 9.4.1 on Windows XP and am working with a large list of vocabulary terms (translations from simplified Chinese characters to English) that is formatted visually as a table. I am trying to get this into a more easily parsable format. I tried the pdftotext tool from xpdf with the -layout option, which almost worked, but the Chinese characters are not extracted.
I also tried the "Save as text" feature in Adobe Reader, which seems to work, but the Chinese characters are displayed as periods when I open the output files in a text editor. I've tried switching to different encodings but still cannot see the characters.
Any help would be greatly appreciated! Please let me know if additional information is needed!
I just tried to save a Japanese PDF as text (using Adobe Reader 10.1.1), and it successfully saved all Japanese characters.
Can you also try Reader 10 and check whether the Chinese characters are saved as text?
The problem, I think, is that the txt file must be saved with a Unicode encoding, not ANSI. I don't know if Reader 10 does this, or if my operating system (Windows 7) also has a hand in this.
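To illustrate why the encoding matters, here is a minimal Python sketch. It assumes that "ANSI" on your machine means Windows code page 1252, which is typical for Western-European installs but is only a guess about your setup, and the sample row is made up for the example. It is not what Reader does internally, just a demonstration of what happens to Chinese text when it is forced into a non-Unicode encoding.

```python
# Hypothetical sample row: a simplified Chinese term and its English gloss.
row = "你好\thello"

# Assumption: "ANSI" here means Windows code page 1252. That code page has
# no slots for Chinese characters, so each one is swapped for a placeholder.
ansi_bytes = row.encode("cp1252", errors="replace")
print(ansi_bytes.decode("cp1252"))  # the Chinese characters are lost

# UTF-8 (a Unicode encoding) stores the same text and reads it back intact.
utf8_bytes = row.encode("utf-8")
print(utf8_bytes.decode("utf-8") == row)
```

Once the characters have been replaced while saving, no choice of encoding in the text editor can bring them back, which would match what you are seeing.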