I have some PDF documents that contain text in Russian (Cyrillic letters). I open these PDFs with Adobe Reader, select the text and copy it to the clipboard. When I then paste the text into Word or any other application, there is no Russian, just special symbols. Note that the text is displayed correctly in the PDF itself.
I use Windows 7 Ultimate, 64-bit, in German. Is there any way to get Cyrillic text onto the clipboard without it getting garbled?
Claudio, thanks for your guess.
I asked the creator of the PDF files which fonts were used to generate the PDFs, and the answer was Arial MT. I have installed the Arial MT TrueType font on my system, but it did not help. I assume that I have to use Type 1 fonts. However, I am not sure where I can get the Arial MT Type 1 font or whether it is commercial.
I also found a note saying that Adobe Type Manager does not work on Windows 7 64-bit, which is my system. But I am not at all sure whether that has anything to do with my issue.
Linda, I very much doubt that any variant of Arial contains Cyrillic characters. And, as far as I know, there is no way to copy text from a PDF and paste it into another format if the text uses fonts that are not installed on your system (sorry, I had overlooked this part).
I have spent several hours researching this issue. It is a very old bug in Adobe Reader that has not been fixed for years.
The problem is that every font is internally organized as a table in which the Latin characters come first and all other characters follow after them. See this table as an example: http://www.azfonts.de/images/fonts/A/R/arialmt/table.gif
Adobe Reader picks the right letters when displaying the PDF. However, when copying text to the clipboard, it uses the system's default encoding. In my case that is German, so Adobe Reader ignores the encoding in the PDF, converts the text to Latin characters and thus garbles the Cyrillic text.
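To illustrate the kind of encoding mismatch described above, here is a small Python sketch. It is a simplification, not a description of how Adobe Reader works internally: it assumes the Cyrillic characters are stored as single-byte codes from the Windows Cyrillic code page (cp1251) and that those codes are then read using the German system's default code page (cp1252). The sample string is made up for illustration.

```python
# Hypothetical Russian sample text (not taken from the actual PDF).
original = "Привет, мир"

# Assumption: the characters are stored as single-byte codes
# from the Windows Cyrillic code page (cp1251).
raw_bytes = original.encode("cp1251")

# Assumption: the copy step reads those same codes with the German
# system's default code page (cp1252), ignoring the original encoding.
garbled = raw_bytes.decode("cp1252")

# Reading the codes with the correct code page restores the Cyrillic text.
recovered = raw_bytes.decode("cp1251")

print(garbled)    # Ïðèâåò, ìèð
print(recovered)  # Привет, мир
```

The first line of output is the sort of "special symbols" that show up after pasting: the byte values are unchanged, only the table used to interpret them is wrong.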
I hope someone from Adobe can confirm this bug and fix it in the next version of Adobe Reader.