-
2. Re: Is it a bug of Distiller?
Bill@VT May 11, 2014 6:14 PM (in response to PigPigPig)
Are the fonts embedded in the PDF? When you extract text, you generally need to have the fonts on your system. However, if you created the PDF from Word, why not just go back to the Word document? That is much more effective, since editing PDFs and extracting content from them is not normally recommended. Have you tried saving the PDF as a DOCX file to see if that gives you what you want?
-
3. Re: Is it a bug of Distiller?
PigPigPig May 11, 2014 6:54 PM (in response to Bill@VT)
I am sure I have the font on my system and that the font is embedded in the PDF. I tried saving the PDF as a DOCX file, and the result differs from the PDF: some characters are missing. I am a software engineer. I found that parts of some Arabic PDF files can't be extracted correctly. I tried to debug them and reproduce the bug. My guess is that the font's cmap is wrong or incomplete, so the text can't be extracted correctly. I don't know why Adobe Acrobat generates an incomplete cmap.
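
One way to test the "incomplete cmap" guess is to compare the character codes a page actually uses against the entries in the font's ToUnicode CMap. The following is only a minimal sketch: the CMap fragment and the list of used codes are made-up sample data, `parse_bfchar` is a hypothetical helper, and only `beginbfchar` sections are handled (`beginbfrange` is ignored).

```python
import re

def parse_bfchar(cmap_text):
    """Map source character codes to Unicode strings (bfchar sections only)."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values in a ToUnicode CMap are UTF-16BE, written as hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

# Made-up sample data, not taken from the PDF discussed in this thread.
sample_cmap = """
2 beginbfchar
<0003> <0020>
<0024> <0644>
endbfchar
"""
used_codes = [0x0003, 0x0024, 0x0051]  # assumed codes found in the content stream

to_unicode = parse_bfchar(sample_cmap)
missing = [code for code in used_codes if code not in to_unicode]
print([f"{code:04X}" for code in missing])
```

Any code reported as missing has no Unicode value to extract, which would match the symptom of characters disappearing from the extracted text.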
-
4. Re: Is it a bug of Distiller?
PigPigPig May 12, 2014 2:35 AM (in response to PigPigPig)
Shall I move the question to PDF Language and Specifications?
-
5. Re: Is it a bug of Distiller?
Test Screen Name May 12, 2014 5:06 AM (in response to PigPigPig)
I cannot find liga.0758.medi.alt1 (U+10354) in any Unicode chart or discussion. These do not seem to be part of the Unicode standard.
-
-
7. Re: Is it a bug of Distiller?
Test Screen Name May 12, 2014 8:19 AM (in response to PigPigPig)
It may not be working because it is not standard Unicode. Perhaps Distiller has no ability or wish to map to private Unicode ranges, because this has little value in text extraction. I believe it uses only standard Unicode characters, so the choice of font for text extraction is not crucial.
-
8. Re: Is it a bug of Distiller?
PigPigPig May 12, 2014 8:54 AM (in response to Test Screen Name)
I am not familiar with OpenType or TrueType fonts. Maybe what you said is one reason. However, standard Unicode U+FEE0 is also a substitute form of standard Unicode U+0644. Why does Adobe Acrobat choose U+10354 instead of U+FEE0 if Distiller has no ability to map to private Unicode ranges?
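
The relationship between U+FEE0 and U+0644 can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch of the Unicode data itself; it says nothing about what Acrobat or Distiller does internally.

```python
import unicodedata

medial_lam = "\uFEE0"  # presentation form mentioned in the post
base_lam = "\u0644"    # the base Arabic letter it stands for

# Character name and compatibility decomposition from the Unicode database.
print(unicodedata.name(medial_lam))
print(unicodedata.decomposition(medial_lam))

# NFKC normalization folds the presentation form back to the base letter.
normalized = unicodedata.normalize("NFKC", medial_lam)
print(normalized == base_lam)
```

A text extractor that finds U+FEE0 in a mapping can recover U+0644 this way, whereas a code point with no standard meaning offers no such path.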
-
-
10. Re: Is it a bug of Distiller?
Test Screen Name May 12, 2014 9:40 AM (in response to PigPigPig)
It seems to me that, yes, that is a problem. Text extraction is just a question of getting out Unicode values. If fonts use private ranges, I don't see how it can work.
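
To see whether extracted text contains private-use values, one can scan it for code points whose Unicode general category is `Co`. This is a rough sketch: `private_use_chars` is a hypothetical helper, and the sample string is invented rather than taken from the PDF in question.

```python
import unicodedata

def private_use_chars(text):
    """Return (index, 'U+XXXX') for each private-use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = private use
    ]

# Invented sample: one Arabic letter followed by two private-use code points.
sample = "\u0644\uE000\U000F0001"
hits = private_use_chars(sample)
print(hits)
```

Characters flagged this way carry no agreed meaning outside the font that defines them, so they cannot be turned back into readable text without that font's own glyph information.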





