-
1. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
Test Screen Name Feb 10, 2014 12:53 AM (in response to baccah)
Acrobat does not use Unicode to show the characters in a PDF, so displaying them will always work. Unicode is a layer added during text extraction, and sometimes this fails, either because the PDF defines no mapping, or because the characters have no Unicode equivalent.
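To make the "layer added during text extraction" concrete, here is a minimal Python sketch, assuming a simplified ToUnicode CMap (the PDF structure that maps a font's character codes to Unicode). The sample CMap, the code values, and the function names `parse_tounicode` and `extract` are hypothetical. The parser only handles one-entry-per-line `bfchar`/`bfrange` entries with single BMP destinations. Unmapped codes come out as U+FFFD here, which is one way a failed extraction can look; real tools may behave differently.

```python
import re

# Hypothetical, already-decompressed excerpt of a ToUnicode CMap stream.
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
1 beginbfrange
<0044> <0046> <0061>
endbfrange
"""

HEX_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")
HEX_TRIPLE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_tounicode(cmap_text):
    """Build {character code: Unicode text} from simple bfchar/bfrange entries.

    Simplified: one entry per line, array-form bfrange destinations are skipped,
    and bfrange destinations are assumed to be single BMP code points.
    """
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for line in block.splitlines():
            m = HEX_PAIR.fullmatch(line.strip())
            if m:
                # Destination is UTF-16BE encoded, written in hex.
                mapping[int(m.group(1), 16)] = bytes.fromhex(m.group(2)).decode("utf-16-be")
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for line in block.splitlines():
            m = HEX_TRIPLE.fullmatch(line.strip())
            if m:
                lo, hi, dst = (int(g, 16) for g in m.groups())
                for offset in range(hi - lo + 1):
                    mapping[lo + offset] = chr(dst + offset)
    return mapping


def extract(codes, to_unicode):
    """Turn shown character codes into text; unmapped codes become U+FFFD."""
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)
```

For example, `extract([0x24, 0x03, 0x44, 0x99], parse_tounicode(SAMPLE_CMAP))` gives `"A a\ufffd"`: code 0x99 has no entry, so its text is lost even though the glyph would still display.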
-
2. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
baccah Feb 10, 2014 1:32 AM (in response to Test Screen Name)
Thanks for your reply. You said that Acrobat does not use Unicode to show the text. Does this mean that Acrobat ignores the embedded fonts when it displays text? If so, what is the use of the fonts inside the PDF? Thanks in advance.
-
3. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
Test Screen Name Feb 10, 2014 1:43 AM (in response to baccah)
No, Acrobat uses the embedded fonts. You can clearly see this on screen when a PDF uses unusual fonts.
Unicode is not used, and not needed, to show embedded fonts on screen. Either the font's Encoding or a CMap is used to map character codes directly to glyphs in the font. The PDF specification has full details.
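A rough illustration of why display does not need Unicode, assuming a simple single-byte font whose encoding is given by a `/Differences` array. The array contents, the glyph table standing in for the embedded font program, and the helper names are all hypothetical. Rendering only goes from character code to glyph name to glyph. An extractor without a ToUnicode CMap can only guess Unicode from the glyph name, for example via Adobe Glyph List names or the `uniXXXX` naming convention, so a custom name like `g17` yields nothing.

```python
# Hypothetical /Differences array: the integer is the first character code,
# and each following glyph name is assigned to the next consecutive code.
DIFFERENCES = [65, "A", "B", "g17", "uni0627"]

# Stand-in for the embedded font program: glyph name -> glyph outline.
FONT_GLYPHS = {
    "A": "<outline A>",
    "B": "<outline B>",
    "g17": "<outline of an unusual symbol>",
    "uni0627": "<outline of Arabic alef>",
}

# Tiny, illustrative excerpt of Adobe Glyph List style name -> Unicode entries.
GLYPH_LIST = {"A": "A", "B": "B", "space": " "}


def apply_differences(differences):
    """Turn a Differences-style array into a {code: glyph name} encoding."""
    encoding = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item  # a number (re)starts the code sequence
        else:
            encoding[code] = item
            code += 1
    return encoding


def render(code, encoding):
    """Display path: code -> glyph name -> glyph in the font. No Unicode involved."""
    return FONT_GLYPHS[encoding[code]]


def guess_unicode(code, encoding):
    """Extraction path without ToUnicode: only meaningful glyph names give text."""
    name = encoding[code]
    if name in GLYPH_LIST:
        return GLYPH_LIST[name]
    if name.startswith("uni") and len(name) == 7:
        try:
            return chr(int(name[3:], 16))
        except ValueError:
            pass
    return None  # e.g. "g17": the glyph draws fine but has no known Unicode value
```

Here `render(67, ...)` succeeds while `guess_unicode(67, ...)` returns `None`, which mirrors the situation in this thread: the character displays correctly but cannot be extracted.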
-
4. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
baccah Feb 10, 2014 2:05 AM (in response to Test Screen Name)
OK, this character has no Unicode value inside the PDF or inside the font (as shown in the image). How can I get this character out of the PDF?
-
5. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
Test Screen Name Feb 10, 2014 2:23 AM (in response to baccah)
You cannot. Probably. Many PDF files do not permit accurate text extraction.
A simple test is to see if Acrobat can extract text accurately. Acrobat has 20 years of development in this area, so if it cannot get the text, it is probably the case that you cannot either.
-
6. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
baccah Feb 10, 2014 3:10 AM (in response to Test Screen Name)
How can I test whether Acrobat extracts text accurately?
-
7. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
Test Screen Name Feb 10, 2014 3:15 AM (in response to baccah)
1. Extract text using Acrobat.
2. Look at it or analyse it.
Sorry, I know that seems an unhelpful answer, but I don't understand what else you could mean.
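One way to "analyse it", sketched in Python under the assumption that you have already saved the extracted text as a string: scan for characters that commonly indicate a failed mapping. The function name `suspicious_chars` and the chosen categories are assumptions, not an Acrobat feature, and this cannot catch characters that were extracted as a wrong but valid letter.

```python
import unicodedata


def suspicious_chars(text):
    """Return (index, char, reason) for characters that often signal failed extraction."""
    findings = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if ch == "\ufffd":
            reason = "replacement character"  # the extractor gave up on this code
        elif category == "Co":
            reason = "private use area"  # font-specific code point, no standard meaning
        elif category == "Cn":
            reason = "unassigned code point"
        elif category == "Cc" and ch not in "\n\r\t":
            reason = "control character"  # often a raw character code leaking through
        else:
            continue
        findings.append((i, ch, reason))
    return findings
```

For example, `suspicious_chars("ab\ufffd")` returns `[(2, "\ufffd", "replacement character")]`. An empty result does not prove the extraction is correct; it only rules out these obvious failure signs.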
-
8. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
baccah Feb 10, 2014 3:24 AM (in response to Test Screen Name)
Never mind. Is it a free operation?
-
9. Re: Some characters extracted from pdf hasn't corresponding unicode in its font
Test Screen Name Feb 10, 2014 3:34 AM (in response to baccah)
No, Acrobat is not free, but you can do the same test with the free Adobe Reader, using Copy/Paste to extract text.
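After pasting from Adobe Reader, a quick way to see where extraction went wrong is to diff the pasted text against what you read on screen (typed in by hand). This sketch uses Python's standard `difflib`; the helper name `extraction_mismatches` is hypothetical.

```python
import difflib


def extraction_mismatches(expected, pasted):
    """List the spans where the pasted text differs from the text read on screen."""
    # autojunk=False so frequent characters are never ignored in the comparison.
    matcher = difflib.SequenceMatcher(None, expected, pasted, autojunk=False)
    return [
        (tag, expected[i1:i2], pasted[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For example, `extraction_mismatches("café au lait", "caf\ufffd au lait")` returns `[("replace", "é", "\ufffd")]`, pointing straight at the character that did not survive Copy/Paste.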


