Hi,
I am having an issue extracting text in Adobe Acrobat. If a font uses the Identity-H encoding, its text cannot be extracted.
When I copy all the text in the PDF and paste it into Notepad, every character set in a font with "Identity-H" encoding shows up as "?".
If the font encoding is "Custom", "Ansi" or "Type 1", I don't see any issues when extracting text.
Your help in this regard is greatly appreciated.
Thanks,
Arun Segar
It is impossible to say what causes the problem.
You could try opening the PDF in another application, such as Illustrator, and saving it again.
I was able to solve a similar problem in the past with this method.
Generally, the font you are using is not on your system and Acrobat cannot find a compatible font to substitute during the conversion.
Hi Guys,
If I copy and paste text from a PDF file, some of it comes out as "?" or box characters. On checking, I found that the font is embedded but its encoding is Identity-H.
That is why the text does not come out correctly. Is there any way to extract text from a PDF when the font encoding is Identity-H?
Note: I tried to extract the text using InDesign and Illustrator and the same thing happened: the text came out as "?". I also used InDesign to save the PDF as .ps and then saved that as a PDF again, and the text still came out as "?".
If there is any other option, please tell me, and also tell me whether it is possible to solve this issue using the Adobe SDK.
I tried the steps described on the site below, but they did not help:
http://www.yawah.com/en/support/knowledgebase/fsi-pages/kb-0307-identity-h-fonts
Please let us know your suggestions.
Thanks,
Arun Segar
A lengthy workaround is to save the file as a JPEG (or TIFF) and then open the graphics files as a new PDF. Then run OCR to get a common font type, after which you should be able to copy and paste. Be aware that OCR tends to introduce errors, depending on the quality of the graphic (you typically need at least 300 dpi).
This option looks like it is working, thanks a lot... At the beginning it looked impossible, but it actually worked! Many, many thanks!
This is relatively common, and is caused when the application creating the PDF fails to correctly embed the Unicode lookup table for the font. Without that lookup table there is no relationship between the visible character on screen and the equivalent character code, so copying and pasting the text will lead to either a series of unknown markers, or a jumble of characters with a 1:1 relationship to the original text.
As a PDF stores the character codes rather than the human-readable text, the fact you can see a letter "A" on the page doesn't mean Acrobat has any idea that it's an "A". The lookup tables make that connection, so if they're missing or corrupted there's no way to recreate the semantic connection unless you can re-fry the file with an original copy of the font.
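To illustrate the lookup-table point, here is a minimal Python sketch of how an extractor might turn stored character codes back into text. It assumes 2-byte codes (as used with Identity-H) and a simplified lookup table containing only `bfchar` entries; real tables can also contain `bfrange` sections, which this sketch ignores. The sample table, the codes, and the function names `parse_bfchar` and `extract_text` are all made up for the example and are not part of any Adobe API.

```python
import re

# Hypothetical lookup-table fragment: each bfchar line maps a 2-byte
# character code (left) to a UTF-16BE Unicode value (right).
SAMPLE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
3 beginbfchar
<0024> <0041>
<0025> <0042>
<0026> <0043>
endbfchar
endcmap
"""

def parse_bfchar(cmap_text):
    """Return {code: unicode_string} built from the bfchar sections only."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def extract_text(code_bytes, mapping):
    """Split the bytes into 2-byte codes; any unmapped code becomes '?'."""
    codes = [int.from_bytes(code_bytes[i:i + 2], "big")
             for i in range(0, len(code_bytes), 2)]
    return "".join(mapping.get(code, "?") for code in codes)

# The same three stored codes, decoded with and without the lookup table.
stored = bytes.fromhex("002400250026")
print(extract_text(stored, parse_bfchar(SAMPLE_CMAP)))
print(extract_text(stored, {}))
```

With the table present, the codes resolve to readable letters. With an empty table, every code falls through to "?", which matches what the original poster sees when pasting into Notepad.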