Probably a problem with the PDF. Maybe "ti" is a fancy ligature in the
Thanks for helping, Aandi,
Yes, you're right. I tested it by trying to select a single character only, which didn't work. So it is indeed a ligature.
I've found other errors coming from that document (e.g. "setting" becoming "settng" when exported).
But doesn't PDF store the original text in addition to the ligatures? Exporting to a text-based clipboard format (without font information), in particular, should use the original characters, IMHO.
>But yet, doesn't pdf store the original text in addition to ligatures?
No. There's no capability to do that, and even if there were, there
would be no way to force the PDF creator to use it.
What a PDF creator generally sees is a reference to a single character
in a font. In good cases, the character in the font is a standard
ligature such as "fi", "fl" or "ffi", and that is what gets stored in the PDF.
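As a rough illustration of what "a standard ligature" buys a text extractor, here is a minimal Python sketch that looks up a glyph name and returns the matching Unicode ligature character. The table is a small hand-copied subset of the Adobe Glyph List (the names ff, fi, fl, ffi, ffl and their code points); the function name `glyph_to_unicode` is hypothetical, and a real extractor does considerably more than this.

```python
from typing import Optional

# Small subset of the Adobe Glyph List: standard ligature glyph names and
# the Unicode ligature characters they correspond to.
LIGATURE_GLYPHS = {
    "ff": "\uFB00",
    "fi": "\uFB01",
    "fl": "\uFB02",
    "ffi": "\uFB03",
    "ffl": "\uFB04",
}


def glyph_to_unicode(glyph_name: str) -> Optional[str]:
    """Return the Unicode ligature for a known glyph name, or None if unknown.

    Hypothetical helper for illustration only.
    """
    return LIGATURE_GLYPHS.get(glyph_name)


print(hex(ord(glyph_to_unicode("fi"))))  # 0xfb01
print(glyph_to_unicode("ti"))  # None: not a standard ligature
```

A name that is not in the table, like a custom "ti" glyph, simply has nothing to map to, which is where extraction starts to go wrong.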
There is also a mapping to the Unicode ligature characters. Acrobat is
actually fairly smart about text extraction: if you extract text
containing a known ligature and the operating system doesn't support it
(like "fi" on Windows), it generates the individual letters (e.g. "f"
and "i") instead.
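The fallback of turning a known Unicode ligature back into separate letters can be approximated with Unicode compatibility normalization (NFKC). This is only a sketch of the idea, not how Acrobat itself implements it; `expand_ligatures` is a hypothetical name.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace Unicode ligature characters with their component letters.

    NFKC applies compatibility decompositions, so U+FB01 (fi) becomes "f" + "i".
    Note that NFKC also normalizes other compatibility characters (for example
    full-width forms), which may or may not be wanted.
    """
    return unicodedata.normalize("NFKC", text)


print(expand_ligatures("\ufb01le \ufb02ow o\ufb03ce"))  # file flow office
```

This only helps for ligatures Unicode actually defines; a glyph with no Unicode equivalent never gets this far.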
However, "ti" is not a standard ligature, and it doesn't exist in Unicode
as far as I've seen. It makes for fancy typography, but it causes problems like this one.
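One way to check the "not in Unicode" point is to list the Latin ligatures in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06) together with the letters each decomposes to. This sketch only inspects that one block, so it is a sanity check rather than proof that no "ti" ligature exists anywhere in Unicode; `latin_ligatures` is a hypothetical helper name.

```python
import unicodedata
from typing import Dict

# Latin ligatures in the Alphabetic Presentation Forms block: U+FB00..U+FB06.
LATIN_LIGATURE_RANGE = range(0xFB00, 0xFB07)


def latin_ligatures() -> Dict[str, str]:
    """Map each ligature's Unicode name to the letters it decomposes to (NFKC)."""
    return {
        unicodedata.name(chr(cp)): unicodedata.normalize("NFKC", chr(cp))
        for cp in LATIN_LIGATURE_RANGE
    }


for name, letters in latin_ligatures().items():
    print(f"{name}: {letters}")

# None of the ligatures in this block decompose to "ti".
print("ti" in latin_ligatures().values())  # False
```

The block covers ff, fi, fl, ffi, ffl and two "st" forms, which matches the observation that a "ti" glyph has no standard Unicode target to extract to.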