3 Replies Latest reply on Jul 6, 2008 7:21 AM by (Aandi_Inston)

    Reader 9.0.0 - Clipboard exports wrong characters

      Hi,

      I've been trying to copy text from a PDF file into Windows Notepad.

      Apparently, Acrobat Reader 9 sends wrong characters to the clipboard. Here's some text from AR9 (Android_Apps.pdf):

      "Biometric authen5ca5on system
      currently suppor5ng iris based
      authen5ca5on"

      It seems AR9 changes "ti" into "5". How come?

      Axel Dahmen
        • 1. Re: Reader 9.0.0 - Clipboard exports wrong characters
          (Aandi_Inston) Level 1
          Probably a problem with the PDF. Maybe ti is a fancy ligature in the
          font used.

          Aandi Inston
          • 2. Re: Reader 9.0.0 - Clipboard exports wrong characters
            Level 1
            Thanks for helping, Aandi,

            Yes, you're right. I tested it by trying to select just a single character, which didn't work. So it is indeed a ligature.

            I've found other errors coming from that document (e.g. "setting" becoming "settng" when exported).

            But still, doesn't PDF store the original text in addition to ligatures? In particular, exporting to a text-based clipboard format (without font information) should use the original characters, IMHO.

            TIA,
            Axel Dahmen
            • 3. Re: Reader 9.0.0 - Clipboard exports wrong characters
              (Aandi_Inston) Level 1
              >But yet, doesn't pdf store the original text in addition to ligatures?

              No. There's no capability to do that, and even if there was, no way to
              force the PDF creator to use it.

              What a PDF creator generally sees is a reference to a single character
              in a font. In good cases, the character in the font uses a standard
              name like "fi", "fl" or "ffi". This stuff gets stored into the PDF.
              And there is a mapping into the Unicode ligatures. Acrobat is
              actually fairly smart on text extraction: if you are extracting text
              with a known ligature, and the operating system doesn't support it
              (like using "fi" on Windows) it does generate a pair of characters.
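
              As a rough illustration of the "pair of characters" step described
              above (not Acrobat's actual code), here is a minimal Python sketch.
              It assumes the extracted text already contains Unicode ligature code
              points; the helper name expand_ligatures is hypothetical.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature code points (U+FB00..U+FB06) with plain letters.

    Only the Latin ligatures in the Alphabetic Presentation Forms block are
    touched; every other character is passed through unchanged.
    """
    out = []
    for ch in text:
        if "\ufb00" <= ch <= "\ufb06":
            # Compatibility decomposition turns e.g. U+FB01 into "f" + "i".
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # "of" + U+FB01 (fi ligature) + "ce" becomes the six-letter word "office".
    print(expand_ligatures("of\ufb01ce"))
```

              This only helps when the ligature glyph maps to a known Unicode
              ligature in the first place.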

              However, "ti" is not a standard ligature, and doesn't exist in Unicode
              so far as I've seen. This is fancy typography, but a problem.
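
              One way to check this against the Unicode character database is
              sketched below, using Python's standard unicodedata module. The
              character name "LATIN SMALL LIGATURE TI" is only a guess at what
              such a character would be called if it existed; the helper names
              are hypothetical.

```python
import unicodedata


def latin_ligature_names() -> list[str]:
    """Names of the Latin ligatures in the Alphabetic Presentation Forms block."""
    return [unicodedata.name(chr(cp)) for cp in range(0xFB00, 0xFB07)]


def has_named_character(name: str) -> bool:
    """True if the Unicode database knows a character with this exact name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    for name in latin_ligature_names():
        print(name)
    print(has_named_character("LATIN SMALL LIGATURE FI"))
    print(has_named_character("LATIN SMALL LIGATURE TI"))
```

              The lookup for "fi" succeeds while the one for "ti" fails, so a
              "ti" glyph has no single ligature code point to map to.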

              Aandi Inston