1 Reply Latest reply on Mar 4, 2008 1:50 AM by (Aandi_Inston)

    text extraction from a pdf

      Hello,

      I am trying to convert PDF documents to plain text, but the output is disappointing.

      The document was generated from Microsoft Word using the Acrobat PDF printer.
      I opened the resulting PDF in Acrobat Reader and did a "save as text" on it.

      The resulting text is broken: letters are missing or doubled. Is there some catch to it?

      I cannot understand why Acrobat cannot interpret its own files.

      Best regards,
      Vlad
        • 1. Re: text extraction from a pdf
          (Aandi_Inston) Level 1
          Acrobat can only work with what is present in the file. For instance,
          in some cases the file contains just a scan, that is, a picture of the
          page, and no text can be extracted.

          Sometimes letters are doubled up when the document's creator used
          "fake bold", where letters are printed twice to create the illusion
          of bold text.

          Aandi Inston
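
The "fake bold" case described above can be illustrated with a minimal sketch. It assumes some other tool has already extracted each glyph together with its page position; the `Glyph` tuple, the function names, and the `tolerance` value are hypothetical and are not part of Acrobat or of any particular PDF library. The idea is only that a glyph printed a second time at almost the same spot is treated as a duplicate, while a genuinely repeated letter (such as the "ll" in "ball") sits further along the line and is kept.

```python
from typing import List, Tuple

# Hypothetical representation of an extracted glyph: (character, x, y).
Glyph = Tuple[str, float, float]


def drop_fake_bold_duplicates(glyphs: List[Glyph], tolerance: float = 1.0) -> List[Glyph]:
    """Keep a glyph only if no identical character was already kept
    within `tolerance` units of its position (assumed fake-bold overprint)."""
    kept: List[Glyph] = []
    for ch, x, y in glyphs:
        is_duplicate = any(
            ch == k_ch and abs(x - k_x) <= tolerance and abs(y - k_y) <= tolerance
            for k_ch, k_x, k_y in kept
        )
        if not is_duplicate:
            kept.append((ch, x, y))
    return kept


def glyphs_to_text(glyphs: List[Glyph]) -> str:
    """Join glyph characters in the order they were given."""
    return "".join(ch for ch, _, _ in glyphs)


# Made-up sample: the word "ball" with every letter printed twice,
# the second copy shifted 0.3 units to the right.
sample: List[Glyph] = [
    ("b", 10.0, 100.0), ("b", 10.3, 100.0),
    ("a", 16.0, 100.0), ("a", 16.3, 100.0),
    ("l", 22.0, 100.0), ("l", 22.3, 100.0),
    ("l", 25.0, 100.0), ("l", 25.3, 100.0),
]

raw_text = glyphs_to_text(sample)
clean_text = glyphs_to_text(drop_fake_bold_duplicates(sample))
print(raw_text)
print(clean_text)
```

A suitable tolerance depends on the font size and on how far the second copy was offset, so the value used here is only a placeholder. This approach cannot help with the scanned-image case, where there are no glyphs in the file to begin with.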