1 Reply Latest reply on Dec 29, 2011 8:43 AM by Dave Merchant

    PDF conversion

    chennaiaras

      Hi all,

       

      I am using Adobe Reader 10.0.1 on Windows XP.

      I am using XPDF 3.02pls with the command 'pdftotext.exe -enc utf-8 myfile.pdf' to convert a Tamil-language PDF file to text.

      I get a text file, but some of the characters are missing and others are broken.

      Can anyone help me convert a non-English PDF into a text file with all of its characters retained?

      Thanking you,

      A.Araskumar

        • 1. Re: PDF conversion
          Dave Merchant MVP & Adobe Community Professional

          Extracting plain text from a PDF file is a complex task, and it is not uncommon for a PDF file to have incomplete lookup tables, so that some glyphs shown on screen have no Unicode representation. This causes errors and omissions in the exported text, and there is little anyone can do about it other than re-creating the PDF file properly from the original source material. Adobe Acrobat may do a better job of the conversion, but there are no guarantees.
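To get a rough idea of how much of an extracted file is affected, one option is to count which kinds of characters ended up in the output. The sketch below is only an illustration and is not part of any Adobe or Xpdf tool: the function name `summarize_extracted_text` and the category labels are made up for this example, and it assumes the text file has already been produced and decoded as UTF-8. It counts code points in the Unicode Tamil block (U+0B80 to U+0BFF), the replacement character U+FFFD, private-use code points, and everything else. A high share of replacement or private-use characters would suggest that the PDF's fonts lack usable Unicode mappings.

```python
import unicodedata

# Unicode Tamil block, U+0B80 through U+0BFF (includes a few unassigned code points).
TAMIL_BLOCK = range(0x0B80, 0x0C00)


def summarize_extracted_text(text):
    """Count non-whitespace characters by rough category (hypothetical helper)."""
    counts = {"tamil": 0, "replacement": 0, "private_use": 0, "other": 0}
    for ch in text:
        if ch.isspace():
            continue  # whitespace says nothing about glyph mapping quality
        cp = ord(ch)
        if cp in TAMIL_BLOCK:
            counts["tamil"] += 1
        elif cp == 0xFFFD:
            # U+FFFD marks a character that could not be represented
            counts["replacement"] += 1
        elif unicodedata.category(ch) == "Co":
            # private-use code points often stand in for unmapped glyphs
            counts["private_use"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # "myfile.txt" is assumed to be the output of the pdftotext command above.
    with open("myfile.txt", encoding="utf-8", errors="replace") as handle:
        print(summarize_extracted_text(handle.read()))
```

This only measures the symptom. If the counts show many unmapped characters, the fix is still the one described above: re-create the PDF from the original source.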

           

           

          Please note that these forums are for discussion of Adobe products and related topics; we do not provide support for non-Adobe software.