I am using Adobe Reader 10.0.1 on Windows XP.
I am using XPDF 3.02pls, running 'pdftotext.exe -enc utf-8 myfile.pdf', to convert a Tamil-language PDF file to text.
I get a text file, but some of the characters are missing and others are broken.
Can anyone help me convert a non-English PDF into a text file with all of its characters retained?
Extracting plain text from a PDF file is a complex task, and it's not uncommon for a PDF file to have incomplete glyph-to-Unicode lookup tables (so some glyphs shown on screen have no Unicode representation). This results in errors and omissions in the exported text, and there's not a whole lot anyone can do about it other than re-creating the PDF file properly from the original source material. Adobe Acrobat may do a better job of the conversion, but there are no guarantees.
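If it helps to see how much of the export is affected, here is a rough Python sketch that scans the extracted text for characters that commonly show up when a glyph has no usable Unicode mapping. It is only a diagnostic under stated assumptions: the function names (find_suspect_chars, tamil_ratio) are made up for illustration, and it assumes the output was saved as UTF-8 and that broken glyphs surface as U+FFFD, private-use, or unassigned code points. A PDF can also map glyphs to the wrong but valid characters, which this check will not catch.

```python
import unicodedata
from collections import Counter


def find_suspect_chars(text):
    """Count characters that often indicate a failed glyph-to-Unicode mapping.

    Assumption: failures appear as U+FFFD (replacement character),
    private-use code points (category Co), or unassigned ones (category Cn).
    """
    suspects = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if ch == "\ufffd" or category in ("Co", "Cn"):
            suspects[f"U+{ord(ch):04X}"] += 1
    return suspects


def tamil_ratio(text):
    """Return the share of non-whitespace characters in the Tamil block
    (U+0B80 to U+0BFF). A low value for a Tamil document hints at bad mappings."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    tamil = sum(1 for ch in visible if "\u0b80" <= ch <= "\u0bff")
    return tamil / len(visible)


if __name__ == "__main__":
    # Hypothetical sample; replace with the real export, for example:
    #   text = open("myfile.txt", encoding="utf-8").read()
    text = "தமிழ் \ufffd\ue001"
    print(find_suspect_chars(text))
    print(round(tamil_ratio(text), 2))
```

If the counts are high, or the Tamil share is very low for a page that is visibly Tamil on screen, the problem most likely lies in the PDF's own mapping tables rather than in the conversion options.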
Please note that these forums are for discussion of Adobe products and related topics; we do not provide support for non-Adobe software.