Hi Team,
Is it possible to get all the non-Unicode characters from PDF text? If so, which method should I use?
Thanks,
Maruthu
Please define “non-Unicode character” precisely, as it is not a PDF concept.
Export the PDF as plain text; I believe this will convert all characters to UTF-8.
Detecting non-Unicode characters is a different thing. You can't do that with JavaScript because JS converts everything into Unicode.
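If "non-Unicode" here means text that came out of the export without a usable Unicode mapping, one rough option is to scan the exported file outside Acrobat. The Python sketch below is only an illustration under that assumption: the function name find_suspect_characters is made up for this example, and the choice of what counts as suspect (the U+FFFD replacement character, private-use code points, unassigned code points, and stray control characters) is a heuristic, not anything defined by the PDF specification.

```python
import unicodedata


def find_suspect_characters(text):
    """Return (index, char, reason) tuples for characters that often
    indicate a glyph with no proper Unicode mapping (heuristic only)."""
    suspects = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        if char == "\ufffd":
            reason = "replacement character"
        elif category == "Co":
            reason = "private use"
        elif category == "Cn":
            reason = "unassigned"
        elif category == "Cc" and char not in "\t\n\r\f":
            reason = "control character"
        else:
            continue
        suspects.append((index, char, reason))
    return suspects


# Example with an in-memory string; for a real export, read the file
# with open(path, encoding="utf-8") and pass its contents instead.
sample = "A\ufffdB\ue000\n"
for index, char, reason in find_suspect_characters(sample):
    print(index, f"U+{ord(char):04X}", reason)
```

This only reports what survived the export, so it cannot tell you which font or encoding caused the problem; for that you still need to look at the font details inside the PDF.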
Have you looked at the Preflight "Browse Internal Structure" tool? This shows you details of the fonts.
Here is a better tool that lets you select text and then shows its properties:
Windjack Solutions, Inc. - PDF CanOpener
You can do this programmatically with the C++ Plug-in SDK. Is this an option?