I receive PDF files created with InDesign CS3, probably on Mac OS X. I then copy some of the text and paste it into a Java program for processing. With some fonts, ligatures such as fi and fl turn up as squares. One font that does this is SohoGothicPro-Bold, which is in OpenType format. The Java program reports the characters as Unicode 30 for fi and 31 for fl (using .codePoint() in Java), but they should be fb01 and fb02. This would be easy to translate if the error were the same in all fonts, but in other fonts 30 is fl and 31 is fi.
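To make the per-font translation concrete, here is a rough sketch of the workaround I would otherwise have to write. It is only an illustration under several assumptions: the class name `LigatureFixer`, the method `fix`, and the second font name `SomeOtherFont` are made up, the values 30 and 31 are taken to be decimal, and the font name for each piece of text has to be known in advance, which the pasted text alone does not tell me.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps the wrong code points back to the ligatures, per font.
final class LigatureFixer {
    // Font name -> (wrong code point -> intended code point).
    // Assumption: 30 and 31 are decimal values.
    private static final Map<String, Map<Integer, Integer>> TABLES = new HashMap<>();

    static {
        Map<Integer, Integer> soho = new HashMap<>();
        soho.put(30, 0xFB01); // fi ligature
        soho.put(31, 0xFB02); // fl ligature
        TABLES.put("SohoGothicPro-Bold", soho);

        // Placeholder for a font where the two values are swapped.
        Map<Integer, Integer> other = new HashMap<>();
        other.put(30, 0xFB02); // fl ligature
        other.put(31, 0xFB01); // fi ligature
        TABLES.put("SomeOtherFont", other);
    }

    // Returns the text with known wrong code points replaced; unknown fonts pass through unchanged.
    static String fix(String fontName, String text) {
        Map<Integer, Integer> table = TABLES.get(fontName);
        if (table == null) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Integer mapped = table.get(cp);
            out.appendCodePoint(mapped != null ? mapped : cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

The weakness is obvious: every font needs its own hand-made table, so this does not scale, which is why I am looking for a fix closer to the source.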
Does anyone have any idea how to solve this? I have tried substituting the font with Ghostscript on Linux, but without success. I would prefer a solution that can be automated.
I do the copying on Windows XP and Fedora 10, and I have the font installed in the OS.