I've run across several PDFs that contain problematic fonts. The symptoms are:
1. Adding tags to the document inserts strange characters around the text (e.g. "@" or "É").
2. Copying and pasting the text into Word produces strange boxes, question marks, or garbled letters.
I've been investigating this problem, and it seems to come down to a failure to convert characters to Unicode values. I've noticed that some of the problematic text uses a custom or Identity-H encoding and has no ToUnicode CMap.
However, I've also seen several instances of text that has no Unicode mapping but still copies into Word without problems. Why would that be?
Is there a custom Preflight profile that could proactively identify text that will fail on copy and paste, or that will produce strange characters when accessibility tags are added?
Alternatively, is there a check that will at least flag text that is likely to fail in this way?
Using Acrobat 9 Pro.
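The heuristic described above (Identity-H or custom encoding with no ToUnicode CMap) can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not an Acrobat feature: the `FontInfo` record, its fields, and the risk labels are hypothetical, and reading the font dictionaries out of a real PDF with a parsing library is not shown. It may also partly explain why some unmapped text still copies fine: the PDF specification describes fallbacks besides ToUnicode, such as standard encodings for simple fonts and the Adobe-GB1, Adobe-CNS1, Adobe-Japan1 and Adobe-Korea1 character collections for CID fonts.

```python
from dataclasses import dataclass
from typing import Optional

# Adobe character collections (Registry "Adobe" assumed) whose CIDs a reader
# can translate to Unicode even without a /ToUnicode CMap.
REGISTERED_ORDERINGS = {"GB1", "CNS1", "Japan1", "Korea1"}

# Simple-font encodings whose codes correspond to known glyph names.
STANDARD_ENCODINGS = {"StandardEncoding", "WinAnsiEncoding", "MacRomanEncoding"}


@dataclass
class FontInfo:
    """Hypothetical, simplified view of a PDF font dictionary."""
    name: str
    encoding: Optional[str]             # e.g. "Identity-H", "WinAnsiEncoding", "Custom"
    has_to_unicode: bool                # True if the font has a /ToUnicode entry
    cid_ordering: Optional[str] = None  # CIDSystemInfo /Ordering (Type0 fonts only)


def unicode_risk(font: FontInfo) -> str:
    """Heuristically rate how likely this font's text is to copy or tag badly."""
    if font.has_to_unicode:
        # A map exists, but this check cannot tell whether it is correct.
        return "has map"
    if font.encoding in ("Identity-H", "Identity-V"):
        # Identity CMaps carry no Unicode meaning on their own.
        if font.cid_ordering in REGISTERED_ORDERINGS:
            return "probably ok"
        return "likely to fail"
    if font.encoding in STANDARD_ENCODINGS:
        return "probably ok"
    # Custom, Differences-based, or symbolic encodings: depends on glyph names.
    return "check manually"


# Hypothetical inventory, e.g. as you might transcribe it from a font list.
fonts = [
    FontInfo("SubsetFontA", "Identity-H", has_to_unicode=False),
    FontInfo("Helvetica", "WinAnsiEncoding", has_to_unicode=False),
    FontInfo("SymbolFontB", "Custom", has_to_unicode=False),
]
for f in fonts:
    print(f.name, "->", unicode_risk(f))
```

Fonts rated "has map" are not guaranteed safe; a corrupted ToUnicode map passes this check just as a correct one does.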
Something to try.
Open a PDF. Open Preflight. Go into the Options drop-down.
Select Create Inventory.
There's a "Text cannot be mapped to Unicode" check in Preflight, but it won't give you exactly what you want. Acrobat has no way of knowing whether the glyph maps are corrupted, because only a human can recognize that the character shapes are wrong. Internally, Acrobat sees only lists of character code numbers.
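To illustrate why no automated check can catch a corrupted map, here is a minimal sketch of a ToUnicode-style lookup. The function name and the sample maps are hypothetical; it assumes Identity-H text, where each character is stored as a 2-byte big-endian code, and it substitutes U+FFFD (often displayed as a box or "?") for codes that have no entry.

```python
def extract_text(codes: bytes, to_unicode: dict, bytes_per_code: int = 2) -> str:
    """Translate stored character codes to text through a code -> Unicode map."""
    chars = []
    for i in range(0, len(codes), bytes_per_code):
        code = int.from_bytes(codes[i:i + bytes_per_code], "big")
        # Unmapped codes become the Unicode replacement character.
        chars.append(to_unicode.get(code, "\ufffd"))
    return "".join(chars)


stored_codes = b"\x00\x01\x00\x02\x00\x03"  # all the page content actually holds

correct_map = {1: "C", 2: "a", 3: "t"}
corrupt_map = {1: "\u00c9", 2: "@", 3: "t"}  # wrong, but structurally valid

text_ok = extract_text(stored_codes, correct_map)    # "Cat"
text_bad = extract_text(stored_codes, corrupt_map)   # "É@t"
text_none = extract_text(stored_codes, {})           # "\ufffd" * 3
```

Both mapped lookups succeed without error; nothing in the stored codes says which result is the real text, so only someone looking at the rendered glyphs can tell.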