Was this PDF created from a scanned document?
Are the problem letters special glyphs of the letters "f" or "th" in a special font?
The problem is due to ligatures, which are single glyphs that represent two (sometimes three) otherwise separate characters. Common examples include the following: fl fi th ff st ft
If the font data in the PDF doesn't include the information to map the ligatures to the individual characters, there's not a lot you can do.
Thanks George. I wasn't even aware that such critters existed. But I went to a word that was giving me trouble and attempted to delete the first two characters, and with the push of one delete key, both characters disappeared. I went throughout all the document and wherever I had the same problem I deleted the "ligature" and replaced it with the actual characters and that solved the problem.
Interesting approach. I hope you don't have to do this a lot as that could get rather tedious!
In addition to all the extra work this causes, keep in mind that a ligature usually takes up less space than the individual characters combined. This means that the formatting of your document may change.
Karl Heinz Kremer
Replacing the ligature with the real letters was done as a kind of proof-of-concept. I will now report to the project manager what is happening and let him decide what he wants to do, but if the ligatures are to be changed on a regular basis, it will not be done by me...
It's worth pointing out 2 things...
1 ligatures are normal and are considered a sign of superior, professional typesetting
2 if a PDF follows recommendations, Acrobat knows what the ligatures mean and will automatically replace them with multiple characters on copy/paste
However, if you're extracting text, it is up to you to apply the info in the PDF and some foreknowledge, and do your own replacement if needed (IF... because some situations require the preservation of ligatures).
You can use copy from Acrobat to test whether the ligatures are done right.
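For anyone doing the replacement on extracted text rather than by hand, here is a minimal Python sketch of the "do your own replacement" step. It assumes the extracted text contains the standard Unicode Latin ligature code points (U+FB00 to U+FB06). The function names `expand_ligatures` and `find_unmapped` are made up for illustration and are not part of Acrobat or any PDF library. Ligatures such as "th" or "ft" have no Unicode code point of their own, so a font may place them in the Private Use Area. The second helper only flags such characters. It cannot tell you which letters they stand for.

```python
import unicodedata

# Latin ligatures from the Unicode "Alphabetic Presentation Forms" block.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature, flattened to plain "st"
    "\ufb06": "st",
}

# Translation table built once; maps each ligature to its letters.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each known ligature code point with its separate letters."""
    return text.translate(_TABLE)


def find_unmapped(text: str) -> list:
    """Return the distinct Private Use Area characters found in text.

    These are candidates for ligatures the PDF did not map to real
    letters; they need a manual, font-specific replacement.
    """
    return sorted({ch for ch in text if unicodedata.category(ch) == "Co"})


if __name__ == "__main__":
    sample = "o\ufb03ce \ufb01le"
    print(expand_ligatures(sample))
    print(find_unmapped("a\ue001b"))
```

If some situations require the ligatures to be preserved, skip `expand_ligatures` for that text and only use `find_unmapped` as a check.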