10 Replies Latest reply on Nov 15, 2011 5:54 AM by Luke Jennings

    Weird characters pasted after copying text from PDF file

    usermac Level 1

      I get weird characters when I paste text copied from a PDF file. For instance, a plain English sentence or word becomes something like:

      VGHOGHH[WRVDSDUWLUGFLyQESUDFWRVIHQyOLFWRULYDHVXDRORGXF2EWHQ

       

      How can I fix it?

      Thanks.

        • 1. Re: Weird characters pasted after copying text from PDF file
          Larry G. Schneider Adobe Community Professional & MVP

          If you go to File>Properties and go to the Fonts tab, what is the Identity listed for the fonts?

          • 3. Re: Weird characters pasted after copying text from PDF file
            Dave Merchant MVP & Adobe Community Professional

            It's a "problem" that often happens accidentally, but it is also used intentionally to prevent copying and indexing of PDF files, especially when they are posted online.

             

            Fonts in PDF files are stored with two tables: one contains the glyphs (the character shapes) and the other is a "ToUnicode" map, which says what character each glyph represents. Acrobat uses the first table to draw the page, so it doesn't actually know what the text "says", only which shapes to draw and in what order. When you copy or search the file, the second table is used to work out what the text says. For example, in the word APPLE the first table says the second shape looks like "P", even if the shapes aren't stored in alphabetical order, while the ToUnicode table says the second letter is 0x0050, a capital P.
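The two-table idea can be sketched with a toy model. This is only an illustration: the glyph codes, the shape placeholders and the function names are made up for the example and are not real PDF structures or any viewer's actual code.

```python
# Toy model of one embedded font. All codes and names are hypothetical.

# Table 1: glyph code -> shape. This is all a viewer needs to draw the page.
GLYPH_SHAPES = {
    7: "<outline that looks like A>",
    2: "<outline that looks like P>",
    9: "<outline that looks like L>",
    4: "<outline that looks like E>",
}

# Table 2: glyph code -> Unicode code point. Needed for copy, search, export.
TO_UNICODE = {7: 0x0041, 2: 0x0050, 9: 0x004C, 4: 0x0045}


def render(codes):
    """Drawing uses only the shapes; it never needs to know the letters."""
    return [GLYPH_SHAPES[code] for code in codes]


def extract_text(codes, to_unicode):
    """Copying looks each glyph code up in the ToUnicode table."""
    return "".join(chr(to_unicode[code]) for code in codes)


# The word APPLE as it might be stored on the page: glyph codes in an
# arbitrary (non-alphabetical) order.
WORD = [7, 2, 2, 9, 4]

shapes = render(WORD)
text = extract_text(WORD, TO_UNICODE)
```

Here `render(WORD)` works without `TO_UNICODE` at all, which is why a file with a damaged map still displays and prints correctly.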

             

            If this ToUnicode map is corrupted or missing, the PDF will render to screen (and print) just fine, but Acrobat has no idea what the shapes mean. When you use a screen reader, export, search or copy/paste, the result is a default set of mappings. It will be a 1:1 relationship (every "A" will become the same character), but the pairing is not predictable, so it cannot be repaired automatically. You can repair it using plugins, but you would have to work out manually what each pair should be and recreate the map table a letter at a time.
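Because the broken output is still a 1:1 substitution, already-pasted text can in principle be repaired once enough pairs are known. The following is a minimal sketch of that manual approach, assuming you can read the correct text on screen and type it next to the pasted gibberish. The function names and the sample strings are hypothetical, and this fixes pasted text only, not the PDF itself.

```python
def build_mapping(garbled, expected):
    """Pair each pasted character with the character seen on screen.

    Raises ValueError if the samples differ in length or if one garbled
    character would have to stand for two different letters.
    """
    if len(garbled) != len(expected):
        raise ValueError("samples must have the same length")
    mapping = {}
    for bad, good in zip(garbled, expected):
        if mapping.setdefault(bad, good) != good:
            raise ValueError(f"inconsistent pair for {bad!r}")
    return mapping


def repair(text, mapping, unknown="\ufffd"):
    """Apply the mapping; characters with no known pair become U+FFFD."""
    return "".join(mapping.get(ch, unknown) for ch in text)


# Hypothetical sample: the page shows "APPLE" but pasting gives "DSSOH".
mapping = build_mapping("DSSOH", "APPLE")
fixed = repair("OHDS", mapping)
```

Each broken font needs its own table, and any character you have not yet paired stays marked as unknown rather than being guessed.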

             

            When this happens intentionally, it means the document author has removed or rewritten the ToUnicode map using a plugin. When it happens accidentally, it usually means the software exporting the PDF didn't pass the correct font information to the PDF print driver (in the PostScript stream).

            • 4. Re: Weird characters pasted after copying text from PDF file
              usermac Level 1

              Thanks. Yet, in this case I can see the words OK and search for them OK. The only problem is when copy/pasting.

              • 5. Re: Weird characters pasted after copying text from PDF file
                Larry G. Schneider Adobe Community Professional & MVP

                The reason I asked is that every embedded subset is listed as Encoding: Custom, which sounds like what Dave described above.

                • 6. Re: Weird characters pasted after copying text from PDF file
                  Dave Merchant MVP & Adobe Community Professional

                  As I said, the font mapping is corrupted. Your file contains 122 fonts, but not all are broken. Some text copies OK, some does not - and the text which does not copy cannot be searched. Try searching for "extractos" and it will ignore the title on page 1 (using a corrupted font) but will find it on page 9 (using an intact font).

                   

                  @Larry - "custom" encoding is perfectly OK, provided the mapping table is present.
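With this many fonts in one file, it can help to list which ones have no mapping table at all. The sketch below assumes the font dictionaries have already been read out of the PDF by some parser and are available as plain Python dictionaries keyed by PDF names. The sample data and the function name are hypothetical, and how to obtain the dictionaries from a real file is not shown.

```python
def fonts_without_tounicode(fonts):
    """Return the names of fonts whose dictionary has no /ToUnicode entry."""
    return sorted(name for name, font in fonts.items()
                  if "/ToUnicode" not in font)


# Hypothetical font resources for one page, for illustration only.
example_fonts = {
    # Custom encoding, but the mapping table is present.
    "/F1": {"/Subtype": "/Type1",
            "/Encoding": {"/Differences": [1, "/g1", "/g2"]},
            "/ToUnicode": "<CMap stream>"},
    # Custom encoding and no mapping table.
    "/F2": {"/Subtype": "/Type1",
            "/Encoding": {"/Differences": [1, "/g1", "/g2"]}},
    # CID font with Identity-H and no mapping table.
    "/F3": {"/Subtype": "/Type0", "/Encoding": "/Identity-H"},
}

suspects = fonts_without_tounicode(example_fonts)
```

A missing entry is a hint rather than proof, since some fonts can still be decoded from a standard encoding, and this check cannot detect a table that is present but wrong.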

                  • 7. Re: Weird characters pasted after copying text from PDF file
                    usermac Level 1

                    Thanks. Is there any application that can fix such a problem? I mean, something that repairs the file with a click or so. Thanks.

                    • 9. Re: Weird characters pasted after copying text from PDF file
                      GLOC1002 Level 1

                      I have a similar issue, except that other people can copy and paste from the files and I can't. Is this the same issue?

                       

                      I am not sure where this problem sits, so I am posting here and on a couple of Apple support forums. I think it might go deeper than just Reader/Pro, as I also can't copy and paste text from Sente (reference library software).

                       

                      I do a fair amount of research and download plenty of research papers. One such paper (Diving and Hyperbaric Medicine - DHM) from the South Pacific Underwater Medicine Society (SPUM) is causing me a major problem when I copy and paste text from it; no one else I have spoken to has the same problem.

                       

                       

                      Quote:

                       

                                                       
                      The further development of medical support for professional diving
                      David Elliott

                      and when I copy and paste the same from Preview, I get

                       

                      Quote:

                       

                      The further development of medical support for professional diving David Elliott

                       

                      I have had a look and content copy and page extraction are both allowed so security isn't an issue :(.

                       

                      Acrobat Distiller 8.1.0 (Windows) was used to create the file, and it is PDF version 1.4 (Acrobat 5.x).

                       

                      I am on Lion 10.7.2 if that makes a difference. I have had a look at the fonts table and they are either 'Custom' (Type 1) or 'Identity-H' (Type 1 CID).

                       

                      There are around 40 of these files. They are produced elsewhere, so I can't ask for them to be reproduced, but others don't have the same issues as me. Any ideas?

                       

                      Thanks very much for any help you can give me

                       

                      Regards

                       

                      Gareth

                      • 10. Re: Weird characters pasted after copying text from PDF file
                        Luke Jennings Adobe Community Professional

                        I know this was from 5 days ago, but this might be helpful.

                        http://forums.adobe.com/message/3938668#3938668