8 Replies Latest reply on May 11, 2016 9:54 PM by G-Whizz

    Problems copy & paste text from PDF to InDesign CS5

    MangaGal Level 1

      Hey guys

       

      Today I tried copying and pasting text from a PDF I received from my client into an InDesign CS5 document, and the text came out as random numbers and punctuation.

       

      I've also tried exporting Word, RTF and TXT files from the PDF, but the text still comes out garbled. I'm not sure whether it's just how the PDF was made, or whether something is wrong with my CS5.

       

      I'm on Snow Leopard with CS5 Premium, Acrobat Pro 9 and Acrobat Reader X.

       

      Any ideas why?

        • 1. Re: Problems copy & paste text from PDF to InDesign CS5
          [Jongware] Most Valuable Participant

          It's the PDF.

           

          Usually you can copy text out of a PDF, but there is no guarantee *at all*. Older software used to take a shortcut when subsetting fonts: the first character it encountered was coded as #1, the next as #2, and so on. If you search and replace each of the nonsense characters, you'll slowly see the original text appear (that's not as simple as it sounds, though).
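          To illustrate the shortcut described above, here is a minimal Python sketch. It is a toy model, not how any particular PDF producer actually works: the helpers `subset_encode` and `decode` are hypothetical, and it ASSUMES that code n shows up on paste as the ASCII character chr(32 + n), which is why the garbage looks like punctuation and digits. It also shows why the search & replace is trickier than it sounds: replacing characters one at a time can clobber earlier replacements when a target character is also one of the garbled ones, so `decode` maps every character in a single pass with `str.translate`, leaving unknown characters in place so a partial key reveals the text bit by bit.

```python
def subset_encode(text: str) -> tuple[str, dict[str, str]]:
    """Toy model of the old font-subsetting shortcut.

    Each distinct character gets the next code (1, 2, 3, ...) in order of
    first appearance. ASSUMPTION: code n pastes as chr(32 + n).
    Returns the garbled text and the key (garbled char -> original char).
    """
    codes: dict[str, str] = {}
    garbled = []
    for ch in text:
        if ch not in codes:
            codes[ch] = chr(32 + len(codes) + 1)
        garbled.append(codes[ch])
    key = {g: original for original, g in codes.items()}
    return "".join(garbled), key


def decode(garbled: str, key: dict[str, str]) -> str:
    """Search and replace every garbled character in ONE pass.

    Characters missing from the key are left untouched, so a partial key
    (e.g. built by recognising a few words) reveals the text gradually.
    """
    return garbled.translate(str.maketrans(key))


garbled, key = subset_encode("Hello world")
print(garbled)                      # !"##$%&$'#(
print(decode(garbled, {"#": "l"}))  # !"ll$%&$'l(
print(decode(garbled, key))         # Hello world
```

In a real PDF you don't have the key; you rebuild it by hand from words you can recognise, which is the slow part.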

           

          A similar "problem" is that sometimes spaces don't get copied (there is no need for a "Space Character" in a PDF), and that you cannot copy consecutive lines of text as a single paragraph. All of this is because a PDF is neither intended nor designed to be reused after creation.

           

          If you really need this text and don't want to type it in, try to get hold of the original file.

          • 2. Re: Problems copy & paste text from PDF to InDesign CS5
            Peter Spier Most Valuable Participant (Moderator)

            I don't know if it would work, but you could also try saving the PDF as TIFF from Acrobat, then using Acrobat's OCR to recover the text from the TIFF.

            • 3. Re: Problems copy & paste text from PDF to InDesign CS5
              BobLevine MVP & Adobe Community Professional

              Or better yet, try saving as Word. Acrobat X does a very good job.

               

              Bob

              • 4. Re: Problems copy & paste text from PDF to InDesign CS5
                Steve Werner Adobe Community Professional & MVP

                Bob is correct about Acrobat X doing a very good job of saving as Word files. It's much improved over earlier versions of Acrobat.

                • 5. Re: Problems copy & paste text from PDF to InDesign CS5
                  MangaGal Level 1

                  Oh my god, my internet has been down. So frustrating.

                   

                  But the good news is my client was kind enough to send me the original file after I told her what happened.

                   

                  I've tried exporting to Word from Acrobat X. No luck.

                   

                  I think it may be the "subset font" issue, because only a font subset is embedded.

                   

                  Thanks for all your help.

                  • 6. Re: Problems copy & paste text from PDF to InDesign CS5
                    edge30

                    Is there anybody from Adobe looking at these discussions that can help?

                    I've found this issue happening very frequently (since the last automatic update of Reader), regardless of the application (InDesign is not needed).

                    In fact, you can just copy from the PDF and paste into Notepad and the issue happens.

                    I work with PDF documents a lot and I see the issue intermittently: sometimes it happens even with the same PDF documents, while after rebooting everything is OK.

                    As I said, I work with PDFs a lot, so exporting to other formats wastes too much time and too many resources in a production environment.

                    Any help to FIX the problem would be appreciated.

                    Thanks

                    e.

                    • 7. Re: Problems copy & paste text from PDF to InDesign CS5
                      BobLevine MVP & Adobe Community Professional

                      As already pointed out, this is something you may have to live with. PDF is an end product and any use other than that is a bonus.

                       

                      Bob

                      • 8. Re: Problems copy & paste text from PDF to InDesign CS5
                        G-Whizz

                        Hi there,

                         

                        I've experienced this issue myself over the years, with text in PDF files containing hard end-of-line returns at the end of every single line, which is just damn annoying. When a client gave me a series of 300-600 page books in PDF format that they wanted to republish, I decided to search for a solution again. Hopefully you've found a workaround since 2011, but from a lot of research online it still appears this is an issue for many, and there's no perfect solution.

                         

                        I found that saving the PDF as plain text and copying and pasting it into InDesign works well in theory, until you realize that all manual hyphens are replaced with discretionary hyphens, which is not helpful. Another quirk is that it puts hard end-of-line returns in place of some, but not all, of the original discretionary hyphens, which just creates more issues.

                         

                        Here's my solution after a good deal of trial and error; after all, I have dozens of these lengthy files to convert from PDF to InDesign. In my opinion this process is 95% perfect and leaves very little tidying up:

                         

                        1. Open the original PDF. First, check whether it is a "Tagged" PDF.
                        2. Press Ctrl+D (Windows) or Cmd+D (Mac) to open Document Properties, go to the Description tab and look at "Tagged PDF" at the bottom left. It will say Yes or No, and it's worth noting.
                        3. Next, if the PDF has headers, footers or page numbers, you need to get rid of these.
                        4. Use the Crop tool in Acrobat to crop all pages to the same size, removing unnecessary details.
                        5. Choose File > Save As > More Options > PostScript, then close the PDF.
                        6. Open the new PostScript file; it should open automatically in Acrobat Distiller, which after a few seconds re-saves the PostScript as a new PDF. Once that's complete, close Distiller.
                        7. Open the new PDF file and give it a new name so you don't confuse it with any others.
                        8. Now, if your original PDF was not a Tagged PDF, you need to tag it. If it was tagged, skip step 9 and go straight to step 10.
                        9. Go to Tools > Accessibility > Add Tags to Document. Just ignore any tagging reports that may show up in the left-hand pane.
                        10. Save under a new name if you wish. The important thing is that this file MUST be closed and reopened after the tags are inserted.
                        11. Now comes the best bit. Select all the text (Ctrl+A / Cmd+A), copy it, and paste it into your InDesign file, flowing all the text as it comes in. Use InDesign's Autoflow to add pages automatically to the end of the story.

                         

                        All your text should appear as normal. You will find that all hard end-of-line returns have disappeared, and that all manual hyphens remain in place as they should be. There are two issues with this method:

                         

                        1. InDesign will add an extra Return character (paragraph break) wherever, in the original text, one page ends and the next begins, in other words, where text runs from the bottom of one page to the top of the next. It's then a case of deleting these manually. In a 600-page document this can be irksome, but for short documents it's not a problem. The results far outweigh the usual problems.
                        2. You will lose ALL the original manually entered breaks between paragraphs, i.e. gaps of one line or more between paragraphs. For lengthy documents this can be irksome too. It took me an hour today to manually reinsert these breaks into a 400-page document, and I can live with that any day of the week.

                         

                        Good luck - Graham.