It's the PDF.
Usually, you can copy text out of a PDF, but there is no guarantee *at all*. Older software used to take shortcuts when subsetting fonts: the first character it encountered got coded as #1, the next as #2, and so on. If you do a search & replace on each of the nonsense characters, you'll slowly see the original text appearing (that's not as simple as it sounds, though).
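To make the subsetting shortcut concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the function names are made up, and mapping code N to the character `chr(0x20 + N)` is an arbitrary stand-in for whatever a viewer actually pastes when it cannot map a code back to a real character.

```python
def subset_encode(text):
    """Mimic an old-style subsetter: the first distinct character seen
    gets code 1, the next distinct character gets code 2, and so on."""
    codes = {}
    encoded = []
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes) + 1
        encoded.append(codes[ch])
    return encoded, codes


def naive_copy(encoded):
    """Assumption for illustration: with no mapping back to real
    characters, each code is pasted as the character 0x20 + code."""
    return "".join(chr(0x20 + code) for code in encoded)


def recover(garbled, replacements):
    """Apply a search-and-replace table in a single pass, leaving
    characters without an entry untouched."""
    return "".join(replacements.get(ch, ch) for ch in garbled)


original = "hello pdf"
encoded, codes = subset_encode(original)
garbled = naive_copy(encoded)

# Here the table is built from the known codes. With a real PDF it has
# to be worked out by hand, one nonsense character at a time.
table = {chr(0x20 + code): ch for ch, code in codes.items()}
restored = recover(garbled, table)
```

Doing all replacements in a single pass matters: if you replace one character at a time across the whole text, a character you have already restored can be hit again by a later replacement, which is one reason the manual approach is harder than it sounds.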
A similar "problem" is that sometimes spaces don't get copied (there is no need for a "Space Character" in a PDF), and that you cannot copy contiguous lines of text as a single paragraph. All of this is because a PDF is neither intended nor designed to be re-used after creation.
If you really need this text and don't want to type it in, try to get hold of the original file.
I don't know if it would work, but you might also try saving the PDF as a TIFF from Acrobat, then using Acrobat's OCR to recover the text from the TIFF.
Or better yet, try saving as Word. Acrobat X does a very good job.
Bob is correct about Acrobat X doing a very good job of saving as Word files. It's much improved over earlier versions of Acrobat.
Oh my god, internet has been down. So frustrating.
But the good news is my client was kind enough to send me the original file after I told her what happened.
I've tried exporting to Word from Acrobat X. No luck.
I think it may be the "subset font" issue, because only a subset of the font is embedded.
Thanks for all your help.
Is there anybody from Adobe looking at these discussions that can help?
I have found this issue happening very frequently (since the last automatic update of Reader), regardless of the application (InDesign is not needed).
In fact, you can just copy from the PDF and paste into Notepad and the issue happens.
I work with PDF docs a lot and I see the issue intermittently. Sometimes the issue happens with a given PDF document, and after rebooting the same document is fine.
As I said, I work with PDFs a lot, so exporting to other formats wastes too much time and too many resources in a production environment.
Any help to FIX the problem would be appreciated.
As already pointed out, this is something you may have to live with. PDF is an end product and any use other than that is a bonus.
After experiencing this issue myself over the years, with text in PDF files containing hard end-of-line returns at the end of every single line which is just damn annoying, I decided to again search for a solution when a client gave me a series of 300-600 page books in PDF format that they wanted to publish again. Hopefully since 2011 you have got yourself a workaround, but through a lot of research online it still appears this is an issue for many, and there's no perfect solution.
I found that saving the PDF as plain text and copying and pasting into InDesign worked well in theory, until you realize that manual hyphens are all replaced with discretionary hyphens, which is not helpful. Another quirk is that it places hard end-of-line returns in place of some, but not all, of the original discretionary hyphens, which just creates more issues.
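For anyone who wants to clean up a plain-text export before it goes anywhere near InDesign, here is a rough Python sketch of removing the hard end-of-line returns. It is not part of the method described in this post, and it rests on assumptions that may not hold for your files: that paragraphs in the export are separated by blank lines, and that a line ending in a hyphen followed by a lowercase letter is a word broken across lines. The function name `unwrap` is made up.

```python
import re


def unwrap(text):
    """Join hard-wrapped lines into one line per paragraph.

    Assumes blank lines separate paragraphs. A trailing hyphen after a
    letter, followed by a line starting in lowercase, is treated as a
    line-break hyphen and removed.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        joined = ""
        for ln in lines:
            broken_word = (
                len(joined) > 1
                and joined.endswith("-")
                and joined[-2].isalpha()
                and ln[:1].islower()
            )
            if broken_word:
                joined = joined[:-1] + ln  # drop the hyphen, no space
            elif joined:
                joined += " " + ln
            else:
                joined = ln
        result.append(joined)
    return "\n".join(result)


sample = (
    "The quick brown fox jum-\n"
    "ped over the lazy\n"
    "dog.\n"
    "\n"
    "A second para-\n"
    "graph starts here."
)
cleaned = unwrap(sample)
```

The weak spot is the same one described above: the script cannot tell a line-break hyphen from a genuine one, so a compound like "well-known" that happens to break at its hyphen comes out as "wellknown" and still needs a manual check.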
Here's my solution after a good deal of trial and error; after all, I have dozens of these lengthy files to convert from PDF to InDesign. In my opinion this process is 95% perfect and leaves very little tidying up:
- Open original PDF. First check to see if it is a "Tagged" PDF.
- Press Control D (Command D on a Mac) to open Document Properties, go to the Description tab, and look at "Tagged PDF" at the bottom left. It will say Yes or No; make a note of it.
- Next. If it has headers, footers or page numbers, you need to get rid of these.
- Use Crop tool in Acrobat to crop all pages to same size, removing unnecessary details.
- File > Save As > More Options > PostScript, then close the PDF.
- Open the new PostScript file. It should open automatically in Acrobat Distiller, which will, after a few seconds, re-save the PostScript as a new PDF. Once that's complete, close Distiller.
- Open the new PDF file and give it a new name so you don't confuse it with any others.
- If your original PDF was not a "Tagged PDF", you need to tag it now. If it was already tagged, skip the next step and go straight to the one after it.
- Go Tools > Accessibility > Add Tags to Document. Just ignore any Tagging reports that may show up in the left-hand pane.
- Save under a new name if you wish. The important thing is that this file MUST be closed and re-opened after the tags are inserted.
- Now comes the best bit. Select all text (Control A / Command A), copy it, and paste it into your InDesign file, flowing all the text as it comes in. Use InDesign's Autoflow to add pages automatically to the end of the story.
All your text should appear as normal. You will find that all hard end-of-line returns have disappeared, and that all manual hyphens remain in place as they should be. There are two issues with this method:
- InDesign will add an extra Return character (paragraph break) wherever text in the original runs from the bottom of one page to the top of the next. These then have to be deleted manually. In a 600 page document this can be irksome, but for short documents it's not a problem. The results far outweigh the usual conflicts.
- You will lose ALL original manually entered breaks between paragraphs, i.e. gaps of one line or more between paragraphs. So for lengthy documents, this can be irksome. It took me an hour today to manually re-insert these breaks into a 400 page document, and I can live with that any day of the week.
Good luck - Graham.
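As a possible shortcut for the first issue above (the extra returns at page boundaries), here is a hedged Python sketch that could be run on the text once it is exported from InDesign as a list of paragraphs. It is a heuristic, not a tested production tool: it assumes a paragraph that does not end in closing punctuation, followed by one that starts with a lowercase letter, was split by a page break. The name `merge_split_paragraphs` and the list of terminators are illustrative choices.

```python
# Characters that normally end a complete paragraph (an assumption;
# adjust for the book in question).
TERMINATORS = (".", "!", "?", ":", '"', "\u201d", ")")


def merge_split_paragraphs(paragraphs):
    """Rejoin paragraphs that look as if a page break split them."""
    merged = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs
        looks_split = (
            merged
            and not merged[-1].endswith(TERMINATORS)
            and para[0].islower()
        )
        if looks_split:
            merged[-1] = merged[-1] + " " + para
        else:
            merged.append(para)
    return merged


pages = [
    "It was a long walk to the",
    "station, and nobody spoke.",
    "Chapter Two",
    "The next morning was cold.",
]
fixed = merge_split_paragraphs(pages)
```

Headings such as "Chapter Two" survive because the following paragraph starts with a capital letter. A page break that falls exactly between two sentences, with the next one capitalized, will not be caught, so a quick visual pass is still needed.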