Windows 7 Pro x64
Acrobat XI Pro
I have a pdf of a book that's 1,371 pages long, bookmarked and indexed (Chapter 1 Section 1, page vii, etc), and currently is not searchable.
I've tried using the "Recognize Text" tool, but I get the dreaded "This page contains renderable text" error on all but the cover page (which is just a low-res jpeg).
I've tried removing headers and footers as well as bates numbering, not that there should have been any of those to begin with, and still got the above error.
I've tried printing as an XPS document and converting it back to a pdf, but I still got the renderable text error.
Last night I converted it to jpeg files (600 ppi, since 300 or lower doesn't look good on screen), combined those into a pdf and used the Recognize Text tool. However, the pdf is now 1.26GB and I have of course lost the bookmarks and all that good stuff. The original is about 18MB.
Thank you in advance for the help,
vv
P.S.
To get the bookmarks and page numbering correct on a pdf book, does one have to do that by hand? Because I would think that would be killer with such a large book.
If at all possible, someone will make a PDF in an application (such as InDesign or Word) where the creation of links and bookmarks can be automated. However, the pages wouldn't then be scans.
Nobody numbers pages by hand, surely!
You can keep the bookmarks if you are careful, like this.
Let's call your files ORIGINAL.PDF and SEARCHABLE.PDF.
Open ORIGINAL.PDF. Use the REPLACE PAGES function. Replace with all pages from SEARCHABLE.PDF. Save as a new name such as BETTER.PDF.
As to the file size, that's largely down to the OCR options you choose. What did you use - please list all the options. If you've kept it as 600 dpi images, that will certainly explain the huge size.
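To put rough numbers on that, here is a small Python sketch using only the figures quoted in this thread (1,371 pages, 18MB original, 1.26GB image-only version). Treating 1 GB as 1000 MB is an assumption, and the function names are just for illustration.

```python
PAGES = 1371  # page count quoted in the question


def per_page_kb(total_mb: float, pages: int = PAGES) -> float:
    """Average size per page in KB, assuming decimal units (1 MB = 1000 KB)."""
    return total_mb * 1000 / pages


def pixel_ratio(ppi_a: int, ppi_b: int) -> float:
    """How many times more pixels a page has at ppi_a than at ppi_b."""
    # Resolution applies to both width and height, so the count scales with the square.
    return (ppi_a / ppi_b) ** 2


print(round(per_page_kb(1260), 1))  # 600 ppi image-only PDF: 919.0 KB per page
print(round(per_page_kb(18), 1))    # original PDF: 13.1 KB per page
print(pixel_ratio(600, 300))        # 4.0
```

So each page of the image-only file averages roughly 70 times the size of a page in the original, and a 600 ppi page image carries four times the pixels of a 300 ppi one before any compression is applied.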
By the way, for OCR accuracy, JPEG is poison. Use TIFF.
Thank you, I'll try that out when I get home. As to TIFF vs JPEG, I originally started in TIFF, but after almost an hour it had only gotten a quarter of the way through, and I was a little strapped for time.
I'm a little OCD when it comes to quality so I tend to go for the higher quality stuff.
Also, sorry about the lousy thread title. I figured I could change it once I started actually adding content to the question, but apparently not.
I tested out the page replacing but the imported pages became non-searchable.
However, after spending another few hours looking for a program that could export the bookmarks and import them into the new pdf, I happened upon a comment in a thread where a guy was having the same issue as I was. He printed his non-searchable pdf to a pdf file and was then able to run OCR successfully. I did the same. It printed with a .ps extension, but I converted that back to pdf and was able to run OCR, and then used jpdfbookmarks to transfer the bookmarks over to the searchable pdf.
The resulting pdf is just over 12.5MB and uses vector-based text, which makes me very happy.