vv211

How can I create a

Jan 31, 2013 10:31 AM

Tags: #acrobat #pdf #ocr #convert #vector #cs6 #ocr_not_working

Windows 7 Pro x64

Acrobat XI Pro

 

I have a PDF of a book that's 1,371 pages long, bookmarked and indexed (Chapter 1 Section 1, page vii, etc.), but it currently isn't searchable.

 

I've tried using the "Recognize Text" tool, but I get the dreaded "This page contains renderable text" error on every page except the cover (which is just a low-res JPEG).

I've tried removing headers and footers as well as Bates numbering (not that there should have been any of those to begin with), and still got the same error.

I've tried printing to an XPS document and converting it back to a PDF, but I still got the renderable text error.

 

Last night I converted it to JPEG files (600 ppi; 300 ppi or lower doesn't look good on screen), combined those into a PDF and ran the Recognize Text tool. However, the PDF is now 1.26 GB, compared with about 18 MB for the original, and of course I've lost the bookmarks and all that good stuff.
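A rough back-of-envelope check of why 600 ppi page images balloon the file. The page size is an assumption (US Letter, 8.5 x 11 in, since the book's trim size isn't stated), and "GB" and "MB" are taken as decimal units. This only estimates pixel counts and averages; it says nothing about how Acrobat actually compresses the images.

```python
# Back-of-envelope sizing for scanned page images.
# ASSUMPTION: US Letter pages (8.5 x 11 in); the book's real trim size isn't given.
# ASSUMPTION: "1.26 GB" and "18 MB" are decimal units (10**9 and 10**6 bytes).

PAGE_W_IN, PAGE_H_IN = 8.5, 11.0
PAGES = 1371


def pixels_per_page(width_in: float, height_in: float, ppi: int) -> int:
    """Pixel count of one page rendered at the given resolution."""
    return round(width_in * ppi) * round(height_in * ppi)


def raw_bytes(pixels: int, bits_per_pixel: int) -> int:
    """Uncompressed size: 24 bpp for RGB colour, 8 for greyscale, 1 for bilevel."""
    return pixels * bits_per_pixel // 8


def per_page_bytes(total_bytes: float, pages: int) -> float:
    """Average bytes per page for a file of the given total size."""
    return total_bytes / pages


if __name__ == "__main__":
    for ppi in (300, 600):
        px = pixels_per_page(PAGE_W_IN, PAGE_H_IN, ppi)
        print(f"{ppi} ppi: {px:,} px/page, "
              f"{raw_bytes(px, 24):,} B raw RGB, {raw_bytes(px, 1):,} B raw bilevel")
    print(f"image PDF: {per_page_bytes(1.26e9, PAGES):,.0f} B/page")
    print(f"original:  {per_page_bytes(18e6, PAGES):,.0f} B/page")
```

Under these assumptions the image-based PDF averages roughly 0.9 MB per page against roughly 13 KB per page for the vector original, about 70 times larger, and halving the resolution to 300 ppi cuts the pixel count to a quarter.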

 

  • What other methods are there to recognize text?
  • Is there any way to convert the text in the images to vector format and possibly save some space? The original PDF uses vector-based text, so it looks much smoother.
  • What are my options?

 

Thank you in advance for the help,

vv

 

P.S.

To get the bookmarks and page numbering correct in a PDF book, does one have to do that by hand? I would think that would be a killer with such a large book.
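On the page-numbering half of the question: a PDF doesn't store "vii" or "1" on each page. Logical page numbers are kept as a short list of page-label ranges (the /PageLabels entry in the PDF specification), each giving a starting page index, a numbering style, an optional prefix and a starting number; as far as I know, Acrobat's Number Pages option in the Page Thumbnails panel writes these ranges. The sketch below mimics how a viewer derives each label from such ranges. The range layout (a cover, 12 roman-numbered front-matter pages, then arabic numbering) is a hypothetical example, and only the decimal and roman styles are implemented.

```python
from bisect import bisect_right

# HYPOTHETICAL label ranges: (first page index, style, prefix, starting number).
# Styles follow the PDF /S names: "D" decimal, "r" lower roman, "R" upper roman;
# None means "prefix only", as for a range with no /S entry.
RANGES = [
    (0, None, "Cover", 1),   # page index 0
    (1, "r", "", 1),         # front matter: i, ii, iii, ...
    (13, "D", "", 1),        # body text: 1, 2, 3, ...
]

_ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
          (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
          (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n: int) -> str:
    """Lower-case roman numeral for a positive integer."""
    out = []
    for value, symbol in _ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(index: int, ranges=RANGES) -> str:
    """Label shown for a zero-based page index, given ranges sorted by start."""
    starts = [start for start, _, _, _ in ranges]
    start, style, prefix, first = ranges[bisect_right(starts, index) - 1]
    n = first + (index - start)          # numbering restarts at each range
    if style == "D":
        return prefix + str(n)
    if style == "r":
        return prefix + to_roman(n)
    if style == "R":
        return prefix + to_roman(n).upper()
    return prefix                        # no style: prefix only


if __name__ == "__main__":
    for i in (0, 1, 7, 12, 13, 1370):
        print(i, page_label(i))
```

With this layout, page index 7 is labelled "vii" and the last of the 1,371 pages (index 1370) is labelled "1358"; only the three range boundaries are set, and every individual label follows from them.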

 
Replies
  • Jan 31, 2013 10:38 AM   in reply to vv211

    Where possible, a PDF like this is made in an authoring application (such as InDesign or Word) that can generate the links and bookmarks automatically. In that case, however, the pages wouldn't be scans.

     

    Nobody numbers pages by hand, surely!

     

    You can keep the bookmarks if you are careful, like this.

     

    Let's call your files ORIGINAL.PDF and SEARCHABLE.PDF.

    Open ORIGINAL.PDF and use the REPLACE PAGES function to replace all of its pages with the pages from SEARCHABLE.PDF. Then save under a new name, such as BETTER.PDF.

     

    As for the file size, that's largely down to the OCR options you chose. What did you use? Please list all the options. If you've kept the pages as 600 dpi images, that would certainly explain the huge size.

     

    By the way, for OCR accuracy, JPEG is poison. Use TIFF.

     
  • Feb 1, 2013 6:46 AM   in reply to vv211

    If you printed to the Adobe PDF printer, you should have gotten a PDF directly. If you turn on "Print to file", the printer instead produces a PostScript (PS) file that has to be opened in Distiller to complete the PDF, an extra step. In any case, glad you found a solution that meets your needs.

     