3 Replies Latest reply on Jun 17, 2010 1:18 PM by MichaelKazlow

    OCR really necessary?

    Yugos

      Newbie here; I need help with my first ebook.

      I have 70 MB of JPEG scans of a book and have just started running OCR on them, and it is a drag from the start. Almost every page needs manual correction, but probably the worst part is that I have to check against the original every time I do a page, because the pages aren't in order with the contents (where certain topics are).

      The average file size is 400-500 KB (195 images in total); I scanned two pages per image. Is there any file size reduction technique that can get me on track?

        • 1. Re: OCR really necessary?
          CtDave Level 5

          Scanned text is generally US Letter or A4 page size, and the resulting image (the one parked in the PDF) is, essentially, just another "graphic".
          Even if "two pages" are processed to give a "one page" output, you still have a large virtual page size.
          If your audience is using eBook readers, this poses a problem, as graphics/images do not reflow to fit the smaller screen. It is even more of an issue on mobile devices.
          How useful OCR is depends on what the particular eBook reader can or cannot do with a PDF.


          If your audience is using a computer, then the page size is not an issue. However, without OCR of the image's text, one cannot do a "Find" or "Search".
          Without Find or Search (the latter using an embedded index or a Catalog index), users cannot jump to a particular item in the PDF.
          This becomes an annoyance quickly.
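To illustrate why recognized text matters for "Find", here is a minimal sketch of a page search over OCR output. The `ocr_pages` dictionary and the `find_pages` helper are hypothetical names, and the sample text is made up; a real PDF viewer does this against the PDF's text layer, not a Python dictionary.

```python
# Hypothetical OCR output: page number -> recognized text.
# The sample text is invented for illustration only.
ocr_pages = {
    1: "Chapter one introduces the basic scanning workflow.",
    2: "Image compression trades file size against legibility.",
    3: "OCR converts a picture of text into searchable characters.",
}


def find_pages(pages, query):
    """Return sorted page numbers whose text contains the query (case-insensitive)."""
    needle = query.lower()
    return sorted(num for num, text in pages.items() if needle in text.lower())


print(find_pages(ocr_pages, "compression"))
# An image-only scan has no text to search, which is modeled here as an empty dict.
print(find_pages({}, "compression"))
```

With an empty text layer the search has nothing to match, which is the situation a reader faces with an image-only PDF.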


          Size reduction (image compression).
          For images you have CCITT, ZIP, JPEG, JPEG 2000, etc.
          The problem is that the only way to *seriously* reduce the size of an image/graphic is by destructive removal of pixels.
          For an image of text this is a body blow: it can make the picture of the text unusable quickly.
          Any photo editing application (even MS Paint) lets you downsample an image.
          Acrobat Standard/Pro provide a variety of ways to tweak and/or compress.
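As a rough sketch of what downsampling costs, the arithmetic below shows how quickly the pixel count drops with the linear scale factor. The function name `downsampled_size` and the 3300 x 2550 scan size are assumptions for illustration, not values from this thread, and actual file size depends on the compression format as well as the pixel count.

```python
def downsampled_size(width, height, scale):
    """Return (new_width, new_height, fraction_of_pixels_kept) for a linear scale factor."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    kept = (new_w * new_h) / (width * height)
    return new_w, new_h, kept


# Assumed size for a two-page scan; substitute the dimensions of a real scan.
for scale in (1.0, 0.75, 0.5):
    w, h, kept = downsampled_size(3300, 2550, scale)
    print(f"scale {scale:.2f}: {w} x {h} px, {kept:.0%} of the original pixels")
```

Halving the width and height keeps only a quarter of the pixels, which is why small text becomes hard to read (and hard to OCR) so quickly.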


          Be well...

          • 2. Re: OCR really necessary?
            VITEB

            Hi,

            I tried the given solution on one of the eBooks I created earlier and it worked fine... I recommend going with the recommended solution.

            • 3. Re: OCR really necessary?
              MichaelKazlow MVP & Adobe Community Professional

              The best way to get on track is to rescan. JPEG files are meant for things like photographs; for OCR, TIFF is much better. While TIFF will frequently produce larger files, if you are using OCR and replacing the TIFF images with characters, you can end up with a much smaller file. Personally, I'd recommend something like OmniPage to convert the scanned TIFF files to RTF, then either rework the files as Word documents or place the RTF files into InDesign or FrameMaker, and use those programs, perhaps with Acrobat, to create the final e-book. More recent versions of Acrobat do better at OCR, but I do not think they are the best choice for a large OCR project whose file size needs to be small and whose final result needs to be accurate.
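A back-of-the-envelope sketch of why replacing page images with characters shrinks the file so much. The image count, pages per image, and 70 MB total come from the original post; the 2,000 characters per page and one byte per character are assumptions, and the helper name `text_size_mb` is hypothetical. Fonts, layout, and format overhead would make a real e-book larger than the raw text figure.

```python
IMAGES = 195            # from the original post
PAGES_PER_IMAGE = 2     # from the original post
SCAN_TOTAL_MB = 70      # from the original post
CHARS_PER_PAGE = 2000   # assumption: rough figure for a page of prose


def text_size_mb(images, pages_per_image, chars_per_page, bytes_per_char=1):
    """Estimate the size of the recognized text alone, in megabytes (10**6 bytes)."""
    total_bytes = images * pages_per_image * chars_per_page * bytes_per_char
    return total_bytes / 1_000_000


text_mb = text_size_mb(IMAGES, PAGES_PER_IMAGE, CHARS_PER_PAGE)
print(f"scanned images: {SCAN_TOTAL_MB} MB")
print(f"recognized text (estimate): {text_mb:.2f} MB")
print(f"ratio: roughly {SCAN_TOTAL_MB / text_mb:.0f}x")
```

Under these assumptions the text alone is well under 1 MB, so even with generous overhead the OCR'd book should be far smaller than the scans.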