
OCR really necessary?

New Here, Jan 05, 2010


Newbie here; I need help with my first ebook creation.

I have 70 MB of JPEG scans of a book and have just started running OCR on them, and it's a drag to begin with. Manual correction is needed on almost every page, but probably the worst thing is that you need to check against the original every time you do a page, because the pages aren't in order with the contents (where certain topics are).

The average file size is 400-500 KB (195 pics in total); I scanned two pages in one pic. Is there any file size reduction technique that can get me on track?

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
LEGEND, Jan 24, 2010


Scanned text content is generally US Letter or A4 page size, so the resulting image (the one parked in the PDF) is, essentially, just another "graphic".
Even if "two pages" are processed to give a "one page" output, you still have a large virtual page size.
If your audience is using eBook readers this poses a problem, as graphics/images do not reflow to fit the smaller screen. It is even more of an issue with mobile devices.
What the particular eBook reader can or cannot do with a PDF dictates how useful OCR is.


If your audience is using a computer, then the page size is not an issue. However, without OCR of the image's text, one cannot do a "Find" or "Search".
Without Find, or Search (using an embedded index or a Catalog index), users who want to go to something in particular in the PDF cannot.
This becomes an annoyance quickly.


Size reduction (image compression).
For images you have CCITT, ZIP, JPEG, JPEG 2000, etc.
The problem is that the only way to *seriously* reduce the size of an image/graphic is by destructive removal of pixels.
For an image of text this is a body blow; it can make the picture of the text unusable quickly.
Any photo editing application (even MS Paint) lets you downsample, that is, reduce the pixel count.
Acrobat Standard/Pro provide a variety of ways to tweak and/or compress.
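
To put rough numbers on the trade-off above, here is a small back-of-the-envelope sketch. It only estimates raw (uncompressed) image sizes, ignoring row padding and any codec, and it assumes a US Letter page scanned at 300 dpi; neither figure comes from this thread, and `raw_bytes` is just an illustrative helper name.

```python
# Rough raw (uncompressed) size of one scanned page image.
# Assumptions (not from the thread): US Letter page, 300 dpi scan.

def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes for the given page geometry and bit depth."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

LETTER = (8.5, 11.0)  # page size in inches

color_300 = raw_bytes(*LETTER, dpi=300, bits_per_pixel=24)   # full colour
gray_300 = raw_bytes(*LETTER, dpi=300, bits_per_pixel=8)     # greyscale
bitonal_300 = raw_bytes(*LETTER, dpi=300, bits_per_pixel=1)  # black and white
color_150 = raw_bytes(*LETTER, dpi=150, bits_per_pixel=24)   # downsampled by half

print(f"24-bit colour, 300 dpi: {color_300:,} bytes")
print(f" 8-bit grey,   300 dpi: {gray_300:,} bytes")
print(f" 1-bit b/w,    300 dpi: {bitonal_300:,} bytes")
print(f"24-bit colour, 150 dpi: {color_150:,} bytes")
```

Halving the resolution cuts the pixel count to a quarter, which is where the big savings come from, and also why small text gets hard to read. Dropping from colour to 1-bit keeps every pixel position but stores far less per pixel, which tends to suit plain text pages better than downsampling does.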


Be well...

New Here, Jun 14, 2010


Hi,

I tried the given solution on one of the eBooks I created earlier and it worked fine. I recommend going with it.

LEGEND, Jun 17, 2010


The best way to get on track is to rescan. JPEG files are meant to be used for things like photographs; for use with OCR, TIFF is much better. While TIFF will frequently give you larger files, if you are using OCR and replacing the TIFF images with characters, then you can get a much smaller file. Personally, I'd recommend something like OmniPage to convert the scanned TIFF files to RTF, then either reprocess them as Word documents or place the RTF files into InDesign or FrameMaker, and use those programs, perhaps with Acrobat, to create the final e-book. More recent versions of Acrobat do better at OCR, but I do not think they are the best choice for a large OCR project whose file size needs to be small and whose final result needs to be accurate.
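
As a rough illustration of why replacing page images with characters shrinks the file so much, the sketch below compares the scan sizes quoted in the original post with a plain-text estimate. The 2,500 characters per page and one byte per character are assumptions made for this example, not figures from the thread, and a real e-book adds fonts and layout on top of the bare text.

```python
# Figures from the original post: 195 pictures, 400-500 KB each,
# two book pages per picture.
PICTURES = 195
AVG_PICTURE_KB = 450       # midpoint of the quoted 400-500 KB range
PAGES_PER_PICTURE = 2

# Assumption (not from the thread): about 2,500 characters per book page,
# stored at one byte per character.
CHARS_PER_PAGE = 2500

pages = PICTURES * PAGES_PER_PICTURE
image_bytes = PICTURES * AVG_PICTURE_KB * 1024
text_bytes = pages * CHARS_PER_PAGE

ratio = image_bytes / text_bytes

print(f"pages:       {pages}")
print(f"image scans: {image_bytes / 1024**2:.1f} MiB")
print(f"plain text:  {text_bytes / 1024**2:.2f} MiB")
print(f"images are roughly {ratio:.0f}x larger than the bare text")
```

Even if the per-page character count is off by a factor of two or three, the text comes out far smaller than the scans, which is the point being made above.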
