I'm embarking on a huge project - digitizing my personal library of letters, pamphlets, government documents, etc.
I've figured out how to create OCR-enabled PDFs, which takes care of text, but I'm still not sure how to deal with images.
I guess I'm basically trying to figure out the best way to save an image - an image that I might possibly modify in the future. The simplest way is to just scan everything as a PDF file, then take a screen shot of an image if I want to work with it later.
But which would give me the highest quality image - a JPG or a screen shot of a PDF? The other possibility is to save images as TIFFs, but the file size appears to be enormous.
Am I correct in understanding that a TIFF is the format that's best if I want to resize images with minimal distortion? Also, is it possible to save images as JPGs or PDFs with smaller file sizes, then later convert those JPGs or PDFs to TIFFs?
JPEG is a lossy format - only useful for things you don't want to edit, or only want low quality versions of.
TIFF can be lossless or lossy, depending on the compression options you choose, and is appropriate for image data.
PDF can be lossless or lossy, depending on the compression options you choose, and is appropriate for text or complex documents (mixed text and images).
Scanning automatically to PDF is usually lossy.
Converting a lossy file format to a lossless file format doesn't magically recover the image quality/data you've already lost.
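To make that concrete, here is a toy sketch in Python using only the standard library. It is not real JPEG: `lossy_quantize` is a hypothetical stand-in that simply rounds pixel values to coarse steps, and `zlib` stands in for any lossless compressor. The point is only that storing already-degraded data losslessly preserves the degraded data, not the original.

```python
import zlib

def lossy_quantize(pixels: bytes, step: int = 16) -> bytes:
    """Hypothetical stand-in for a lossy encoder: round each value down to a multiple of step."""
    return bytes((p // step) * step for p in pixels)

# A tiny fake grayscale "scan": one pixel for each value 0..255.
original = bytes(range(256))

# Lossless round trip: compress, decompress, and get exactly the same bytes back.
lossless_copy = zlib.decompress(zlib.compress(original))

# Lossy step first, then a lossless round trip of the degraded data.
degraded = lossy_quantize(original)
degraded_copy = zlib.decompress(zlib.compress(degraded))

print(lossless_copy == original)   # True: lossless storage alone loses nothing
print(degraded_copy == degraded)   # True: the degraded data is stored exactly
print(degraded_copy == original)   # False: the discarded detail does not come back
print(len(set(original)), "distinct values before,", len(set(degraded)), "after")
```

In this toy case the 256 distinct gray levels collapse to 16, and no later conversion can tell which of the original levels each pixel used to have.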
If you plan to work on your images later, forget about JPEGs and screen shots.
PSD and TIFF are the two formats that will allow you to conserve image quality.
Screen shots are unmitigated cr@p when it comes to image quality. Period. That's the worst imaginable way to keep images.
JPEGs are lossy, always. The image deteriorates every single time you save or re-save the image and then close it, even if you do nothing else to it, no exceptions. Even the very first time you close and save a JPEG.
My personal preference is to work with PSDs.
I would not consider the PDF format to keep images, for a variety of reasons.
Chris Cox wrote:
…Converting a lossy file format to a lossless file format doesn't magically recover the image quality/data you've already lost.
I would add: not magically and not otherwise, in any other manner or form.
Wow, I didn't even think of the Photoshop option. I'll have to save a couple images as both TIFF and Photoshop, then compare the file sizes.
Also, suppose I save a really big image - like a full-page photo - as a PSD or TIFF, with an enormous file size. Let's say the PSD/TIFF measures 5,000 X 5,000 pixels. I could simply open it in Photoshop and resize it to 500 X 500 pixels without losing any image quality, right?
I guess there would have to be limits; if I reduced an image to 5px X 5px, then I'd obviously lose something. But can I safely resize images in the range of 5,000 px to 500 px?
…Also, suppose I save a really big image - like a full-page photo - as a PSD or TIFF, with an enormous file size. Let's say the PSD/TIFF measures 5,000 X 5,000 pixels. I could simply open it in Photoshop and resize it to 500 X 500 pixels without losing any image quality, right?
You would lose a huge amount of quality. In your example, you'd be downsampling from 25,000,000 pixels to a pathetically tiny 250,000, a MASSIVE LOSS!
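The arithmetic behind those numbers is easy to check. A minimal Python sketch (the `pixel_count` helper is just an illustrative name, not part of any tool mentioned in this thread):

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in a width x height image."""
    return width * height

before = pixel_count(5000, 5000)
after = pixel_count(500, 500)

print(f"{before:,} pixels before")                  # 25,000,000 pixels before
print(f"{after:,} pixels after")                    # 250,000 pixels after
print(f"{after / before:.0%} of the pixels remain")  # 1% of the pixels remain
print(before // after, "source pixels per output pixel")  # 100
```

Each pixel in the 500 X 500 version has to stand in for a 10 X 10 block of the original, so enlarging the small copy again later cannot bring the fine detail back.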
OK, it sounds like I'm stuck with huge file sizes. I'll have to do a little more homework. It looks like I can't save images as PSD with Acrobat and presumably not with my scanner. (I'll have to check my scanner when I get home.)
So I'll tentatively save images as TIFFs, then I might be able to compress or zip them. I see a lot of options I have to learn about.
Thanks again for the tips.
digitizing my personal library of letters, pamphlets, government documents, etc.
Do they really have to be that high res? If you are just interested in text legibility... 5K x 5K may be overkill. JPG is a terrible choice for scanned docs (tons of artifacts around the characters). TIFF with LZW compression... or, even though it is not considered a high quality format, maybe GIF? It would make a smaller file size by reducing the number of colors in the doc. In your case, that may be fine. You could scan at 5K x 5K, but say you only use 16 indexed colors (as opposed to 16.77 million); that might be a good compromise. But, like I said, you don't usually hear of GIF being used much as an archiving format.
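For a rough sense of the trade-off, here is a back-of-the-envelope Python sketch of raw pixel data sizes. It assumes 24 bits per pixel for full color and 4 bits per pixel for a 16-color palette, and it ignores headers, the palette itself, and whatever LZW compression saves on top, so real files will differ. `raw_size_bytes` is an illustrative helper, not a real library call.

```python
def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data size, ignoring headers and palettes."""
    return width * height * bits_per_pixel // 8

full_color = raw_size_bytes(5000, 5000, 24)  # 24-bit RGB
indexed_16 = raw_size_bytes(5000, 5000, 4)   # 16 indexed colors = 4 bits per pixel

print(f"{2 ** 24:,} possible colors at 24 bits")  # 16,777,216
print(f"{full_color:,} bytes at 24-bit color")    # 75,000,000
print(f"{indexed_16:,} bytes at 16 colors")       # 12,500,000
print(f"{full_color // indexed_16}x smaller before compression")  # 6x
```

So under these assumptions the 16-color version starts out about six times smaller, at the cost of throwing away most of the color information for good.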
No high res needed for your purpose.
Let your OCR program take care of all the work.
I use Omnipage (but I'm sure other OCR applications can do this as well). With 1 click the application scans the document, recognises the text and creates searchable PDFs (with images).
I've already begun scanning my documents using Adobe Acrobat with OCR capability, and it seems to be working just fine for text. I've been able to create PDFs that I can save as text files.
I just wasn't sure how to handle images. I've now learned that the images on my PDF won't be enough. I'll have to go back and scan them again as TIF files. Now I have to learn about compression and all the other options.