I have 250 scanned technical publications that are each larger than 500 MB, for a total of 290 GB! They are so large because they contain color maps and other foldouts scanned at 300 dpi.
Obviously, they need to be compressed, but I am constrained by the fact that they must be PDF/A-2 compliant. As I understand it, that would require JPEG2000 compression. (I need lossless compression.) PDF Optimizer in Acrobat 9 will work, but it is so slow that it could take me months to downsize the files individually. Batch Sequences in Acrobat 9 doesn’t allow for JPEG2000 compression.
Is there a way that I can batch-compress these files, using Acrobat 9 Pro? If not, is there another program that can do the job?
Thanks!
Thanks for your response. I apologize for not getting back to you sooner.
My assumption that lossy compression couldn’t be used in PDF/A files was based on a literal reading of a statement in the standard on PDF/A-1 (ISO 19005-1): “Writers of conforming files should not use lossy compression, subsampling, downsampling, or any other process that either alters the content or degrades the quality of source data in the conforming file.” Perhaps I applied that too literally to my situation (an archive of 78,000 scanned documents). PDF/A-2 was the main focus of the 4th International PDF Conference. Several speakers stressed the benefits of JPEG2000 compression to libraries and archives. This started me wondering whether I could minimize storage space by retrospectively optimizing the documents in my archive.
Do you have any advice?
Hmmm. I'm not an expert on PDF, but I read what you quote as subordinate to the earlier statement (in PDF/A-2): "Organizations that need to ensure that a conforming file is an accurate representation of original source material might need to impose additional requirements on the processes that generate the conforming file beyond those imposed by this part of ISO 19005."
Optimizing can be worthwhile; only you can decide (it seems to me) if lossy processes are acceptable. Lossy processes aren't limited to compression. For instance, Acrobat's scanned document optimizer can perform more or less aggressive background removal. Converting to pure monochrome might be lossy, but once done, JBIG2 compression can be extremely effective.
Some more thoughts on JPEG2000.
1. It is no more lossless than JPEG in general. There is a lossless JPEG, but nobody ever uses that (not available in Acrobat, nor even in PDF at all, I think). There is also a lossless JPEG2000. If lossless is what you want, be sure to select "lossless" as the compression option; otherwise it won't be.
2. Neither JPEG nor JPEG2000 is - to me - an obvious choice for scanned material.
3. If you aren't offered a choice of JPEG2000, but expect it, make sure you are using PDF 1.5 or later, and PDF/A-2 (since PDF/A-1 is based on PDF 1.4 and doesn't allow JPEG2000).
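For point 3, one rough way to see which files declare PDF 1.5 or later is to read the `%PDF-x.y` header at the start of each file. This is only a sketch: the function names are made up for illustration, and the header is not the whole story, since a file can also declare a later version inside the document itself, which this check does not look at.

```python
import re


def pdf_header_version(path):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    with open(path, "rb") as f:
        head = f.read(1024)  # the header sits at the very start of the file
    match = re.search(rb"%PDF-(\d)\.(\d)", head)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def header_allows_jpeg2000(path):
    """True if the header declares PDF 1.5 or later (header check only)."""
    version = pdf_header_version(path)
    return version is not None and version >= (1, 5)
```

A file that fails this check is not necessarily unusable; it just means the header alone doesn't promise JPEG2000 support.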
I would never consider a one way process that simply "optimized" the PDFs in any way and used only the new copies. I would always preserve the originals in case questions ever arise. Disk is cheap, even if bandwidth is expensive.
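As a sketch of what "preserve the originals" could look like in a batch job: copy each file to a separate archive folder and confirm the copy by checksum before anything touches the working copy. The function names and folder layout here are assumptions for illustration, not anything Acrobat provides.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so very large PDFs don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_original(src, archive_dir):
    """Copy src into archive_dir and verify the copy; return the copy's path."""
    src = Path(src)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    if dest.exists():
        # Never overwrite an earlier preserved original.
        raise FileExistsError(f"{dest} already exists")
    shutil.copy2(src, dest)  # copy2 also keeps timestamps where possible
    if sha256_of(src) != sha256_of(dest):
        dest.unlink()
        raise OSError(f"checksum mismatch copying {src}")
    return dest
```

Only after `preserve_original` returns would the working copy be handed to whatever optimizer is chosen.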
Thanks for the good advice. Your advice about preserving the originals is particularly helpful. A small subset of my documents are too large for our search engine to serve up, and I have been considering "cutting them down to size." I may do this, for the sake of our search engine, but I will preserve the original files as well.
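To pick out that small subset, something along these lines could list the PDFs above a given size, largest first. The size limit is a placeholder, since the search engine's actual limit isn't stated here, and the function name is made up for illustration.

```python
from pathlib import Path


def oversized_pdfs(root, limit_bytes):
    """Return (path, size) pairs for PDFs under root larger than limit_bytes."""
    hits = []
    for path in Path(root).rglob("*"):
        # Compare the suffix case-insensitively so .PDF files are included.
        if path.is_file() and path.suffix.lower() == ".pdf":
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

For example, `oversized_pdfs("archive", 500 * 1024**2)` would list everything over 500 MB in a hypothetical `archive` folder.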