I use book-scanning services to have books scanned into PDF files as 300 dpi images. In Acrobat Pro DC, I run the Recognize Text tool in the Enhance Scans category (which actually decreases the file size). Then, in the Redact category, I try either Remove Hidden Information or Sanitize Document, and the file size explodes. Here is a representative example:
Step 0: Book as received from scanning service: 19 MB
Step 1: Book after running text recognition: 13 MB
Step 2 (option A): Book after running Remove Hidden Information: 53 MB
Step 2 (option B): Book after running Sanitize Document: 22 MB
Incredibly, not only does the file size increase dramatically after removing hidden information or sanitizing the document, but, to rub salt in the wound, the image quality also decreases noticeably! This happens every time, without exception.
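To put numbers on the growth, here is a minimal Python sketch that computes the relative size change at each step. The sizes are the ones listed above; the helper name `percent_change` is just something I made up for illustration.

```python
# File sizes in MB, taken from the steps listed above.
sizes_mb = {
    "as received": 19,
    "after text recognition": 13,
    "after Remove Hidden Information": 53,
    "after Sanitize Document": 22,
}


def percent_change(before, after):
    """Return the relative size change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Compare both Redact operations against the OCR output they start from.
baseline = sizes_mb["after text recognition"]
for step in ("after Remove Hidden Information", "after Sanitize Document"):
    change = percent_change(baseline, sizes_mb[step])
    print(f"{step}: {sizes_mb[step]} MB ({change:+.0f}% vs. OCR output)")
```

For this example, Remove Hidden Information roughly quadruples the file (about +308%), and Sanitize Document grows it by about 69%.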
1) Why does this happen? Intuitively, I would think removing hidden information would actually reduce file size, especially if the image quality deteriorates.
2) Is there any solution to this?
3) Assuming there is no good built-in Acrobat solution, is there any third-party software that can sanitize PDF documents without causing the file size to explode?
This is a known issue, and we are working on it. As a workaround, you might try optimizing the PDF file so that it is saved as PDF version 1.7, compatible with Acrobat 10.0 and later.
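If you want to confirm which PDF version a file declares after optimizing, a rough Python sketch like the one below reads it from the file header. It relies only on the fact that a PDF file begins with a header of the form `%PDF-1.7`. The function names and the file name in the comment are hypothetical. The document catalog can also carry a `/Version` entry that overrides the header, and this sketch does not check that.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from a PDF header such as b'%PDF-1.7', or None if absent."""
    # Only the first 1024 bytes are searched, since the header sits at the start.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(path):
    """Read the start of the file at `path` and return its declared header version."""
    with open(path, "rb") as f:
        return pdf_header_version(f.read(1024))


# Hypothetical usage: read_pdf_version("book_optimized.pdf")
```

A result of "1.7" would suggest the optimization setting took effect, at least as far as the header is concerned.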
Thank you for your patience.