
Using Find All Suspects & "Not Text" Grew File Size About 5x

May 17, 2012 12:16 AM

I'm using Adobe Acrobat X.

 

I was pretty happy with the size of a 216-page PDF that I am working on, approximately 6 MB.

 

The PDF consists of only CCITT G4 encoded black and white pages, scanned from photocopies. I used Recognize Text, which seemed to work pretty well, and increased the file size by an insignificant amount.

 

Then I spent a couple of hours using Find All Suspects. Recognize Text interpreted a lot of stray marks on the original photocopies as text, and I went through and marked all of them as "not text".

 

I didn't realize the file grew in size to 22.9 MB, until I tried to email it to myself.

 

What do you suppose happened? Did I miss a step?

 
Replies
  • May 17, 2012 9:30 AM   in reply to NY2LA

    Do you have the Pro version? If so, what happens if you do a Save As > Optimized PDF, bring up the PDF Optimizer dialog, and click Audit space usage...? Also, what PDF Output Style did you select when running the Recognize Text command?

     
  • May 17, 2012 10:14 AM   in reply to NY2LA

    When you run the Recognize Text command, there is a Settings section, and if you click the Edit... button you can select Searchable Image, Searchable Image (Exact), or ClearScan. Here is a blog post titled "Better PDF OCR. ClearScan is smaller, looks better" that helps to explain the differences.

     
  • May 17, 2012 10:49 AM   in reply to NY2LA

    There seems to be a tremendous amount of Document Overhead in your file. You might also make a copy of the file and try tinkering with some of the Optimizer settings to get rid of some of the bloat.

     
  • May 18, 2012 3:47 AM   in reply to NY2LA

    Have you tried just doing a simple Save As > Reduced File Size?

     
  • May 23, 2012 10:06 PM   in reply to NY2LA

    We tried marking suspects as "Not Text" in 3-4 files, but the file size increase was marginal, and even that can be attributed to the font information that gets embedded while correcting suspects.

     

    Would it be possible for you to share the file on which you are encountering this issue for our investigation?

     
  • May 24, 2012 2:21 AM   in reply to NY2LA

    You can attach the file to your reply. There should be an "Insert Image" option.

    Please use it to attach your file.

     
  • May 24, 2012 7:52 AM   in reply to apangasa

    @apangasa,

     

    Just tried the "insert image" for a PDF (< 2MB).

    No go.

    My understanding is that one's profile must have a higher permissions level than the generic end user has in order to post a non-image file.

     

    Be well...

     
  • May 25, 2012 5:34 AM   in reply to NY2LA

    How about uploading it to Adobe SendNow (free) and then posting the link?

     
  • May 28, 2012 1:06 AM   in reply to NY2LA

    Thanks for sharing the file with us. We were able to reproduce the issue with the shared file.

    We are investigating this issue and we will surely keep you updated.

     
  • Jun 20, 2012 11:55 PM   in reply to NY2LA

    Hi,

     

    When the file is saved incrementally, the file size grows; this is in accordance with how the feature is designed.

    I tried saving the file again under a different name, and this time I could not reproduce the issue. Could you please confirm whether you see the same behavior with a non-incremental save?

     

    Thanks

     
  • Jun 26, 2012 12:23 AM   in reply to NY2LA

    "Incremental save" means an ordinary Save, not Save As.

    By design, this always makes the file bigger, sometimes much bigger.

    It is designed to be very quick, so it never deletes anything. And if you make a change you have both the old and the new.

     

    Save As (an ordinary Save As, without optimization or PDF/A or anything) removes the deleted stuff without touching the quality. I think this is the vital step you have missed. It has nothing to do with scanning or OCR specifically.
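
    One rough way to check whether a PDF is carrying incremental-save history is to count its %%EOF markers, since each incremental save appends a new section that ends with one. The Python sketch below is only a heuristic and not part of Acrobat: the function names are made up for illustration, a linearized ("fast web view") file can contain two markers without any incremental save, and the byte sequence could in principle also appear inside stream data.

```python
import re
import sys
from pathlib import Path

# Every complete PDF revision ends with this marker; an incremental save
# appends a new revision (and therefore a new marker) to the end of the file.
EOF_MARKER = re.compile(rb"%%EOF")


def count_eof_markers(data: bytes) -> int:
    """Return how many %%EOF markers appear in the raw PDF bytes."""
    return len(EOF_MARKER.findall(data))


def describe(path: str) -> str:
    """Summarize file size and marker count for one PDF (hypothetical helper)."""
    data = Path(path).read_bytes()
    markers = count_eof_markers(data)
    size_mb = len(data) / (1024 * 1024)
    return f"{path}: {size_mb:.1f} MB, {markers} %%EOF marker(s)"


if __name__ == "__main__":
    # Usage: python count_eof.py before.pdf after_save_as.pdf
    for pdf_path in sys.argv[1:]:
        print(describe(pdf_path))
```

    If the count is well above one or two, running a plain Save As and comparing the two files with the same script should show whether the extra size came from accumulated incremental saves.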

     