2 Replies Latest reply on Sep 19, 2013 3:00 PM by Corgano

    Best way to find duplicate pages / find duplicate background images

    Corgano

      What is the best way to detect duplicate pages?

      The pages I am dealing with are searchable images (a scanned image background with selectable text on top). In this case, any two pages that have the exact same background image are duplicates.

      I only know how to get the page text, though, so I've been extracting each page's text, hashing it, and then checking for duplicate hashes. This works for the most part, but I worry about running into two different pages with exactly the same text.
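
For reference, the text-hashing approach described above amounts to roughly the following. This is a minimal Python sketch, not the script actually in use: it assumes the text of each page has already been extracted into a list of strings, `find_duplicate_pages_by_text` and `page_texts` are hypothetical names, and the whitespace normalization is an added assumption rather than part of the approach described.

```python
import hashlib
from collections import defaultdict


def find_duplicate_pages_by_text(page_texts):
    """Return groups of page indices whose extracted text hashes identically."""
    groups = defaultdict(list)
    for index, text in enumerate(page_texts):
        # Assumption: collapse whitespace so extraction spacing quirks do not matter.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(index)
    # Keep only hashes shared by more than one page.
    return [pages for pages in groups.values() if len(pages) > 1]


if __name__ == "__main__":
    texts = ["Invoice 1", "Other", "Invoice  1", "Other"]
    print(find_duplicate_pages_by_text(texts))
```

The weakness is visible here: two pages that are different scans but yield the same text land in the same group, so a match on text alone only marks a candidate duplicate.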

      What about looking at the background image instead? If a PDF has multiple pages with the same background image, I assume it stores the image once and just references it from each page. Is it possible to check for duplicate pages this way?
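
On the storage assumption: images in a PDF are separate objects that pages reference through their resources, so a producer can store an image once and share it, but nothing forces it to, and a scanner or merge tool may write a separate copy for every page. Comparing the image data itself is therefore likely safer than relying on shared references. A minimal Python sketch of that idea, assuming the raw image stream bytes for each page have already been pulled out with some PDF library (that extraction step is not shown, and `find_duplicate_pages_by_image` and `page_images` are hypothetical names):

```python
import hashlib
from collections import defaultdict


def find_duplicate_pages_by_image(page_images):
    """page_images[i] is a list of raw image streams (bytes) found on page i."""
    groups = defaultdict(list)
    for index, images in enumerate(page_images):
        if not images:
            continue  # pages without images cannot be matched this way
        # Hash every image, then sort so the order of images on a page is ignored.
        key = tuple(sorted(hashlib.sha256(data).hexdigest() for data in images))
        groups[key].append(index)
    # Keep only image sets shared by more than one page.
    return [pages for pages in groups.values() if len(pages) > 1]


if __name__ == "__main__":
    scan_a, scan_b = b"scan-a-bytes", b"scan-b-bytes"
    print(find_duplicate_pages_by_image([[scan_a], [scan_b], [scan_a]]))
```

This only catches byte-identical image data. The same scan re-compressed or re-encoded would hash differently and be missed.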

      Or does Acrobat have a built-in duplicate check that I haven't discovered? As always, any help is appreciated.