Sep 19, 2013

    Best way to find duplicate pages / find duplicate background images


      What is the best way to detect duplicate pages?


      The pages I am dealing with are searchable images (a scanned image background with selectable text on top). In this case, any two pages that have the exact same background image are duplicates.


      I only know how to get page text though, so I've been getting the text and hashing it, then checking for duplicate hashes. This works for the most part, but I fear running into two different pages with the exact same text.
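
      To make the current approach concrete, here is a minimal sketch of the text-hashing check. It assumes the text of each page has already been extracted into a list of strings; the function name `find_duplicate_pages` and the whitespace normalization are only illustrative, not anything Acrobat provides.

```python
import hashlib
from collections import defaultdict


def find_duplicate_pages(page_texts):
    """Return groups of 0-based page indices whose text hashes match.

    page_texts is assumed to be a list with one string of extracted
    text per page (hypothetical input, produced elsewhere).
    """
    groups = defaultdict(list)
    for index, text in enumerate(page_texts):
        # Assumption: collapse runs of whitespace so that spacing
        # differences from extraction do not change the hash.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(index)
    # Keep only hashes shared by more than one page.
    return [pages for pages in groups.values() if len(pages) > 1]
```

      The weakness is visible here: two different scans that happen to carry identical text land in the same group, because the background image never enters the hash.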


      What about looking at the background image? If a PDF has multiple pages with the same background image, I assume it would store the image once and then just reference it from the pages? Is it possible to check duplicate pages this way?


      Or does Acrobat have a built-in duplicate check that I haven't discovered? As always, any help is appreciated.