3 Replies Latest reply on Jan 19, 2017 1:44 PM by jennifers534176

    How to programmatically identify pdfs containing black boxes or black objects (attempting to identify incorrect attempts at redacting text)

    jennifers534176

      We have identified a few PDFs where users thought they were redacting text but were not. They were doing things like "highlighting" text in black in other applications such as Microsoft Word and then converting the document to a PDF. When viewed in Adobe Acrobat, the text appears redacted because you see a black box instead of text. However, you can easily copy and paste the black box area into Notepad and see the supposedly redacted text, or you can edit the object, select just the black box, and delete it to reveal the supposedly redacted text underneath.

      We know how to fix the PDFs so that the text is truly redacted, and we know the steps to give users so that they redact text correctly in the future. What I am researching now is a way to quickly identify all PDFs affected by this issue (instead of the more tedious route of opening each PDF and testing the blacked-out areas). Is it possible, based on how PDFs encode black objects on a page, to programmatically identify suspect PDFs?
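
To make the question concrete, here is a rough sketch of the kind of check I have in mind. It is a heuristic, not a confirmed method. It relies on the standard PDF content-stream operators: a fill colour is set with `g`, `rg`, `k`, `sc` or `scn`, a rectangle is added with `re` and painted with `f` (or a related fill operator), and text is shown with `Tj`, `TJ`, `'` or `"`. The sketch flags a file when a single stream both paints a black-filled rectangle and shows text. The function names (`pdf_bytes_look_suspect`, `find_suspect_pdfs`) are made up for illustration. It assumes the streams are either uncompressed or Flate-compressed and that the file is not encrypted, and it does not check whether the rectangle actually overlaps the text, so anything it reports is only a candidate for manual review.

```python
import re
import zlib
from pathlib import Path

# Raw bytes between the "stream" and "endstream" keywords of each PDF stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Literal (...) and hex <...> strings; nested unescaped parentheses are not handled.
LITERAL_STRING_RE = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)
HEX_STRING_RE = re.compile(rb"<[0-9A-Fa-f\s]*>")
NUMBER_RE = re.compile(rb"[-+]?(?:\d+\.?\d*|\.\d+)")

FILL_COLOR_OPS = {b"g", b"rg", b"k", b"sc", b"scn"}
FILL_PAINT_OPS = {b"f", b"F", b"f*", b"B", b"B*", b"b", b"b*"}
PATH_END_OPS = {b"n", b"S", b"s"}  # end the current path without filling it
TEXT_SHOW_OPS = {b"Tj", b"TJ", b"'", b'"'}
# Operand lists treated as black: gray 0, RGB 0 0 0, CMYK 0 0 0 1.
BLACK_OPERANDS = ([0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0])


def iter_decoded_streams(pdf_bytes):
    """Yield each stream body, Flate-decoded when possible, otherwise raw."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            decoded = zlib.decompressobj().decompress(raw)
        except zlib.error:
            decoded = raw  # uncompressed, or a filter this sketch ignores
        yield decoded


def stream_has_black_box_and_text(content):
    """True if one stream fills a rectangle in black and also shows text."""
    # Remove string operands so their bytes are not mistaken for operators.
    content = LITERAL_STRING_RE.sub(b" ", content)
    for delim in (b"<<", b">>", b"[", b"]"):
        content = content.replace(delim, b" ")
    content = HEX_STRING_RE.sub(b" ", content)

    fill_black = True  # the default fill colour in PDF is black
    saved_states = []  # fill colours saved by q, restored by Q
    operands = []
    rect_pending = black_box = has_text = False
    for tok in content.split():
        if NUMBER_RE.fullmatch(tok):
            operands.append(float(tok.decode("ascii")))
            continue
        if tok.startswith(b"/"):
            operands.append(None)  # name operand, e.g. a font or pattern
            continue
        if tok in FILL_COLOR_OPS:
            fill_black = operands in BLACK_OPERANDS
        elif tok == b"q":
            saved_states.append(fill_black)
        elif tok == b"Q":
            if saved_states:
                fill_black = saved_states.pop()
        elif tok == b"re":
            rect_pending = True
        elif tok in FILL_PAINT_OPS:
            black_box = black_box or (rect_pending and fill_black)
            rect_pending = False
        elif tok in PATH_END_OPS:
            rect_pending = False
        elif tok in TEXT_SHOW_OPS:
            has_text = True
        operands = []
    return black_box and has_text


def pdf_bytes_look_suspect(pdf_bytes):
    """Heuristic: does any stream in the file contain a black box plus text?"""
    return any(stream_has_black_box_and_text(s) for s in iter_decoded_streams(pdf_bytes))


def find_suspect_pdfs(folder):
    """Return the *.pdf files under folder (recursively) that look suspect."""
    return sorted(
        path for path in Path(folder).rglob("*.pdf")
        if pdf_bytes_look_suspect(path.read_bytes())
    )
```

Calling `find_suspect_pdfs` on a folder path would return the list of files to open and check by hand. Every stream in the file is scanned, including fonts and images, so some false positives are likely. A stricter check would compare the rectangle coordinates with the text positions, which probably needs a real PDF library rather than this kind of byte scanning.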