6 Replies Latest reply: Apr 10, 2012 8:17 PM by colinodden

    Logging redactions


      Dear You,


      I'm operating on a large corpus of documents where human coders are looking at OCR-ed PDFs. We're trying to facilitate their coding work by highlighting relevant search terms. Redaction actually works pretty well for that, as highlighting with an empty red rectangle draws attention to stuff likely to be of interest.


      Let's say, though, that some documents have 5 matches, some 30, some whatever. It would help our project immensely to know how many matches there are in a given document. We also want this to happen automagically -- counting by hand is beside the point.


      Logging would be the holy grail ... even if there's a redaction log with a bunch of noise in it, we're willing to parse the log to get what we want. Yet, we can't find any evidence that Acrobat (Windows or Mac, v9, but we're willing to shell out for X if it gives us this functionality) logs much of anything it does to a document.


      Many thanks in advance.


      Colin Odden

      Ohio State University

        • 1. Re: Logging redactions
          George_Johnson ACP/MVPs

          Are you saying that you're using the Search & Redact feature? It's possible with a script to count how many redaction annotations are present. It's also possible with a script to search through a document for a word and automatically add text highlights, and then count how many there are.
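George's second suggestion, searching for a word and adding text highlights by script, might look roughly like the following minimal sketch. It assumes the OCR step produced a real text layer, and it uses the Acrobat JavaScript Doc methods `getPageNumWords`, `getPageNthWord`, `getPageNthWordQuads` and `addAnnot`. The helper name `highlightWord` and the yellow color are illustrative choices, not anything from this thread. The document is passed in as a parameter so the same logic can also be exercised outside Acrobat with a stand-in object.

```javascript
// Sketch only: highlight every occurrence of `word` in `doc` and return the count.
// `doc` is an Acrobat Doc object (pass `this` from a batch sequence or the console).
// highlightWord is a hypothetical helper name, not part of Acrobat's API.
function highlightWord(doc, word) {
    var target = word.toLowerCase();   // case-insensitive comparison
    var count = 0;
    for (var p = 0; p < doc.numPages; p += 1) {
        var numWords = doc.getPageNumWords(p);
        for (var n = 0; n < numWords; n += 1) {
            // true = strip surrounding punctuation/whitespace from the word
            var text = doc.getPageNthWord(p, n, true);
            if (text.toLowerCase() === target) {
                doc.addAnnot({
                    type: "Highlight",
                    page: p,
                    quads: doc.getPageNthWordQuads(p, n),  // where the word sits on the page
                    strokeColor: ["RGB", 1, 1, 0]          // yellow
                });
                count += 1;
            }
        }
    }
    return count;
}

// In Acrobat, for example:
// var hits = highlightWord(this, "confidential");
// console.println(this.documentFileName + ": " + hits);
```

It matches single words only, so a multi-word search term would need neighboring words compared as well. On OCR-ed files, word boundaries are whatever the OCR text layer provides.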

          • 2. Re: Logging redactions
            colinodden Community Member

            Hi George,


            Thank you. True, I'm using Search & Redact (I'm batch processing OCR, then Search & Redact). I'd use highlighting if I knew how to highlight via script.


            I'm glad to know it's a possibility, but searches for various combinations of highlight + logging, redaction + logging, etc. have come up blank. Any suggestions for where I should look?




            • 3. Re: Logging redactions
              George_Johnson ACP/MVPs

              Is all you're looking for a count of the number of redaction annotations that were added? I'm assuming that the redactions aren't actually applied by the batch process. Is that correct?

              • 4. Re: Logging redactions
                colinodden Community Member

                To clarify, I'm running two batch processes:


                Process one: OCR originals and save a copy.

                Process two: operating on the documents saved from process one, Search & Redact (marking for redaction) based on one or more search terms.


                Now, if marking for redaction doesn't create a "redaction instance" that gets logged / counted, no problem. We don't mind doing the actual redactions on a copy of the PDFs, since we'll retain the un-redacted copies to actually read.


                That is, we could OCR in one step; run Search & Redact with the redactions applied in a second step, producing files we'll just toss since the point is only counting/logging; and then run Search & Redact again, marking for redaction but not applying, so that we have little red rectangles around our search hits.


                (thanks again)

                • 5. Re: Logging redactions
                  George_Johnson ACP/MVPs

                  Here's a simple script that will count the number of redaction annotations in a document and show the total in an alert popup.


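                  this.syncAnnotScan();             // make sure every page's annotations have been scanned
                  var annots = this.getAnnots();    // may return null when there are no annotations
                  var sum = 0;
                  if (annots) {
                      for (var i = 0; i < annots.length; i += 1) {
                          if (annots[i].type === "Redact") {   // marked (not yet applied) redactions
                              sum += 1;
                          }
                      }
                  }
                  app.alert("Total redaction annotations: " + sum);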



                  This just demonstrates that you can determine the number of redaction annotations with a script. You can adapt it to suit your needs. For example, you could use it in a batch process and write the number for each file to the JavaScript console by changing the last line to:


                  console.println(this.documentFileName + ": " + sum);


                  When you open the console (Ctrl+J, or Cmd+J on a Mac) after the batch process runs, it will show one line per file with the file name and its number of redaction annotations.
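Since Colin mentioned being willing to parse a log, here is a minimal sketch of that step. It assumes the console contents are copied into a text file after the batch run and processed outside Acrobat, for example with Node.js. It expects lines of the form `name.pdf: 12`, as printed by the `console.println` line above, and skips anything else. `parseCounts` and `toCsv` are hypothetical helper names.

```javascript
// Sketch: parse lines like "report.pdf: 12" (the format printed by the
// console.println line in the batch script) into {file, count} records.
// Lines that do not end in ": <number>" (other console noise) are skipped.
function parseCounts(consoleText) {
    var records = [];
    var lines = consoleText.split(/\r?\n/);
    for (var i = 0; i < lines.length; i += 1) {
        var line = lines[i].trim();
        // split on the LAST ": " so file names containing ": " still work
        var sep = line.lastIndexOf(": ");
        if (sep < 0) continue;
        var countText = line.slice(sep + 2);
        if (!/^\d+$/.test(countText)) continue;
        records.push({ file: line.slice(0, sep), count: parseInt(countText, 10) });
    }
    return records;
}

// Turn the records into CSV text, quoting file names in case they contain commas.
function toCsv(records) {
    var rows = ["file,count"];
    for (var i = 0; i < records.length; i += 1) {
        var name = '"' + records[i].file.replace(/"/g, '""') + '"';
        rows.push(name + "," + records[i].count);
    }
    return rows.join("\n");
}
```

The resulting CSV can be opened in a spreadsheet or joined against the coders' document list by file name.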

                  • 6. Re: Logging redactions
                    colinodden Community Member



                    This is extremely helpful. Thank you. Now I'm off to figure out how to reliably write to files rather than to the console.