2 replies. Latest reply on Mar 10, 2017, 2:23 PM by Robert1776D

    Scripted OCR doesn't let me script finding text, manual OCR does

    Robert1776D

      When I run OCR on an image PDF from a script, Acrobat creates bounding boxes, and FindText can't find any text unless the cursor is inside that particular bounding box.

      However, if I OCR the file manually (Enhance Scans > Recognize Text > In this file > Settings > Output = Editable Text and Images, then OK), FindText works.

      The document is already open when I run this VBA script:

      Set aApp = CreateObject("AcroExch.App")
      Set aAVDoc = aApp.GetActiveDoc()
      Set aPageView = aAVDoc.GetAVPageView()
      Set aPdDoc = aAVDoc.GetPDDoc()
      pageCount = aPdDoc.GetNumPages

      ' Get PDF OCR'd
      For curPage = 0 To pageCount - 1
           aPageView.GoTo curPage
           aApp.MenuItemExecute ("TouchUp:EditDocument")
      Next curPage 

      rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)

      rtgFound is False. If I manually OCR the document and run this code:

      Set aApp = CreateObject("AcroExch.App") 
      Set aAVDoc = aApp.GetActiveDoc()
      Set aPageView = aAVDoc.GetAVPageView()
      Set aPdDoc = aAVDoc.GetPDDoc()

      pageCount = aPdDoc.GetNumPages 
      rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)

      rtgFound is True. Is it possible to automate Acrobat so that OCR produces "Editable Text and Images" output? That is already the default setting in the UI, but it doesn't seem to make a difference when OCR is triggered from the script.

      If I have to search each of the hundreds of small boxes individually, what would I have to loop through? Are there other options?

      Many thanks!
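      On the question of what to loop through: one option other than visiting every box is to read the words of the text layer directly. Below is a minimal sketch, assuming the Acrobat JavaScript bridge (PDDoc.GetJSObject) is available in your Acrobat version. The helper FindWordPage is hypothetical; it walks every word on every page with the JavaScript methods getPageNumWords and getPageNthWord and compares each one to the search term. It has not been tested against text produced by the TouchUp:EditDocument route, it matches single words only, and it will be slower than FindText on long documents.

```vba
' Hypothetical helper (not part of the Acrobat API): scans the text layer
' word by word through the Acrobat JavaScript bridge instead of AVDoc.FindText.
' Returns the 0-based page number of the first match, or -1 if not found.
Function FindWordPage(aPdDoc As Object, target As String) As Long
    Dim jso As Object
    Dim pageNum As Long, wordNum As Long, nWords As Long
    Dim w As String

    FindWordPage = -1
    Set jso = aPdDoc.GetJSObject            ' JavaScript Doc object for this PDF
    If jso Is Nothing Then Exit Function

    For pageNum = 0 To aPdDoc.GetNumPages - 1
        nWords = jso.getPageNumWords(pageNum)
        For wordNum = 0 To nWords - 1
            ' bStrip = True strips punctuation/whitespace around the word
            w = jso.getPageNthWord(pageNum, wordNum, True)
            ' Case-insensitive comparison, like FindText with bCaseSensitive = 0
            If StrComp(w, target, vbTextCompare) = 0 Then
                FindWordPage = pageNum
                Exit Function
            End If
        Next wordNum
    Next pageNum
End Function

' Example use with the objects from the script above:
'   pageFound = FindWordPage(aPdDoc, "accordingly")   ' -1 means no match
```

      If this finds the word where FindText does not, the text is in the file and the problem is more likely cached state in the AVDoc than missing OCR output.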

        • 1. Re: Scripted OCR doesn't let me script finding text, manual OCR does
          Karl Heinz Kremer Adobe Community Professional

          As far as I know, there is no documented (and therefore supported) method to run OCR via the IAC interface. What you are doing relies on a side effect of the menu item you execute to get the desired result. Chances are that this was never designed to work the way you are hoping it would.

          There should not be any difference between running OCR manually and triggering it by trying to edit text on a page, at least as long as you are not automating that last step. What is probably happening is that Acrobat caches some information in the AVDoc that does not get updated when you trigger OCR via the menu item. I would try saving the document, opening it again, and then checking whether FindText works.

          • 2. Re: Scripted OCR doesn't let me script finding text, manual OCR does
            Robert1776D Level 1

            Unfortunately, saving and reopening did not do the trick. I inserted this section before the FindText line:

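              curDocName = aPdDoc.GetFileName
              aPdDoc.Save 1, FilePath & curDocName   ' 1 = PDSaveFull (write the entire file)
              aAVDoc.Close True                      ' close without saving again
              aAVDoc.Open FilePath & curDocName, ""
              Set aAVDoc = aApp.GetActiveDoc()
              Set aPdDoc = aAVDoc.GetPDDoc()         ' refresh the PDDoc for the reopened file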

            A manual save and reopen did not work either.

            It would be really nice to have a supported method to automate OCR.