1 Reply Latest reply on Jul 21, 2009 9:49 AM by Bernd Alheit

    Batch process to remove OCR containers from PDFs?


      Hello All.

      I have several OCRed files and now I need to delete the OCR information. I know that the information is stored in containers inside the PDF. There are <snap> containers and <artifact> containers: the <snap> containers hold the OCR information, and the <artifact> containers hold the image information.

      I know that some people use the PDF-TIFF-PDF workaround, but my files are really large, so this option is very time-consuming.

      Please help. There must be a way to run a batch process that uses JavaScript to remove the <snap> containers from the PDFs and run the OCR afterwards.
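
      To make the idea concrete, here is a rough sketch of the filtering step I have in mind. It is written in Python rather than Acrobat JavaScript, purely as an illustration. It assumes the page content has already been parsed by some PDF library into (operands, operator) pairs, and that the OCR text really sits in marked-content sequences that open with the PDF operators BMC or BDC under the tag /snap and close with EMC. The function name strip_marked_content and the default tag are made up for this example; reading and writing the PDF itself is not shown.

```python
def strip_marked_content(ops, tag="/snap"):
    """Return ops without any marked-content sequence opened with `tag`.

    `ops` is a list of (operands, operator) pairs, e.g. (["/snap"], "BMC").
    This is an assumed representation, not the output of a specific library.
    """
    kept = []
    depth = 0  # 0 = keeping operators; >0 = inside a sequence being dropped
    for operands, operator in ops:
        if depth:
            # Inside a dropped sequence: track nesting to find the matching EMC.
            if operator in ("BMC", "BDC"):
                depth += 1
            elif operator == "EMC":
                depth -= 1
            continue
        if operator in ("BMC", "BDC") and operands and operands[0] == tag:
            depth = 1  # start dropping, including this opening operator
            continue
        kept.append((operands, operator))
    return kept


# Hypothetical page: one image container followed by one OCR text container.
page_ops = [
    (["/artifact"], "BMC"),
    (["/Im0"], "Do"),
    ([], "EMC"),
    (["/snap"], "BMC"),
    ([], "BT"),
    (["(recognized text)"], "Tj"),
    ([], "ET"),
    ([], "EMC"),
]
cleaned_ops = strip_marked_content(page_ops)
```

      With the hypothetical page above, only the three <artifact> operators remain in cleaned_ops. What I am missing is a way to run something like this over many files from a batch sequence.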

      Does anybody have an idea what such a script should look like?


      I would be very happy if somebody has a solution, or at least knows of a way to get the desired result.


      Thanks in advance,