Acrobat 9 crashes on OCR

Oct 6, 2008 9:25 PM

I've been trying to convert a batch of large PDF files to searchable PDF files using Acrobat's OCR. In the middle of a batch, a large (1000+ page) document crashes Acrobat. I have narrowed the problem down to this image:

http://img90.imageshack.us/img90/2418/badke2.png
59,520 bytes

When I convert it to PDF (File->Create PDF->From Single File) and then run "Document->OCR Text Recognition->Recognize Text Using OCR", Acrobat always crashes.

Does this happen for anyone else who can try it?

It kills my batch processing and is making this large conversion quite painful. Is there a way around it?
 
Replies
  • Oct 7, 2008 12:39 AM   in reply to (John_L._Douglas)
    The basic problem is that the PNG file has a resolution of 172 x 172 dpi, and the minimum is 200 dpi. I tried it in AA5, and it warned of an invalid resolution. I doubled the resolution, and with that the OCR ran, but the only text it recognized was the page number. I have no idea why AA9 would crash, but it is likely related to the page resolution.
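    The diagnosis above hinges on the resolution stored in the image. A minimal sketch, assuming the PNG records its density in the standard pHYs chunk (many PNGs don't, in which case the function returns None), reads that value with only the Python standard library and computes the whole-number upscale needed to clear the 200 dpi minimum quoted in this reply. The names png_dpi and upscale_factor are illustrative, not part of any Acrobat tool.

```python
import math
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METERS_PER_INCH = 0.0254


def png_dpi(data: bytes) -> Optional[Tuple[float, float]]:
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if absent or unitless."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs":
            # Pixels per unit on X and Y, then a unit byte (1 = meter).
            ppu_x, ppu_y, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 0 means aspect ratio only, no physical size
                return None
            return ppu_x * METERS_PER_INCH, ppu_y * METERS_PER_INCH
        if ctype == b"IDAT":  # pHYs must come before the image data
            break
        pos += 12 + length  # 8-byte header + data + 4-byte CRC (CRC not checked)
    return None


def upscale_factor(dpi: float, minimum: float = 200.0) -> int:
    """Smallest whole-number scale that lifts `dpi` to at least `minimum`."""
    return max(1, math.ceil(minimum / dpi))
```

    For a 172 dpi image, upscale_factor(172) returns 2, which matches the doubling described in the reply.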
     
  • Oct 13, 2008 9:43 AM   in reply to (John_L._Douglas)
    Not sure this is the right place for this question, but...
    Is there a "one-step" way to highlight all the occurrences of a particular word in an OCR'd PDF? I currently find the first occurrence and highlight it, then find the next occurrence and highlight it, and so on... but surely there is an automated way to do this.
     
  • Oct 13, 2008 2:18 PM   in reply to (Mark_Smartt)
    Mark,

    This thread relates to crashes with OCR. Your post does not. Please
    start your own thread. Don't hijack unrelated threads.

    Mike
     
  • Oct 14, 2008 1:21 PM   in reply to (John_L._Douglas)
    I have a problem downloading the first file that caused the crash:
    http://img90.imageshack.us/img90/2418/badke2.png

    Could you please send it to me as an attachment at osatchou@adobe.com?

    Thanks,
    Olga
     
  • Oct 16, 2008 9:38 AM   in reply to (John_L._Douglas)
    I was able to reproduce the crash in both cases. A fix will be available in the next dot release of Acrobat, 9.1.
    Thanks for your help!

    Olga Satchouk,
    Acrobat QE
     
  • Oct 20, 2008 3:05 PM   in reply to (John_L._Douglas)
    Yes, it looks like the same problem caused the crash on all pages in Acrobat 9.0. All pages came out just fine after OCR with 9.1.

    Thanks again for your help.
     
  • Oct 23, 2008 7:09 AM   in reply to (John_L._Douglas)
    When is the new Acrobat 9.1 coming out, then?
     
  • Nov 6, 2008 6:52 AM   in reply to (John_L._Douglas)
    Does anyone have information on the release of 9.1 or the fix to this issue?

    Thanks
     
  • Nov 6, 2008 12:08 PM   in reply to (John_L._Douglas)
    My boss and I do document digitizing and we are having the same problem. Not every file crashes, only random larger files. I was wondering if anyone else had any insight on fixing this.

    Thanks.
     
  • Nov 7, 2008 8:50 AM   in reply to (John_L._Douglas)
    I am having the same problem with 40+ pages to OCR.
     
  • Dec 2, 2008 7:10 AM   in reply to (John_L._Douglas)
    Does anyone know when the 9.1 version will be available? I'm having these same frustrating OCR problems!
     
  • Dec 3, 2008 8:08 AM   in reply to (John_L._Douglas)
    Yes; I have the same problem with 9.0 Pro:

    When I try to OCR a .pdf document to perform searches, it always crashes with the following error:

    AppName: acrobat.exe AppVer: 9.0.0.332 ModName: ocrlibraryinf.dll

    ModVer: 2.0.0.1 Offset: 000206f1
     
  • Dec 8, 2008 2:47 AM   in reply to (John_L._Douglas)
    Same problem here; we use Acrobat 9 on three stations and Acrobat 8 on one. Acrobat 9 has failed to complete a batch OCR on ANY station, while Acrobat 8 has yet to fail, and it has done several large batches.
    Any help yet? I have 60,000 pages to OCR.
     
  • Dec 18, 2008 12:09 PM   in reply to (John_L._Douglas)
    Same problem here. I just purchased 9.0 for a project and am having this problem (the crash is in ocrlibraryinf.dll) with over a thousand documents.
    When will 9.1 be available?!

    For those of you who only have a few PDFs to OCR, try extracting the culprit pages as TIFF, then importing them back in. It worked for me, but I can't do that for 1000+ documents.
     
  • Jan 3, 2009 4:25 AM   in reply to (John_L._Douglas)
    We have had the same problem and provided samples to Jason Reuer at Adobe. His e-mail address is jreuer@adobe.com. This is very frustrating. We tried purchasing the Extended version of Acrobat Pro 9, and it has the same problem, as does Acrobat 8.1. Since I was unable to get any response from Jason Reuer, I tried contacting Olga Satchouk at Adobe, who responded here. Her e-mail address is no longer valid.

    Does anyone have any idea how to get around this problem now, or when the mystery version 9.1 that supposedly corrects it will be released? Acrobat is useless to our organization with this mission-critical bug, and it is frustrating that Adobe has not been more responsive.

    Also, does anyone have the e-mail address of someone higher up at Adobe so that we can escalate this issue, or at least let them know of Adobe's failure to respond to all our requests?
     
  • Jan 4, 2009 10:25 AM   in reply to (John_L._Douglas)
    I also encountered the same problem, using AA8.1.3, on 63 very large (600 MB+) PDF files that were created with a utility that did not make them searchable.

    I also found that AA8 crashed while running OCR. Huge shortcoming: AA8 does everything in memory, stores nothing in temporary files, and forgets everything it did when it crashes, forcing a mammoth, time-consuming restart from the beginning.

    I finally got through them all by breaking each file into 2 files (a and b), then running OCR 100 pages at a time within each file, with constant attention.

    A random error message, "Cannot find file", kept popping up, stopping processing until I clicked "OK" and adding to the huge delays. I have to watch for this error constantly. Even though I click "Ignore this error message", it does not listen; the message keeps popping up and stopping processing, waiting for a dumb "OK" response.

    But at least, as it successfully gets through 100 pages, I can save it, then carry on to next 100 pages. If it crashes, I can restart to the last successful 100 pages completed.
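    The routine described here (OCR 100 pages, save, and after a crash resume from the last group that completed) is essentially a checkpointed loop. A minimal Python sketch under stated assumptions: ocr_range is a hypothetical callback standing in for whatever OCRs a page range and saves the file (in this thread that is done by hand in Acrobat), and the JSON checkpoint file is an illustrative choice.

```python
import json
import os
from typing import Callable, Iterator, Tuple


def page_chunks(total_pages: int, chunk: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield inclusive 1-based page ranges of at most `chunk` pages."""
    for start in range(1, total_pages + 1, chunk):
        yield start, min(start + chunk - 1, total_pages)


def ocr_with_checkpoints(total_pages: int,
                         ocr_range: Callable[[int, int], None],
                         checkpoint_path: str,
                         chunk: int = 100) -> int:
    """Run `ocr_range` chunk by chunk, recording the last finished page.

    After a crash, calling this again skips every chunk already recorded
    as done. Returns the number of chunks processed in this call.
    """
    last_done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            last_done = json.load(f)["last_page"]
    processed = 0
    for start, end in page_chunks(total_pages, chunk):
        if end <= last_done:
            continue  # finished (and saved) in an earlier run
        ocr_range(start, end)  # hypothetical: OCR these pages, then save the PDF
        with open(checkpoint_path, "w") as f:
            json.dump({"last_page": end}, f)  # written only after a successful chunk
        processed += 1
    return processed
```

    The checkpoint is updated only after a chunk succeeds, so a crash mid-chunk costs at most that one chunk.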

    I conclude that AA8 absolutely hates large files. My computer is a 2.6 GHz dual-core with 3.5 GB of memory, and that is still not enough.
    It took me an entire week to work my way through these 63 files, which, BTW, tied up my computer for that whole time, with insufficient memory to do other things.

    That AA8 does not use temporary files is a huge shortcoming when working with large PDF files. I find it astonishing that after an unexpected failure, it has no way to remember where it was when the crash occurred.

    This situation raises the question: is there any other utility out there that will make a PDF file searchable when it was created non-searchable by some utility other than Acrobat?

    Regards,

    Terry Smythe
    Winnipeg, Canada
    smythe@shaw.ca
     
  • Jan 5, 2009 2:37 AM   in reply to (John_L._Douglas)
    No. The problem is not just big files - although the problems are worse in that case. I tried OCR on a 1600 page TIFF. Acrobat would crash with no indication of where it encountered a problem and all OCR done to that point was lost. I split the document into 100 page sections to identify the source of the crash. 4 of the 100 page sections bombed. I then did OCR on each of the four by splitting them into single pages. I identified four pages of the 1600 that were causing the crash. OCR on those single pages crashed Adobe on multiple machines running every possible setting. I provided the pages to Jason Reuer at Adobe (jreuer@adobe.com) at his request and heard nothing back despite multiple e-mails. So much for customer service. I am able to perform OCR on these pages using OmniPage Pro 1.5 with no problem, so it is not the pages. I also provided Adobe with pages from other documents that produce crashes. Again, I heard no response. In this forum, Adobe claims that the problems are fixed in Acrobat 9.1, but they have not responded to any of our requests about how to get Acrobat 9.1. If anyone knows, please tell the rest of us.
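    The narrowing described above (1,600 pages to 100-page sections to single pages) can also be done by repeated halving. That is a swapped-in technique, not the one used in this post, and it needs far fewer OCR runs when only a few pages are bad. In this sketch crashes is a hypothetical predicate standing in for "OCR these pages and see whether Acrobat dies", and the sketch assumes the crash is repeatable, which a later reply in this thread suggests is not always true.

```python
from typing import Callable, List, Sequence


def find_crashing_pages(pages: Sequence[int],
                        crashes: Callable[[Sequence[int]], bool]) -> List[int]:
    """Return every page in `pages` that makes `crashes` report a failure.

    Assumes the crash is deterministic and triggered by individual pages,
    so a range crashes exactly when it contains at least one bad page.
    """
    if not pages or not crashes(pages):
        return []  # this whole range is clean; no need to look inside it
    if len(pages) == 1:
        return [pages[0]]
    mid = len(pages) // 2
    # Test each half separately; results come back in ascending page order.
    return (find_crashing_pages(pages[:mid], crashes)
            + find_crashing_pages(pages[mid:], crashes))
```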

    Adobe's batch handling compounds the problem. Instead of loading, OCRing and saving one document at a time, it tries to load them all into memory at once. I am trying to perform OCR on 100,000 small documents, so that method is a disaster: one crash and everything is lost. As everyone has noted, a crash is certain, so Acrobat is basically useless for OCR in its current state. This has been true for Acrobat 7, 8 and 9.
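    The alternative described above (load one document, OCR it, save it, move on) can be sketched as a sequential loop with an append-only log of finished files, so that a crash loses only the document in progress. This illustrates that batching strategy in Python; it is not how Acrobat's batch engine works, ocr_document is a hypothetical callback, and the one-name-per-line log format is an assumption.

```python
import os
from typing import Callable, Iterable, List, Set


def load_done(log_path: str) -> Set[str]:
    """Read the names of documents already processed, one per line."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path) as f:
        return {line.strip() for line in f if line.strip()}


def ocr_one_at_a_time(documents: Iterable[str],
                      ocr_document: Callable[[str], None],
                      log_path: str) -> List[str]:
    """OCR documents sequentially, logging each as soon as it is saved.

    Rerunning after a crash skips everything already in the log.
    Returns the documents handled in this call.
    """
    done = load_done(log_path)
    handled = []
    for doc in documents:
        if doc in done:
            continue
        ocr_document(doc)  # hypothetical: open, OCR, save, close
        with open(log_path, "a") as f:
            f.write(doc + "\n")  # record success before moving on
        handled.append(doc)
    return handled
```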

    This is all very frustrating.
     
  • Jan 5, 2009 8:23 AM   in reply to (John_L._Douglas)
    On 5 Jan 2009 at 2:37, Daniel E. Smith wrote:

    > I am able to perform OCR on these pages using OmniPage Pro
    > 1.5 with no problem, so it is not the pages.

    Agreed. In every case, where AA8 crashed, I was able to have AA8 run OCR against the offending page as if nothing was wrong, then carry on. Very mysterious, and extremely aggravating.

    But the big question..... Is there another utility out there that will run OCR in such a way that the PDF file becomes searchable thereafter?

    I have no trouble running OCR from any number of OCR packages, and all work just fine, but the OCR results are always external to the PDF file. The PDF file remains non-searchable even after running it with ABBYY, OmniPagePro, TextBridge, ScanSoft, etc.

    So far, AA7 or better is the only utility I have found that when OCR is run, it leaves behind a PDF file that is searchable.

    This is important in the case of a very large set of very large PDF files initially created by some utility other than Acrobat. 100% of these PDF files are not searchable.

    In my case, some 50,000+ pages of a historical newspaper, 1881 to 1943, were scanned into TIFF format by some automated process, likely using an ADF. Then the TIFF files were converted by some automated utility into PDF files, all non-searchable.

    I want to concatenate the TIFF files by year, then convert these yearly files into yearly PDF files. But such a process leaves them all non-searchable.

    I've basically done this by using AA8, but the process was incredibly time consuming and aggravating, requiring constant attention for all these dumb repetitive errors that keep popping up, ignoring my earlier selection to ignore all errors. GGrrrrr..................... Urge to kill........... :-)

    Regards,

    Terry
     
  • Jan 5, 2009 10:09 AM   in reply to (John_L._Douglas)
    Olga Satchouk says:

    > I was able to reproduce crash in both cases.
    > Fix will be available in the next dot release
    > of Acrobat - 9.1.

    How about the same fix for AA8.1.3, for those of us volunteers who can't afford the high cost of version 9? We don't have a company budget to fall back on, even though the work we are doing clearly benefits society as a whole.

    Regards,

    Terry Smythe
    Winnipeg, Canada
    smythe@shaw.ca
     
  • Jan 25, 2009 1:04 PM   in reply to (John_L._Douglas)
    Rather than breaking the PDF into smaller files, run OCR until it crashes and note the page, then run OCR up to the page before the crash. Then run OCR again starting at the page AFTER the crash.

    At least you'll have the PDF doc in one piece, even though certain pages in the doc won't have been OCR'd.

    Then you can insert re-scanned pages into the appropriate spot.

    This was a workaround that worked for me.
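    The workaround above comes down to computing the page ranges on either side of each page that crashed. A small sketch: ranges_around is an illustrative helper name, and its output is meant to be entered by hand as the page range for each OCR run (no Acrobat automation is assumed).

```python
from typing import Iterable, List, Optional, Tuple


def ranges_around(total_pages: int, crash_pages: Iterable[int]) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive ranges that skip crash pages."""
    skip = set(crash_pages)
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for page in range(1, total_pages + 1):
        if page in skip:
            if start is not None:
                ranges.append((start, page - 1))  # close the range before the bad page
                start = None
        elif start is None:
            start = page  # first good page after a gap (or at the beginning)
    if start is not None:
        ranges.append((start, total_pages))
    return ranges
```

    For a 10-page file that crashed on page 4, ranges_around(10, [4]) gives [(1, 3), (5, 10)]; the skipped page is the one to rescan and insert.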
     
  • Jan 25, 2009 8:54 PM   in reply to (John_L._Douglas)
    Thank you for your suggestion. I did try that, but discovered that AA8 would sometimes crash on an earlier page, forcing a restart from the beginning again. I concluded that, as AA8 does everything in memory, these 600 MB PDF files were simply too big. And even on a swift 3.0 GHz dual-core system with 3.5 GB of memory, it still took a huge amount of time.

    Curiously, if I noted the offending page where it crashed and sent AA8 just the 10 pages around it, AA8 would OCR them quite normally.

    As a consequence, rather than trust it to crash at the same spot, I elected to break the files in half, then OCR 100 pages at a time, saving the file at the end of each group. It might still crash occasionally, but at least I did not have to repeat what had already been done successfully.

    Knitting the broken files together after successful OCR processing is really quite trivial, done in seconds, not a hardship.

    But how nice it would be if the fix applied to Version 9 would also be applied to version 8.1.3. As a volunteer, I can't afford version 9, and my version 8.1.3 otherwise does the job, albeit with aggravation.

    When the next big group of similar PDF files emerges, I won't waste so much time experimenting. I'll just repeat this process from the beginning.

    Regards, and thank you for thinking of me, appreciated.

    Terry Smythe
     
  • Feb 27, 2009 7:45 AM   in reply to (John_L._Douglas)
    I'm also having problems with Acrobat crashing randomly when OCRing large scanned documents (>100MB). These problems began with Acrobat 8 and are still there in AA9!

    What kind of fix do you mean Adobe has applied to AA9? Since installing the boxed version, Adobe's Update application never found any updates for AA9 :-(
     
  • Mar 3, 2009 8:06 AM   in reply to (John_L._Douglas)
    So I guess the latest stable version for doing OCR is Acrobat 7, isn't it?
     
  • Mar 3, 2009 2:15 PM   in reply to (John_L._Douglas)
    Version 3 was awesome, simple and stable. Came with an office scanner.
    Version 5.05 was great, everything in one package, before Acrobat split standard and pro.
    Version 6.x was a nightmare, with bloated PDFs and snail-like performance. Luckily I only had to help others who had it.
    Version 7.1 Pro was/is easy and stable.
    Version 8.x I have no personal experience with.
    Version 9 Pro Extended trial has some really nice features, like one-step watermark removal, but I have not bought the upgrade yet. It's good to hear other people's trials and tribulations, even though you hardly ever see positive posts thanking Adobe for a great product.
     
  • Mar 11, 2009 1:02 AM   in reply to (John_L._Douglas)
    Adobe released the Acrobat 9.1 update yesterday. Unfortunately, the online updater doesn't find it so you have to download it manually from Adobe's website.

    As far as I can tell the OCR engine is much more stable now! No more crashes so far...
     
  • May 4, 2011 2:05 PM   in reply to (John_L._Douglas)

    I am having this problem also, running ver. 9.4

     
  • Nov 17, 2011 2:36 PM   in reply to (John_L._Douglas)

    I've got 9.4, and I too am having the crash-to-desktop problem when trying to OCR pages, usually with 50 or more pages, and randomly. Very frustrating, given the amount of time involved in scanning, waiting for the program to OCR, and then seeing that dreaded error screen with no apparent explanation. Not every time, though, praise the Lord. So far I have not seen anyone answer this problem decisively in this thread. Does anyone know a decisive answer? Thanks, Bill W.

     
  • Nov 17, 2011 4:44 PM   in reply to wwood44299

    I have not seen the problem with AA 9.4. What you might try is OCR on a portion of the document. Save that result and then try the next part of the document. The key may be so many pages. It used to be that Acrobat had a 50 page limit, but that is gone. Still it might be a size issue. You could also try clearing your TEMP folder and see if that helps.

     

    With all such experiments, do work on a copy.

     
  • Nov 18, 2011 7:16 AM   in reply to Bill@VT

    Thanks very much Bill. I’ll give these ideas a try. Bill W

     

     

     

    Law Offices of William N. Woodson, III, APC
    1717 Hillside Drive
    Fallbrook, CA 92028
    (760) 535-6645
    FAX (760) 451-1777
    wwood44299@aol.com
    wnwoodson3@gmail.com

     
  • Dec 6, 2011 9:12 AM   in reply to (John_L._Douglas)

    MacBook Pro (1997)
    - Mac OS X 10.7.2
    - 2.6GHz Core 2 Duo
    - 4GB RAM

    Acrobat 9 Pro
    - version 9.4.6

    Acrobat 9 Pro OCR always crashes when using ClearScan, but not when using "Searchable Image" or "Searchable Image (Exact)". I scanned several journal pages at 300 dpi (color, grayscale, bitmap) in .tiff and .png, and also tried screen-selecting text from a browser. The results were consistent across all variations.

    The last time I used Acrobat's OCR function was last summer, before upgrading from Snow Leopard to Lion. Under Snow Leopard, Acrobat did not crash during OCR (it did crash, just not while processing text for OCR). I did not attempt Acrobat OCR under Lion 10.7 or 10.7.1.

    Repeatable test:

    1. Open the Wikipedia "Crash (computing)" page:
       http://en.wikipedia.org/wiki/Crash_(computing)

    2. Enlarge the text size, if desired.
       I tried several text sizes, from the default to much, much larger. Text size has no impact on the results.

    3. Create a PDF:
       File >> Create PDF >> From Selection Capture
       I selected the first paragraph:

       A crash (or system crash) in computing is a condition where a computer or a program, either an application or part of the operating system, ceases to function properly, often exiting after encountering errors. Often the offending program may appear to freeze or hang until a crash reporting service documents details of the crash. If the program is a critical part of the operating system kernel, the entire computer may crash. This is different from a hang or freeze where the application or OS continues to run without obvious response to input.

    4. OCR:
       Document >> OCR Text Recognition >> Recognize Text Using OCR

    4.1 Searchable Image (Exact)
        Primary OCR Language: English (US)
        PDF Output Style: Searchable Image (Exact)
        Downsample: None
        Result: No crash; OCR successful

    4.2 Searchable Image (tested for each downsample option)
        Primary OCR Language: English (US)
        PDF Output Style: Searchable Image
        - Downsample: Lowest (600 dpi)
        - Downsample: Low (300 dpi)
        - Downsample: Medium (150 dpi)
        - Downsample: High (72 dpi)
        Result: No crash; OCR successful

    4.3 ClearScan (tested for each downsample option)
        Primary OCR Language: English (US)
        PDF Output Style: ClearScan
        - Downsample: Lowest (600 dpi)
        - Downsample: Low (300 dpi)
        - Downsample: Medium (150 dpi)
        - Downsample: High (72 dpi)
        Result: Crash; OCR not successful

     
  • Dec 30, 2011 11:44 AM   in reply to kelly brant

    I'm having the same problem. ClearScan crashes, but "searchable image" does not. Happily, I can export text from "searchable image" format.

     

    MacBook Pro (2008) 2.4 GHz Core 2 Duo (surely Kelly means 2007, not 1997)

    4 GB Ram

    OSX 10.7.2 (Lion)

     

    Acrobat version 9.4.6

     
  • Dec 31, 2011 7:48 AM   in reply to failed_spirit

    failed_spirit wrote: "surely Kelly means 2007, not 1997"

     

    You are correct. 2007.

     
  • Mar 16, 2012 5:40 AM   in reply to (John_L._Douglas)

    I'm having Acrobat 9.5.0.270 crash on some PDFs when using batch processing with Fast Web View, OCR and PDF Optimizer, with monochrome images set to JBIG2 lossless.

     

    Faulting application name: Acrobat.exe, version: 9.5.0.270, time stamp: 0x4f03f71d
    Faulting module name: OCRLibraryInf.dll, version: 9.5.0.270, time stamp: 0x4f03e982
    Exception code: 0xc0000094
    Fault offset: 0x00093aeb
    Faulting process id: 0x14fc
    Faulting application start time: 0x01cd02e462799896
    Faulting application path: C:\Program Files (x86)\Adobe\Acrobat 9.0\Acrobat\Acrobat.exe
    Faulting module path: C:\Program Files (x86)\Adobe\Acrobat 9.0\Acrobat\plug_ins\PaperCapture\OCRLibraryInf.dll
    Report Id: 0dbea415-6f62-11e1-a8e9-005056ba0009
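    For anyone comparing these reports: the exception code 0xc0000094 in the log above is the Windows status code STATUS_INTEGER_DIVIDE_BY_ZERO, so the crash inside OCRLibraryInf.dll was an integer division by zero. That says what failed, not why. A small sketch for pulling the "Key: value" fields out of such a report so crashes from different machines can be lined up; the code table is a short hand-picked subset, and the helper names are illustrative.

```python
from typing import Dict

# A few well-known Windows NTSTATUS exception codes (not an exhaustive list).
EXCEPTION_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def parse_fault_report(text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines from an Application Error event into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so paths like C:\... stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def describe_exception(fields: Dict[str, str]) -> str:
    """Map the report's 'Exception code' field to a symbolic name if known."""
    code = int(fields["Exception code"], 16)
    return EXCEPTION_NAMES.get(code, "unknown (0x%08X)" % code)
```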

     
  • Mar 17, 2012 10:52 PM   in reply to kelly brant

    I'm having exactly the same problem, with an identical history of use, as Kelly in post 33.

    When will Adobe fix this?

     