I am trying to scan some old microfilm images into PDF format. I want to make the documents searchable, which I can easily do with the Tools button, Recognize Text, and then choosing the PDF output style Searchable Image. The problem is that when I type in a word to search for in the document, it finds less than half of what is on the page. In some cases, it doesn't find certain words at all. Not only that, but when it converts the page into a searchable image, the quality of the image is reduced. The advertisements and text on the page get more pixelated. What is going on? I appreciate the help.
It's hard to say without seeing the actual source content, but given what you are describing, I would venture a guess that the source content is of too poor a quality to OCR reliably and accurately. To test this theory, after running OCR, identify a term that you can confirm is present on the page but that the search feature cannot find. Then, copy and paste that term from the PDF into Word or Notepad. What I expect you will note is that a "0" (zero) has been recognized as an "8," and other things of that nature. OCR processing is garbage in, garbage out, so the source always has to be considered. Repeat this process across the document to see if you experience similar results.
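If you would rather not check term by term by hand, the same test can be roughly sketched in Python. This assumes you have already copied the recognized text out of the PDF into a plain string. The confusion groups below are only illustrative guesses at common OCR mix-ups (not anything Acrobat publishes), and the function names are hypothetical.

```python
from typing import List

# Illustrative groups of characters that OCR often confuses (an assumption,
# not an official list). Every character in a group is folded to the first one.
CONFUSION_GROUPS = ["0o8", "1li", "5s"]
CANON = {ch: group[0] for group in CONFUSION_GROUPS for ch in group}


def canonical(text: str) -> str:
    """Lowercase the text and fold easily confused characters together."""
    return "".join(CANON.get(ch, ch) for ch in text.lower())


def find_ocr_variants(expected: str, ocr_text: str) -> List[str]:
    """Return words that are not an exact match for `expected` but become
    one once commonly confused characters are folded together."""
    target = canonical(expected)
    hits = []
    for word in ocr_text.split():
        stripped = word.strip(".,;:!?\"'()")
        if stripped.lower() != expected.lower() and canonical(stripped) == target:
            hits.append(stripped)
    return hits


# Hypothetical OCR output in which "1900" was misread once.
sample = "Built in 198O, sold in 1900."
print(find_ocr_variants("1900", sample))
```

If a term you know is on the page only ever shows up in this list of near misses, that points to the source quality rather than the search feature.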
You don't describe how exactly you are transferring the microfilm "images" to PDF, so it's difficult to say if that process is at issue or could be improved. I suspect that's contributing to the problem, though.
Finally, with regard to the degraded image quality, you want to make sure you are selecting "Searchable Image (Exact)" from the OCR dialogue. ClearScan or any type of downsampling will likely cause or contribute to the type of issue you are describing. In a situation with low-quality source material, you would not want to use these other options.
Hope that helps!
PDF Litigation Solutions, LLC