Something to try:
Using Acrobat Pro, rebuild the associated Catalog index file (*.pdx).
Thanks for the suggestion. I tried it and it had no effect. There was no prior index catalog. After running the build on every file in the directory, I still get the same results: searches fail on some of the files and succeed on the rest. To restate, the change appeared after January 10. It would be helpful if Adobe, or someone else, had tools to analyze the files and identify why the search results differ.
Perhaps you could open a free adobe.com account and upload a couple of pre-Jan 10 and post-Jan 10 PDFs.
Then share the files via Publish > Share > Copy (to clipboard).
Post the link(s) here.
Regardless, to be indexed and searchable, a PDF must use fonts (for renderable text or for 'hidden' OCR-output text) whose characters can be mapped to Unicode.
With Acrobat Pro there are Preflights available to assess syntax issues, 'searchable' content, font types and so forth.
Preflights can yield a wealth of information.
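Preflight is the authoritative check, but for a quick first look at the Unicode-mapping point, the byte-level Python sketch below (standard library only) counts font dictionaries and /ToUnicode entries in a file. It is a heuristic under stated assumptions: compressed streams are assumed to be FlateDecode, the counts are approximate (a composite Type0 font also counts its descendant font), and a font without /ToUnicode can still be searchable if it uses a standard encoding. The function names are hypothetical, not part of any Adobe tool.

```python
import re
import sys
import zlib

# Captures each stream body; non-greedy so streams are matched one at a time.
# The body may keep a trailing end-of-line, which zlib.decompress ignores.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# \b stops "/Type /Font" from also matching "/Type /FontDescriptor".
FONT_RE = re.compile(rb"/Type\s*/Font\b")
TO_UNICODE_RE = re.compile(rb"/ToUnicode\b")


def expand_streams(data: bytes) -> bytes:
    """Return the raw file plus every stream body that zlib can inflate.

    Assumption: compressed streams use FlateDecode; anything else
    (images, other filters) is skipped.
    """
    parts = [data]
    for match in STREAM_RE.finditer(data):
        try:
            parts.append(zlib.decompress(match.group(1)))
        except zlib.error:
            pass
    return b"\n".join(parts)


def font_unicode_summary(data: bytes) -> dict:
    """Rough counts of font dictionaries and /ToUnicode entries."""
    text = expand_streams(data)
    return {
        "fonts": len(FONT_RE.findall(text)),
        "to_unicode": len(TO_UNICODE_RE.findall(text)),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(font_unicode_summary(f.read()))
```

Running it on one file from each set and comparing the two summaries is the intended use; a large gap between the two counts is only a hint, to be confirmed with Preflight.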
I have uploaded four documents as suggested: two dated on or before January 10 and two dated afterwards. The document URLs are as follows:
I look forward to learning what the differences are between the two sets. The method of creation has not changed: the files have been generated weekly on a Windows XP Pro machine via Adobe 6 Standard. Please honor the restrictions on copying and disclosure stated in the documents themselves. These files will be accessible only briefly, to determine why the text search function fails. Thanks in advance for any assistance.
First: given the copying and disclosure restrictions you mention in your post above, it would be a good idea to 'slick' (remove) those files sooner rather than later.
While they remain in the account, they are available to the "world" through the links.
The two pre-Jan 10 PDFs have content mastered in a word processing application.
I'd say it was MS Word. The PDFs were produced by printing to Adobe Distiller 9 (installed by Acrobat 9 Standard or Pro),
that is, File > Print with the Adobe PDF printer selected.
The Distiller job options used had PDF version 1.5 (the version associated with the release of Acrobat 6) selected, so that is the version of both PDFs.
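The version recorded in a file can be read from its header (it should read '%PDF-1.5' for these two files). The small Python sketch below assumes the header appears within the first 1024 bytes and also does a rough byte search for a catalog /Version entry, which from PDF 1.4 on can override the header; it will miss a /Version entry stored in a compressed object stream. The function name pdf_versions is hypothetical.

```python
import re
import sys

# The header is "%PDF-x.y". Searching only the first 1024 bytes is an
# assumption that tolerates a little leading junk before the header.
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")
# From PDF 1.4 on, the document catalog may carry a /Version name that
# overrides the header; this is a rough byte search, not a real parse.
CATALOG_VERSION_RE = re.compile(rb"/Version\s*/(\d+\.\d+)")


def pdf_versions(data: bytes) -> tuple:
    """Return (header_version, catalog_version_override), either may be None."""
    header = HEADER_RE.search(data[:1024])
    override = CATALOG_VERSION_RE.search(data)
    return (
        header.group(1).decode("ascii") if header else None,
        override.group(1).decode("ascii") if override else None,
    )


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, pdf_versions(f.read()))
```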
These PDFs both have renderable text rather than hidden text associated with OCR output.
Both PDFs, as output of Adobe's Acrobat product, are compliant with ISO 32000-1 (ISO's PDF standard).
Consequently, Acrobat's Catalog index / Search feature encounters a "well-formed" (Standards compliant) PDF.
The post-Jan10 PDFs were also sourced from a word processing application (again, I'd say MS Word).
However, the PDF output was generated by the free PrimoPDF application.
The PDFs contain both renderable and hidden text. It is as if the initial output PDF was an image that was then run through an OCR process, which replaced recognized characters with renderable characters and left unrecognized characters as hidden text.
Consequence: What Catalog index / Search / Find can 'grab' is not the same as what you 'see'.
In sum, after Jan 10 the way the PDFs were produced changed. That is the crux of it.
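The 'hidden text' observation can be checked roughly by looking at the text rendering mode operator (Tr) in the page content streams: mode 0 is ordinary filled text and mode 3 is invisible text (neither filled nor stroked), which OCR text layers commonly use. The Python sketch below is an assumption-laden heuristic, not a parser: it assumes FlateDecode compression, it counts how often each mode is set rather than how many characters use it, and the function name is hypothetical. Mode 0 is the default and is often never set explicitly, so a missing mode-0 count proves nothing; a nonzero count for mode 3 is the interesting signal.

```python
import re
import sys
import zlib
from collections import Counter

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# "<n> Tr" sets the text rendering mode (0-7); 3 means neither fill nor
# stroke, i.e. invisible text such as a typical OCR text layer.
TR_RE = re.compile(rb"(?<![\d.])([0-7])\s+Tr\b")


def render_mode_counts(data: bytes) -> Counter:
    """Count how often each text rendering mode is set across all streams."""
    counts = Counter()
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            body = zlib.decompress(body)
        except zlib.error:
            pass  # not Flate-compressed: scan the bytes as they are
        for mode in TR_RE.findall(body):
            counts[int(mode)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts = render_mode_counts(f.read())
    print("invisible (3 Tr):", counts[3], "all modes:", dict(counts))
```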
Thank you for your analysis. I will research in further detail how the PDF output may have been altered. The only change that I am aware of is the introduction of the Worldox document management system during this same period of time. It is possible that it substituted a different rendering module for this activity. Your response is most helpful. I appreciate your promptness and diligence. The files have been "slicked".
Just a followup.
One piece of information, among many, that a PDF carries is the 'producer' agent, which is recorded in the file itself.
The post 10Jan PDFs have PrimoPDF as 'producer'.
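The producer can also be checked outside Acrobat (in Acrobat itself, File > Properties > Description shows it as PDF Producer). As a rough sketch, assuming the /Producer entry is a plain literal string (no escaped parentheses, no hex string) and that any compressed object streams use FlateDecode, this Python snippet lists the distinct /Producer values found in the raw file and in inflated streams. It does not read the separate XMP metadata copy, and the names are hypothetical.

```python
import re
import sys
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Simplification: only literal strings without nested or escaped parentheses.
PRODUCER_RE = re.compile(rb"/Producer\s*\(([^()]*)\)")


def decode_pdf_string(raw: bytes) -> str:
    """Decode a PDF text string: UTF-16BE when it starts with a BOM.

    Otherwise approximate PDFDocEncoding with Latin-1, which agrees with it
    for plain ASCII producer names.
    """
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def find_producers(data: bytes) -> list:
    """Return distinct /Producer values from the raw file and inflated streams."""
    chunks = [data]
    for match in STREAM_RE.finditer(data):
        try:
            chunks.append(zlib.decompress(match.group(1)))
        except zlib.error:
            pass
    found = []
    for chunk in chunks:
        for raw in PRODUCER_RE.findall(chunk):
            value = decode_pdf_string(raw)
            if value not in found:
                found.append(value)
    return found


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", find_producers(f.read()) or "no /Producer found")
```

Run over a folder of the weekly files, this would show at a glance which generator produced each one.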
So, it'd appear that the Worldox product has some arrangement with Nitro to use PrimoPDF as the Worldox PDF producer agent.
Unfortunately, PDF output from the lower-priced PDF producers is typically somewhat 'off' with respect to ISO 32000-1 (the PDF standard) compliance.
The same is true of many 'high-end' report generator processes.
The consequence is that some or many features of a 'well-formed' PDF are missing, and this becomes evident when the PDF(s) are put 'under load', as it were.
A close look 'under the hood' at the Worldox/PrimoPDF configuration might identify if better quality PDF output is possible.
One more follow-up on this issue. I went to the machine that generates the weekly PDF files and performed the same text searches I had run at my own workstation. The searches failed in the same way on the PDF files created after January 10. We killed the memory-resident Worldox application and performed a new save of one of the files, and it was generated via the Adobe PDF printer. This file was searchable. I believe this confirms that something in the save routine under Worldox redirects to, or substitutes, a different PDF generator while the Worldox application is active. I am waiting for input from Worldox. They are generally excellent and on point.
I have a similar problem with Adobe Reader X. I'm unable to find/search for text in a PDF document from a third-party vendor, dated December 2010. I could search the document with my previous version of Adobe Reader (I think 9). Since I don't have control over the document I'm trying to search, I uninstalled X and installed 9.4.0. I can search with 9.4.0 just fine: the word "Replay" that I searched for has 68 instances in the document. I can't send the document, since that would violate our NDA with the company, but I've used PDFs from the company for 5 years without any search problems in earlier versions of Reader.
Thanks for your suggestion. I removed Adobe Reader X and installed 9.4.2, and it has the same problem with the files in question. They are generated in a way that prevents the text from being read. I still do not believe this is an Adobe problem; I attribute it to the PDF generator that Worldox uses. They claim that their process does not alter the normal sequence, but as my test above demonstrates, there is a direct correlation between their app being in memory and the appearance of this problem. I will continue to research and will post a solution as soon as I find one.
I am having a problem with text search in the latest version of Adobe Reader, 9.3.
I have a PDF file of about 100 pages that was generated through a PDF printer from within a word processing program.
When I search the text inside the PDF file using Adobe Reader 9.0, it works and finds the text searched for.
However, when the same file is searched using Adobe Reader 9.3 or later, it does not find the text, even though the text appears in more than five places in the document.