I am indexing 34,000+ documents physically located on the hard drive.
Windows Server 2008 SP2
Thanks to advice in another thread I started, I am indexing the folders one at a time, followed by an update after each. Some of the PDFs are huge (130 MB), but the average is closer to 1 MB. Occasionally I hit a PDF that is corrupt (if I copy it to my desktop and try to open it, Acrobat Pro says it is corrupt).
I have tried using cfpdf to read the header info inside a cftry block, with the catch creating a log entry. That should work, but it hangs trying to read the document (I assume the same thing is happening with Solr). I get no log entry, and it continues to hang until the request times out.
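A minimal sketch of that kind of check, for anyone following along. The file path and the log file name are placeholders, not the actual values in use:

```cfm
<!--- Sketch only: pdfPath and the "pdfIndexErrors" log name are placeholders. --->
<cfset pdfPath = "D:\docs\example.pdf">
<cftry>
    <!--- Read the document's metadata; a corrupt file is expected to throw here. --->
    <cfpdf action="getInfo" source="#pdfPath#" name="pdfInfo">
    <cfcatch type="any">
        <!--- Record the bad file so it can be skipped during indexing. --->
        <cflog file="pdfIndexErrors" type="error"
               text="Could not read #pdfPath#: #cfcatch.message#">
    </cfcatch>
</cftry>
```

On the affected files the cfpdf call never returns, so the cfcatch block is never reached and nothing is logged.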
Can anyone think of a way to break out of a hung file and continue to index the remaining files?
If you are running the version of CF 9 that you have mentioned in your post previously (126.96.36.199028) then you are going to need to patch your server!
I think the issues you are having are related to a bug (described not so well here: Bug#3040314 - Bug 80390: I have some corrupt PDFs). It was fixed in build 188.8.131.523374.
Unless you have a really good reason not to, I would recommend updating your CF9 instance to at least the latest build of 9.0.2 anyway. If you are having trouble finding the downloads, Gavin Pickin maintains some here - http://www.gpickin.com/cfrepo/
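Until the patch is applied, one possible stopgap is to run the cfpdf check in its own thread and give up on it after a fixed wait. This is an untested sketch: the path, thread name, log name and 30-second limit are all illustrative, and there is no guarantee that terminating the thread will actually stop a read that is hung inside the PDF library.

```cfm
<!--- Sketch only: pdfPath, the thread name and the 30-second limit are assumptions. --->
<cfset pdfPath = "D:\docs\example.pdf">
<cfset threadName = "pdfCheck_" & createUUID()>

<!--- Run the PDF check in a separate thread so the page request is not blocked indefinitely. --->
<cfthread action="run" name="#threadName#" pdfPath="#pdfPath#">
    <cfpdf action="getInfo" source="#attributes.pdfPath#" name="pdfInfo">
</cfthread>

<!--- Wait up to 30 seconds (timeout is in milliseconds) for the check to finish. --->
<cfthread action="join" name="#threadName#" timeout="30000">

<cfset checkStatus = cfthread[threadName].status>
<cfif checkStatus eq "COMPLETED">
    <!--- The PDF was readable; it is safe to hand this file to the indexer here. --->
<cfelse>
    <!--- Still running means the read is hung; ask ColdFusion to stop the thread. --->
    <cfif checkStatus eq "RUNNING">
        <cfthread action="terminate" name="#threadName#">
    </cfif>
    <!--- Log the file and skip it so the remaining files still get indexed. --->
    <cflog file="pdfIndexErrors" type="error"
           text="Skipped #pdfPath# (thread status: #checkStatus#)">
</cfif>
```

A thread that ends with an error reports a status of TERMINATED, so the else branch covers both corrupt files that throw and files that hang.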
We are in the process of ordering CF11 Enterprise. I will see if I can apply the patch in the meantime (nothing moves quite as slow as the speed of government).
Will mark this as answered as soon as I know.