-
1. Re: How to determine (in bulk) whether a document has OCR text and what reader versions are supported?
Test Screen Name Jul 23, 2014 11:59 AM (in response to mfhughesjr)
Can you be specific about what you mean by "supporting a version of Reader" and how you measure it?
-
2. Re: How to determine (in bulk) whether a document has OCR text and what reader versions are supported?
mfhughesjr Jul 23, 2014 12:24 PM (in response to Test Screen Name)
What I meant by supporting a version of Reader is that I don't need the files to be fully backwards compatible all the way back to, say, the first few versions of Adobe Reader. (There are likely limits on that much backwards compatibility anyway.) One of the scanners that was used was apparently set for full backwards compatibility, to the extent possible, for every PDF it generated. Some of those PDFs are huge, commonly 300-400 MB. If I open them in Acrobat 9.0, limit the backwards compatibility to Adobe Reader 8.0 and later, and then resave the file, the size is often significantly reduced.
As to how it is measured, there is something in the PDF itself that indicates the minimum Adobe Reader version for compatibility purposes. When you set compatibility in Acrobat during the save process I mentioned, you pick the oldest version you want to support, so if you select 8.0, the file is compatible with 8.0, 9.0, and so on.
-
3. Re: How to determine (in bulk) whether a document has OCR text and what reader versions are supported?
Test Screen Name Jul 23, 2014 1:35 PM (in response to mfhughesjr)
Ok, we're talking about the PDF version. No figures are available, but most of the world has moved on, so there is probably little reason to use anything older than PDF 1.7 now. It's moved on in other ways too. People may be viewing PDFs in:
- Apple Preview
- Windows Reader
- Chrome
- Firefox
- iPad
- Android
because these all come with their computer, tablet, or browser. None of these use Adobe Reader; each has its own PDF technology. Happily, most are likely to be PDF 1.7 compatible.
Anyway, perhaps the quickest way for a programmer to get PDF versions is to read the first 8 bytes of the PDF file, which will be
%PDF-1.x
and will give the information you need. (This is an oversimplification, but it is probably good enough for practical purposes.)
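
For anyone wanting to try the header check described above in bulk, here is a minimal sketch in Python. The function names `pdf_header_version` and `scan_folder` are made up for illustration, and the sketch assumes the files end in a lowercase `.pdf` extension and that the header sits at the very start of the file. It only reads the first 8 bytes, so it shares the oversimplification mentioned above: it reports what the header says and nothing more.

```python
import re
from pathlib import Path

# Matches a header such as b"%PDF-1.4" at the very start of the data.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(head: bytes):
    """Return the version from a '%PDF-x.y' header (e.g. '1.4'), or None."""
    match = HEADER_RE.match(head)
    if match is None:
        return None
    return match.group(1).decode("ascii")


def scan_folder(folder):
    """Map each *.pdf path under `folder` to its header version (or None).

    Hypothetical helper: only the first 8 bytes of each file are read,
    so even very large scans are checked quickly.
    """
    results = {}
    for path in sorted(Path(folder).rglob("*.pdf")):
        with open(path, "rb") as f:
            results[str(path)] = pdf_header_version(f.read(8))
    return results
```

Calling `scan_folder("scans")` on a folder of scanned files (the folder name is only an example) would return a dictionary that can be sorted or filtered to find the files claiming an older PDF version. A result of `None` means the first 8 bytes did not look like a PDF header, which is worth checking by hand.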
-
4. Re: How to determine (in bulk) whether a document has OCR text and what reader versions are supported?
mfhughesjr Jul 23, 2014 1:40 PM (in response to Test Screen Name)
Thanks. I'm going to talk to one of my programmers now!


