Hi all! I have a problem and need expertise and confirmation from other users on how to handle it.
BACKGROUND: I have hundreds of legal documents that were filed electronically through a vendor system. I am getting PDF copies of these documents to post to the website. I started running them through our normal accessibility procedures, and Adobe's accessibility checker is showing that a large percentage were scanned to PDF (no OCR). This requires my web developers to run OCR, then fix all the suspects, add tags to the documents, check reading order, and then recheck and fix hundreds of other issues. These legal documents run anywhere from 10 pages up to hundreds of pages.
THE PROBLEM: Having my staff make these PDFs compliant is turning into a time nightmare. It is taking an hour to fix one average-size scanned PDF. In my mind, PDFs from scanned documents are the absolute worst as far as accessibility goes. We've read Ohio State University's online tutorial about creating accessible PDFs from scanned documents.
So I think my staff is "stuck" with spending hours upon hours making these scanned documents accessible in order to put them online. Am I correct in my thinking?
How are others out there handling the amount of time it takes to make some PDF documents accessible?
Who in your organization is charged with making sure PDF documents are accessible?
Do you have staff that do nothing but that?
Any insight on your best practices would be appreciated.
Sorry, but your suspicions are correct: you can automate the OCR step, but adding structure to the end result has to be done by a human.
Acrobat does have some ability to guess at the purpose of page content based on position (e.g. when exporting a scanned PDF to PowerPoint it can take a decent stab at which elements are part of the slide master and which are regular content) but it cannot possibly understand things like text flow and cross-references. For documents of any importance I wouldn't trust automated tools to be accurate even if they did exist, and checking the tags takes just as long as adding them.
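For what it's worth, here is a rough sketch of what automating just the OCR step could look like. It is not the Acrobat workflow discussed above: it assumes the open-source `ocrmypdf` command-line tool is installed, and the function names (`build_ocr_command`, `ocr_folder`) and folder layout are made up for illustration. It only adds a text layer; tagging, reading order and fixing OCR suspects still need a person.

```python
import subprocess
from pathlib import Path


def build_ocr_command(src, dst):
    """Build the command line for one file.

    Assumption: the `ocrmypdf` CLI is on the PATH (it is not part of Acrobat).
    --skip-text leaves pages that already have a text layer untouched.
    """
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(src_dir, dst_dir, runner=subprocess.run):
    """Run OCR on every PDF in src_dir, writing results to dst_dir.

    Returns the names of files whose OCR run reported a non-zero exit code,
    so they can be handled manually. `runner` is injectable for dry runs.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(src_dir.iterdir()):
        # Match .pdf and .PDF alike; ignore everything else.
        if not src.is_file() or src.suffix.lower() != ".pdf":
            continue
        dst = dst_dir / src.name
        if dst.exists():
            # Already processed in an earlier run; skip it.
            continue
        result = runner(build_ocr_command(src, dst))
        if result.returncode != 0:
            failed.append(src.name)
    return failed
```

Something like `ocr_folder("incoming", "ocr_done")` (hypothetical folder names) could then run overnight, leaving staff time for the manual tagging and checking.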
Thanks for the reply, Dave. We are taking the "bite the bullet" approach....