I read in one of the documents that PDF contents can be extracted using an accessibility plug-in through the library AcrobatAccess.lib. I searched for this library but could not find it. I have the following questions:
1. One of the posts says we need to contact the dev center for the library. Is it licensed if it is used for purposes other than screen readers?
2. Is it possible to access every piece of information in the PDF?
3. I need to convert PDF to EPUB. Is there a plug-in available for this conversion?
4. Where can I get the SDK, along with AcrobatAccess.lib, to develop an application for extracting PDF information?
I don’t know who told you specifically about the Accessibility plugin…
But yes, you can write your own plug-in for Acrobat (in C/C++) that extracts the contents of a PDF by iterating over all the objects. You will need a copy of Adobe Acrobat (NOT READER!) and the Acrobat SDK to do this.
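To give a rough idea of what "iterating over all the objects" looks like, here is a minimal sketch in plain C++. Note that Document, Page, PageObject and extractText are hypothetical stand-ins invented for illustration; they are NOT Acrobat SDK types or functions. In a real plug-in you would obtain the equivalent document, page and content objects through the SDK, but the nested walk would have roughly this shape.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for what a plug-in would see.
// These are NOT Acrobat SDK types; they only illustrate the traversal.
enum class ObjectKind { Text, Image, Path };

struct PageObject {
    ObjectKind kind;
    std::string text;  // only meaningful when kind == ObjectKind::Text
};

struct Page {
    std::vector<PageObject> objects;
};

struct Document {
    std::vector<Page> pages;
};

// Walk every page and every object on it, keeping only the text.
// Returns one string per page, with text runs joined by a single space.
std::vector<std::string> extractText(const Document& doc) {
    std::vector<std::string> perPage;
    for (const Page& page : doc.pages) {
        std::string pageText;
        for (const PageObject& obj : page.objects) {
            if (obj.kind != ObjectKind::Text) {
                continue;  // images and paths carry no text in this model
            }
            if (!pageText.empty()) {
                pageText += ' ';
            }
            pageText += obj.text;
        }
        perPage.push_back(pageText);
    }
    return perPage;
}
```

Calling extractText on a Document you have filled in gives you the text page by page, which you could then feed into whatever output format you are targeting.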
I am reading a chapter titled "Reading PDF files through the DOM Interface" in the document "Reading PDF through Accessibility Interfaces". I am pasting the relevant paragraph from this chapter below.
"Acrobat 6.0 and higher defines a document object model (DOM) that provides more complete access to the document structure than the MSAA interface. The Accessibility plug-in defines and exports five COM interfaces in AcrobatAccess.lib that expose Acrobat's document hierarchy."
1. Please comment on the above.
2. I have one more question: is this the same DOM you are proposing to use from C/C++ to extract the content?
Thanks for the response. Could you please elaborate on the method for PDF content extraction and share any related documents? The purpose of this extraction is to convert the PDF file to EPUB format.