Apologies for the incomplete information. I am using Adobe Acrobat 9.0 Pro.
I am exporting to XML via "File->Export->XML" or "File->SaveAs->xml".
Our PDFs are generated with a free Java library: the source is a Word document with a header and footer, and the library converts it to PDF.
When I export that PDF to XML from Adobe Acrobat Pro, the header and footer values are missing from the XML; everything else looks fine.
The PDFs are term sheets and they are confidential, so I cannot share them.
So is there no way to get data out of an artifact (assuming the header and footer are being identified as artifacts at all)?
My purpose is to extract that data from the PDF and validate it against some expected data.
I tried a couple of online tools that convert PDF to Word, but I don't find the converted Word documents worth comparing.
The XML export seems reliable.
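For the validation step, here is a minimal sketch in Python. It assumes the exported XML is available as a string and that a plain substring check against the document's text is good enough. The tag names, the sample values, and the `missing_values` helper are all hypothetical; none of them come from Acrobat.

```python
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse runs of whitespace so line breaks in the export do not matter."""
    return " ".join(text.split())


def missing_values(xml_text, expected_values):
    """Return the expected strings that do not appear in the XML's text content."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text of every element in document order.
    document_text = normalize(" ".join(root.itertext()))
    return [v for v in expected_values if normalize(v) not in document_text]


# Hypothetical sample standing in for an exported file; the tag names are made up.
SAMPLE_XML = """
<Document>
  <Sect>
    <P>Issuer: Example Bank</P>
    <P>Notional:   1,000,000 USD</P>
  </Sect>
</Document>
"""

# Hypothetical expected values; the last one mimics a footer that was not exported.
EXPECTED = ["Issuer: Example Bank", "Notional: 1,000,000 USD", "Confidential - Page 1"]

if __name__ == "__main__":
    print(missing_values(SAMPLE_XML, EXPECTED))
```

A header or footer that was dropped during export would show up in the returned list, so the gap is at least detectable even if the export itself cannot be fixed.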
I am trying to export a PDF to XML using Adobe Acrobat Professional.
The data exports nicely, but the headers/footers of the PDF are not exported.
Is there a way to extract the headers/footers of the PDF document?
Another option is EDI Link Connect. It can export data from a PDF (headers and footers included). The XML is structured properly, so you can import it directly into the program of your choice. It's intended for business documents such as orders, invoices, shipping documents, and reports. I'm not sure if that's the type of PDF you're working with, but if it is, it might be worth a look. Here is a link with more info:
Converting PDFs to XML with EDI Link: http://ecdynamics.com/pdf-conversion.php