I would like to extract some of the information that is stored as attributes of a PDF file.
I have seen other threads on the same subject and one suggested the use of DDX which, as far as I understand, is some kind of markup language.
The only problem is that I do not know if that option is still available since the DDX homepage seems to have shut down a year and a half ago.
Hence, I am not sure whether their products are still available on the market and, if so, which one to use (there seemed to be at least seven different products from DDX).
Is there any other solution available that could provide the same result (i.e. an extraction of the data in the attribute fields of the PDF file)?
If you're referring to the XMP metadata (subject, author, creation date, etc.) then, provided the PDF file isn't fully encrypted*, it's normally stored in plaintext, typically near the end of the file. Just parse the file and look for the start of the XML structure block, which begins with the tag "<x:xmpmeta" and ends with "</x:xmpmeta>".
In a very large file, since the string is expected near the end, it's sensible to search backwards from the end rather than forwards from the start.
*If the file is encrypted, the metadata can still be left in plaintext, depending on the choice the user made in the encryption dialog.
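As a rough illustration of the approach above, here is a minimal Python sketch. It assumes the XMP packet is stored uncompressed and unencrypted, and the function names (extract_xmp, extract_xmp_from_file) are made up for this example rather than taken from any library. It searches backwards for the last "<x:xmpmeta" tag and returns everything up to the matching closing tag.

```python
from typing import Optional

XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"


def extract_xmp(pdf_bytes: bytes) -> Optional[str]:
    """Return the last XMP packet found in the raw PDF bytes, or None.

    Assumption: the metadata is stored as plaintext (not compressed
    and not encrypted), as described in the answer above.
    """
    # rfind searches backwards, so the last packet in the file wins.
    start = pdf_bytes.rfind(XMP_START)
    if start == -1:
        return None
    # Look for the closing tag that follows the opening tag.
    end = pdf_bytes.find(XMP_END, start)
    if end == -1:
        return None
    packet = pdf_bytes[start:end + len(XMP_END)]
    # XMP is usually UTF-8; replace undecodable bytes instead of failing.
    return packet.decode("utf-8", errors="replace")


def extract_xmp_from_file(path: str) -> Optional[str]:
    """Hypothetical convenience wrapper: read a file and extract its XMP."""
    with open(path, "rb") as f:
        return extract_xmp(f.read())
```

The returned string is ordinary XML, so it can be passed to any XML parser to pull out individual fields. For a very large file, reading it all into memory may be wasteful; Python's mmap objects also provide rfind, so the same backwards search should work on a memory-mapped file.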
Exactly the answer I needed.
Harald 'Hal' Ekedahl