2 Replies Latest reply on Sep 5, 2011 12:37 PM by ReinhardF

    A very interesting problem with metadata...

    Xcalibur102

      Hello, this issue has us contemplating leaving the IT environment to open a food joint. I hope this is the right place for this question.

      Background:
      We have hundreds of thousands of PDF files created over a 12-year span. None of them contain any metadata, and because they are historical documents, usable OCR text is nearly non-existent. The files are named using conventions set by historians, so each “collection” has its own naming structure, completely different from the others.

      The Task:
      To automatically (through a batch job, script, or third-party software) use the existing naming convention of each collection to populate the basic metadata fields of its files.

      The problem(s):
      I could create a script that dumps the directory structure into a CSV file, but from there I am not sure whether I can parse it into an XML file, or whether it is even possible to generate an XMP or FDF file. And assuming that can be done, how do you write a batch job that reads the file containing the directory structure and incorporates the data into each PDF file itself?
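
      A minimal sketch of that first step, assuming Python is available and that the PDFs sit under one root folder per collection. The function name `list_pdfs_to_csv` and the column names `directory` and `stem` are made up for illustration. It only builds the CSV index; getting the values into the PDFs would still need a separate tool.

```python
import csv
from pathlib import Path


def list_pdfs_to_csv(root, csv_path):
    """Write one CSV row per PDF found under root and return the row count.

    Each row holds the directory (relative to root) and the file name
    without its extension. Note: the "*.pdf" pattern is case-sensitive
    on case-sensitive file systems, so "FILE.PDF" would be skipped there.
    """
    root = Path(root)
    rows = []
    for pdf in sorted(root.rglob("*.pdf")):
        relative_dir = pdf.parent.relative_to(root).as_posix()
        rows.append((relative_dir, pdf.stem))

    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["directory", "stem"])  # hypothetical column names
        writer.writerows(rows)
    return len(rows)
```

      Calling `list_pdfs_to_csv("/path/to/archive", "index.csv")` (both paths are placeholders) would give one line per PDF that a later step can read back.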

      Examples
      Collection 1: YYYYMMDD-Pub_Type-Pub_Number
      Collection 2: Pub_Number-Pub_Type-Author-Desc-YYYYMMDD
      From both examples the data can be manually entered into the metadata fields, but since each file is different, it will take forever and a day to accomplish that.
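
      To illustrate how the two naming conventions above could be split into fields, here is a rough Python sketch using one regular expression per collection. The keys `collection1` and `collection2`, the field names, and the function `parse_stem` are assumptions made for this example. It also assumes that only the description field may itself contain hyphens, and that the file extension has already been removed.

```python
import re

# One pattern per collection; named groups become the metadata field names.
# These patterns are illustrative guesses based on the two examples above.
PATTERNS = {
    "collection1": re.compile(
        r"^(?P<date>\d{8})-(?P<pub_type>[^-]+)-(?P<pub_number>[^-]+)$"
    ),
    "collection2": re.compile(
        r"^(?P<pub_number>[^-]+)-(?P<pub_type>[^-]+)-(?P<author>[^-]+)"
        r"-(?P<desc>.+)-(?P<date>\d{8})$"
    ),
}


def parse_stem(collection, stem):
    """Return a dict of fields parsed from a file name stem, or None.

    None means the name did not follow the collection's convention,
    so the file can be logged and handled by hand.
    """
    match = PATTERNS[collection].match(stem.strip())
    if match is None:
        return None
    fields = match.groupdict()
    raw = fields.pop("date")
    # Reformat YYYYMMDD as YYYY-MM-DD; the digits are not validated as a real date.
    fields["date"] = f"{raw[:4]}-{raw[4:6]}-{raw[6:]}"
    return fields
```

      Adapting this to another collection would mean adding one more entry to `PATTERNS`; the rest of the script stays the same.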

      I am not looking for a cookie-cutter solution. I know that the parameters will change from collection to collection, but when you consider that a collection can have over 10k PDF files, a script is the only way to go, and it is definitely a lot easier to modify a script to fit each collection.

       

      We also contemplated mass murder/suicide but figured it was better to ask for ideas/help… :-)