ascript_guy

Spotlight Meta-Data Importer

May 20, 2009 6:55 AM

For the Macintosh platform, I'd like Adobe to supply or make available a Spotlight Meta-Data Importer that would expose the XMP Meta-Data to Spotlight searches.

 
Replies
  • Currently Being Moderated
    Oct 26, 2010 10:29 AM   in reply to ascript_guy

    Why, after FIVE YEARS (the time since people started asking for one), is there no Spotlight plugin for InDesign files??

     
  • Currently Being Moderated
    May 23, 2011 3:23 PM   in reply to marcopperman

    Wholeheartedly agree. A few people have recommended something called PageZephyr, but it costs a hundred to several hundred dollars. I would really like to see Adobe get rolling and add this arguably simple thing to InDesign; the ability to search for text within .indd files without *having* to keep PDFs of each one around would save me a lot of time (and more than a decent amount of drive space).

     
  • John Hawkinson
    5,572 posts
    Jun 25, 2009
    May 23, 2011 8:11 PM   in reply to coreworks

    InDesign files have XMP metadata already. Is this really not just a matter of configuration?

     
  • Currently Being Moderated
    May 23, 2011 8:38 PM   in reply to John Hawkinson

    If the metadata is there, how does one get Spotlight to search within .indd files? If I remember the discussions I've read on the matter over the past half decade, it's up to Adobe to write one, and they haven't as of this writing (insofar as I know, unless it's part of CS5 -- I've not upgraded to CS5 yet).

     
  • John Hawkinson
    May 24, 2011 7:49 PM   in reply to coreworks

    OK, so, I looked into this in a bit more detail.

     

    It's up to someone to write, but it need not be Adobe.

     

    Spotlight ships with ~20 importers, each of which declares a set of Uniform Type Identifiers (UTIs) upon which it operates. Spotlight calls each importer plugin when a new file of one of its UTI types appears, and the plugin exports the metadata to Spotlight's metadata database.
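    To make the "declares a set of UTIs" part concrete: as far as I know, an importer bundle lists its types in Info.plist under CFBundleDocumentTypes / LSItemContentTypes. Here is a minimal Python sketch that reads such a list. The plist below is a made-up, trimmed-down example (the UTI com.example.indesign-document is hypothetical), not a copy of any shipping importer.

```python
import plistlib

# Hypothetical, trimmed-down Info.plist for an importer bundle.
# The UTI "com.example.indesign-document" is invented for illustration.
SAMPLE_INFO_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>CFBundleDocumentTypes</key>
    <array>
        <dict>
            <key>CFBundleTypeRole</key>
            <string>MDImporter</string>
            <key>LSItemContentTypes</key>
            <array>
                <string>com.example.indesign-document</string>
            </array>
        </dict>
    </array>
</dict>
</plist>
"""


def declared_utis(plist_bytes):
    """Collect every UTI listed under CFBundleDocumentTypes/LSItemContentTypes."""
    info = plistlib.loads(plist_bytes)
    utis = []
    for doc_type in info.get("CFBundleDocumentTypes", []):
        utis.extend(doc_type.get("LSItemContentTypes", []))
    return utis


print(declared_utis(SAMPLE_INFO_PLIST))
```

    Pointing the same function at the bytes of a real importer's Contents/Info.plist should show which file types that importer claims.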

     

    It would appear that none of the standard Spotlight plugins simply reads the provided file and looks for XMP data, though some of them (like Image) probably use some dispatch mechanism and then do so.

     

    There are examples where people solve this problem for movie files by editing the Info.plist associated with the QuickTime mdimporter to add more UTIs. This does not work to add the INDD UTI to Image.mdimporter or QuickTime.mdimporter, but perhaps it's close. It ought not be difficult to merge the sample XMP reading library from the XMP toolkit with the sample Spotlight plugin to make this work. But it is a bit of developer effort.

     

    Also, more annoyingly, InDesign does not define a standard UTI for INDD documents, so unlike a Photoshop file that is com.adobe.photoshop-image, an InDesign file is, on my system, dyn.ah62d4rv4ge80w5xequ. This is circumventable, but annoying.

     

    Blah.

     
  • John Hawkinson
    May 25, 2011 4:58 AM   in reply to John Hawkinson

    OK, I was wrong.

     

    Or if not technically wrong, fairly misleading.

    I spent a while mucking around and have most of a Spotlight importer that reads XMP metadata from arbitrary files using Adobe's XMP toolkit, can write to the Spotlight database, gets invoked at the right times, etc. [It doesn't actually do the writing, but it's not much effort to make it do so.]

     

    This addresses ascript_guy's desire from 2009:

     

    For the Macintosh platform, I'd like Adobe to supply or make available a Spotlight Meta-Data Importer that would expose the XMP Meta-Data to Spotlight searches.

    Unfortunately, this is not very useful, because the XMP metadata isn't very useful. As coreworks points out, the real utility comes from being able to search the text content of InDesign documents (though I'm not really sure how Spotlight deals with book-length stuff...). But the XMP metadata doesn't have that. It doesn't even have a proxy, summary, or short excerpt of it.

     

    Here's a sampling of the metadata from Blue Square.indd, a sample InDesign layout that ships with the XMP SDK:

    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about=""
                xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
                xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#">
             <xmpMM:InstanceID>xmp.iid:0E49B4AD072068118C1493EC1DFA0B63</xmpMM:InstanceID>
             <xmpMM:DocumentID>adobe:docid:indd:af62ddc6-213f-11da-ac18-fe020baf4f13</xmpMM:DocumentID>
             <xmpMM:OriginalDocumentID>adobe:docid:indd:af62ddc6-213f-11da-ac18-fe020baf4f13</xmpMM:OriginalDocumentID>
             <xmpMM:History>
                <rdf:Seq>
                   <rdf:li rdf:parseType="Resource">
                      <stEvt:action>saved</stEvt:action>
                      <stEvt:instanceID>xmp.iid:0D49B4AD072068118C1493EC1DFA0B63</stEvt:instanceID>
                      <stEvt:when>2011-05-25T06:51:10-04:00</stEvt:when>
                      <stEvt:softwareAgent>Adobe InDesign 7.0</stEvt:softwareAgent>
                      <stEvt:changed>/;/metadata</stEvt:changed>
                   </rdf:li>
                   <rdf:li rdf:parseType="Resource">
                      <stEvt:action>saved</stEvt:action>
                      <stEvt:instanceID>xmp.iid:0E49B4AD072068118C1493EC1DFA0B63</stEvt:instanceID>
                      <stEvt:when>2011-05-25T06:51:10-04:00</stEvt:when>
                      <stEvt:softwareAgent>Adobe InDesign 7.0</stEvt:softwareAgent>
                      <stEvt:changed>/metadata</stEvt:changed>
                   </rdf:li>
                </rdf:Seq>
             </xmpMM:History>
          </rdf:Description>
          <rdf:Description rdf:about=""
                xmlns:xmp="http://ns.adobe.com/xap/1.0/"
                xmlns:xmpTPg="http://ns.adobe.com/xap/1.0/t/pg/"
                xmlns:xmpGImg="http://ns.adobe.com/xap/1.0/g/img/">
             <xmp:CreateDate>2005-09-07T14:40:37Z</xmp:CreateDate>
             <xmp:ModifyDate>2011-05-25T06:51:10-04:00</xmp:ModifyDate>
             <xmp:MetadataDate>2011-05-25T06:51:10-04:00</xmp:MetadataDate>
             <xmp:CreatorTool>Adobe InDesign 7.0</xmp:CreatorTool>
             <xmp:PageInfo>
                <rdf:Seq>
                   <rdf:li rdf:parseType="Resource">
                      <xmpTPg:PageNumber>1</xmpTPg:PageNumber>
                      <xmpGImg:format>JPEG</xmpGImg:format>
                      <xmpGImg:width>256</xmpGImg:width>
                      <xmpGImg:height>256</xmpGImg:height>
                      <xmpGImg:image>/9j/4AAQSkZJRgABAgEASABIAAD/7QAsUGhvdG9zaG9wIDMuMAA4QklNA+0AAA
    ...
    </xmpGImg:image>
                   </rdf:li>
                </rdf:Seq>
             </xmp:PageInfo>
          </rdf:Description>
          <rdf:Description rdf:about=""
                xmlns:dc="http://purl.org/dc/elements/1.1/">
             <dc:format>application/x-indesign</dc:format>
             <dc:title>
                <rdf:Alt>
                   <rdf:li xml:lang="x-default">Blue Square Test File - .indd</rdf:li>
                </rdf:Alt>
             </dc:title>
             <dc:description>
                <rdf:Alt>
                   <rdf:li xml:lang="x-default">XMPFiles BlueSquare test file, created in InDesign CS2, saved as .indd and .pdf.</rdf:li>
                </rdf:Alt>
             </dc:description>
             <dc:subject>
                <rdf:Bag>
                   <rdf:li>XMP</rdf:li>
                   <rdf:li>Blue Square</rdf:li>
                   <rdf:li>test file</rdf:li>
                   <rdf:li>InDesign</rdf:li>
                   <rdf:li>.indd</rdf:li>
                </rdf:Bag>
             </dc:subject>
          </rdf:Description>
          <rdf:Description rdf:about=""
                xmlns:xmpTPg="http://ns.adobe.com/xap/1.0/t/pg/"
                xmlns:xmpG="http://ns.adobe.com/xap/1.0/g/"
                xmlns:stFnt="http://ns.adobe.com/xap/1.0/sType/Font#">
             <xmpTPg:Colorants>
                <rdf:Seq>
                   <rdf:li rdf:parseType="Resource">
                      <xmpG:swatchName>Black</xmpG:swatchName>
                      <xmpG:mode>CMYK</xmpG:mode>
                      <xmpG:type>Process</xmpG:type>
                      <xmpG:cyan>0</xmpG:cyan>
                      <xmpG:magenta>0</xmpG:magenta>
                      <xmpG:yellow>0</xmpG:yellow>
                      <xmpG:black>100</xmpG:black>
                   </rdf:li>
                </rdf:Seq>
             </xmpTPg:Colorants>
             <xmpTPg:Fonts>
                <rdf:Bag>
                   <rdf:li rdf:parseType="Resource">
                      <stFnt:fontName>Times-Roman</stFnt:fontName>
                      <stFnt:fontFamily>Times</stFnt:fontFamily>
                      <stFnt:fontFace>Regular</stFnt:fontFace>
                      <stFnt:fontType>TrueType</stFnt:fontType>
                      <stFnt:versionString>Times-Roman6.0d6e5</stFnt:versionString>
                      <stFnt:composite>false</stFnt:composite>
                      <stFnt:fontFileName>Times.dfont</stFnt:fontFileName>
                   </rdf:li>
                </rdf:Bag>
             </xmpTPg:Fonts>
          </rdf:Description>
       </rdf:RDF>
    </x:xmpmeta>
    

     

    So, if you happen to fill in the Author/Title/Keyword metadata in ID, then maybe that is useful. Or if the thumbnail image of the file is useful.

    Any of it could get stuffed into the spotlight data, but to what end?
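    For what it's worth, pulling the title and keywords out of a packet like the one above takes only a few lines. This is a rough Python sketch using the standard library rather than Adobe's XMP toolkit. The mapping to the Spotlight attribute names kMDItemTitle and kMDItemKeywords is my assumption about where such values would go, and the code only builds a dictionary; it does not write anything to Spotlight.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the XPath-style queries below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Cut-down packet carrying the same dc: fields as the Blue Square sample.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title>
    <rdf:Alt><rdf:li xml:lang="x-default">Blue Square Test File - .indd</rdf:li></rdf:Alt>
   </dc:title>
   <dc:subject>
    <rdf:Bag><rdf:li>XMP</rdf:li><rdf:li>Blue Square</rdf:li></rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def xmp_summary(xmp_text):
    """Extract dc:title and dc:subject from an XMP packet given as a string."""
    root = ET.fromstring(xmp_text)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    keywords = [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]
    return {
        # Assumed mapping onto Spotlight attribute names.
        "kMDItemTitle": title.text if title is not None else None,
        "kMDItemKeywords": keywords,
    }


print(xmp_summary(SAMPLE_XMP))
```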

     

    Doing any better requires being able to parse the INDD file format and getting text out to give to spotlight. That is not an easy task.

    Markzware specializes in tools that understand the insides of InDesign (and Quark) files, so it's not surprising that they have a product for this (PageZephyr). And really, $100 doesn't seem too much to pay.

     

    So, blah.

     

    I suppose one could take the thumbnail and make a QuickLook generator out of it.

    If someone wants to pull XMP metadata out of some other kind of file that's more useful and would like me to finish the spotlight plugin, you should let me know. Otherwise I don't see bothering.

     

    Oh, I suppose another option is to use the SDK or Scripting API to have InDesign pull out the text when it is saving a document. This is counter to the Apple-mandated Spotlight philosophy:

     

    A Spotlight importer must run entirely without interaction. You should not attempt to present any user interface or expect that the window server is running.
    You should not expect your application to be running when your metadata importer is called. Importers can be called at any time to extract metadata from a file. Your metadata importer should be able to extract the information without any assistance from the application that created the file.

     

    I suppose we could break the rules, but even then it's annoying to do and would probably have bad performance.

     
  • Currently Being Moderated
    May 26, 2011 1:26 AM   in reply to John Hawkinson

    Hello,

     

    This is something I would like to look at.

     

    How do you get your meta data into your documents?

     

    Thanks.

     
  • Currently Being Moderated
    May 26, 2011 1:30 AM   in reply to Pickory

    Oops, sorry John, I didn't see your last post.

     

    I have been looking at the blue square stuff too.

     
  • John Hawkinson
    May 26, 2011 3:18 AM   in reply to Pickory

    Pickory, I can't tell if you have a question, or what it might be.

    Can you be a bit more clear?

     
  • Currently Being Moderated
    May 26, 2011 3:28 AM   in reply to John Hawkinson

    Hello John,

     

    My question was how you enter your metadata, apart from the very basic stuff.

     

    You have already answered by pointing out the spotlight plugin would not be able to index the content of the document.

     
  • John Hawkinson
    May 26, 2011 3:44 AM   in reply to Pickory

    I wonder if the terminology is confusing.

    The content of the document is not properly "metadata."

    In InDesign, you can edit and view most of the metadata with File > File Info.

    You can view the XMP metadata (different from InDesign metadata) that Spotlight has with "mdls filename" in the Terminal. Or just type "mdls " (with a space at the end) and drag an icon into the Terminal from the Finder, and hit return.

     
  • Currently Being Moderated
    May 26, 2011 4:00 AM   in reply to John Hawkinson

    I may be misrepresenting my ultimate goal. Most simply put, what I want is for Spotlight (Mac OS) to be able to search text within an InDesign (.indd) file. Currently Spotlight can only find a file if the searched text is in its title; e.g., if I'm looking for a résumé for Rob Zombie and search for "Robert", I will not get a Spotlight result unless the name "Robert" is in the title of the file. However, if I've made a PDF of that InDesign document, Spotlight will find the text within the .pdf file.

     

    So what I am pining for, according to what I've learned about Spotlight and its capabilities, is for Adobe to write a plugin that lets Spotlight search, I guess, text strings within a .indd file. Currently, the only way I've heard of to search within InDesign files is a $100 to $200 program called "PageZephyr".

     
  • John Hawkinson
    May 26, 2011 4:21 AM   in reply to coreworks

    coreworks, you were clear enough. It seemed like some others in this thread might have use cases that didn't require the full text.

     

    Oh, I suppose one option that might meet your needs but is really terrible would be to look through the InDesign document for anything that looks like text. This would give you a lot of false positives and some really ugly stuff in your Spotlight database.

     

    For an example, go to Terminal.app and type in

     

    strings -10 filename.indd

     

    Be prepared for many, many screenfuls of output (e.g. a test I just ran produced 500 screenfuls). It certainly wouldn't be hard to tell Spotlight that this was the full text of your document. It'd probably enable searching for Rob Zombie, but might have other negative consequences.
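    If you'd rather see the idea as code than as a shell command, here is a rough Python sketch of what strings -10 does: collect every run of at least ten printable ASCII bytes. It only approximates the real tool (which, for one thing, also counts tabs as printable), and the sample bytes are invented rather than taken from a real .indd file.

```python
import re


def ascii_strings(data, min_len=10):
    """Return runs of at least min_len printable ASCII bytes, roughly like `strings -N`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Invented binary blob: real text, a short fragment, and a font version string,
# separated by non-printable bytes.
SAMPLE_BYTES = b"\x00\x01Rob Zombie resume\x00abc\x02Times-Roman6.0d6e5\xff"

for found in ascii_strings(SAMPLE_BYTES):
    print(found)
```

    Note that the font version string comes out alongside the real text, which is exactly the false-positive problem described above.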

     
  • John Hawkinson
    May 26, 2011 10:28 AM   in reply to John Hawkinson

    coreworks, you may be in luck!

     

    Doing any better requires being able to parse the INDD file format and getting text out to give to spotlight. That is not an easy task.

    It turns out it's actually nowhere near as bad as I thought.

    Maybe I'll have something to test tonight or tomorrow.

     
  • John Hawkinson
    May 28, 2011 12:46 AM   in reply to John Hawkinson

    OK.

     

    ALPHA-QUALITY SOFTWARE ALERT

     

    Download http://web.mit.edu/jhawk/tmp/InDesignImporter-0.1alpha.dmg

    and install InDesignImporter.mdimporter into ~/Library/Spotlight. (Or the /Library version if you want to live dangerously.)

     

    I'm not sure if spotlight will automatically reindex old files.

     

    This is alpha-quality software. It's prototyped as a slow and rather painful Perl script that walks through the INDD file looking for things that look like strings (generally they start with @-signs) and outputs them to Spotlight. It will produce some false positives, perhaps things like the names of styles and fonts that are encoded in your InDesign document. It will probably miss some strings. It might even crash the Spotlight importer process (mdimport).

     

    You can force a file to be indexed by typing

     

    mdimport /Users/myname/path/to/file.indd

     

    and if you add -d 1

     

    mdimport -d 1 /Users/myname/path/to/file.indd

     

    it'll tell you which spotlight importer is being used, and -d 2 will show you the metadata it finds for the file.

     

    I didn't actually bother pulling out the XMP metadata, though that's "easy." I spent most of the time bashing on the full text part.

     

    Oh, it also screws up Unicode characters, in part because it excludes them as part of its heuristic for what is a string and what is not, which is kind of messed up. Anyhow, it outputs them as literal code points, e.g. "U+2019" for a right apostrophe.
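    As an aside, literal code points like that can be turned back into real characters after the fact. The Python sketch below is not part of the importer; it is a hypothetical post-processing step that assumes every escape is written as "U+" followed by exactly four hex digits, which is all the example above shows.

```python
import re


def decode_codepoints(text):
    """Replace literal 'U+XXXX' escapes (exactly four hex digits) with the character."""
    return re.sub(
        r"U\+([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


print(decode_codepoints("RobU+2019s resume"))
```

    The four-digit assumption keeps a following letter such as "e" from being swallowed as a hex digit, at the cost of not handling characters outside the Basic Multilingual Plane.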

     

    Anyhow, let me know how it works for you. Oh, yeah, it's slow. Because it wasn't really written efficiently...

     

    Oh, and it only looks for CS5 [and CS5.5] files (type IDd7). That'd be easy to change, by editing the Info.plist file.

     

    Let me know how it works. Not really sure if there's much point in making it better...easy to do though.

     
  • John Hawkinson
    Jun 1, 2011 5:59 PM   in reply to John Hawkinson

    Helooooo?

     
  • Currently Being Moderated
    Jun 2, 2011 3:58 AM   in reply to John Hawkinson

    Hi!

     

    Harbs

    (Who's too swamped with work to look at this fascinating piece of software...)

     
  • Currently Being Moderated
    Jun 2, 2011 6:37 AM   in reply to John Hawkinson

    Ditto what Harbs said. Thanks for doing this, but I am in the middle of a massive workflow (six separate engineering clients; working on a handful of standard forms proposals for each) and can't use something experimental at the moment. Bookmarked, and I will jump on it as soon as I get over this hump.

     
  • Currently Being Moderated
    Jun 22, 2011 10:17 AM   in reply to John Hawkinson

    Tried out your Spotlight plugin.

     

    Using: mdimport -d 1 /path/to/file.indd

     

    I get: Segmentation fault

     

    Using: mdimport -d 2 /path/to/file.indd

     

    I get: (Info) Import: Import '/Volumes/path/to/file.indd' type 'edu.mit.jhawk.adobe.indesign-document' using '/Library/Spotlight/InDesignImporter.mdimporter'

    Segmentation fault

    File doesn't seem to get indexed, as Spotlight doesn't find text within the file.
    Using Mac OS X 10.6.7 and InDesign CS5 (7.0.4)
    Rodney
     
  • John Hawkinson
    Jun 22, 2011 2:10 PM   in reply to RKSinNC2

    Oh, finally, a tester! There seems to be some screwup where I can't seem to build a version that works on both 10.5 and 10.6, possibly relating to 64-bit, but possibly not. Which is your system? I'll build a version for it.

     

    Alpha quality...

     
  • John Hawkinson
    Jun 22, 2011 8:05 PM   in reply to John Hawkinson

    OK, I think it was just 10.6, not 64/32-bit. Try the version now; Get Info should call it 0.1c.

     

    Thanks for testing.

     
  • Currently Being Moderated
    Jun 23, 2011 8:10 AM   in reply to John Hawkinson

    I tried the new version and it seems to work. If an InDesign file has extended text (sentences, paragraphs, etc.), it is usually indexed. With some files that only have a word or two per paragraph (I lay out business forms, so I have a lot of these), it seems the only things indexed are font and color swatch names.

     

     

    Rodney

     
  • John Hawkinson
    Jun 23, 2011 5:11 PM   in reply to RKSinNC2

    OK...is that a big deal? I assume mdimport -d 2 shows that it is not finding the data you're referring to? If you want to send me the file I can look at fixing that...

     

    I would certainly like to get rid of font and swatch names, but I don't know how to distinguish them from other strings in the file. Alas...

     
  • Currently Being Moderated
    Jun 24, 2011 2:05 AM   in reply to John Hawkinson

    John Hawkinson wrote:

    [..]

    I would certainly like to get rid of font and swatch names, but I don't know how to distinguish them from other strings in the file. Alas...

     

     

    About the only way is to parse the entire ID file header & Internal Object list, and filter out the plain text objects (which are scattered all over the file). The way to distinguish "text objects", by the way, from other objects such as color names, font lists, and spell check exceptions, is by comparing their object IDs to the ones defined in the SDK. And -- just to add to the phun -- there may be different definitions for different versions of ID.

     

    At that level, it's most certainly not a trivial thing to write, I can tell you that.

     
  • Currently Being Moderated
    Jun 29, 2011 11:35 AM   in reply to John Hawkinson

    FYI, when I use the "mdimport -d 2" command in Terminal, it may only show some of the text that is being indexed. I've discovered that words not reported by using that command are still being indexed.

     

    So your mdimporter is doing a pretty good job.

     

    Rodney

     
  • John Hawkinson
    Jun 29, 2011 12:27 PM   in reply to RKSinNC2

    That is...not what I would expect. But my Spotlight importer can only hand the data off to Spotlight. Whether Spotlight tells you all of the data it uses is a different question that I don't have control over, as far as I know.

     

    I got some tips about parsing the file format, so I may be able to improve things. But not this week, I'm afraid.

     

    But if you want to see what strings it is finding, you can run the embedded Perl script directly.

     

    ~/Library/Spotlight/InDesignImporter.mdimporter/Contents/Resources/idstrings.pl myfile.indd /dev/stdout

     

    for instance.

     