29 Replies Latest reply on Feb 11, 2016 2:23 AM by filippomazzocchi

    Spotlight Meta-Data Importer


      For the Macintosh platform, I'd like Adobe to supply or make available a Spotlight Meta-Data Importer that would expose the XMP Meta-Data to Spotlight searches.

        • 1. Re: Spotlight Meta-Data Importer
          marcopperman Level 1

          Why, after FIVE YEARS (the time since people started asking for one), is there no Spotlight plugin for InDesign files??

          • 2. Re: Spotlight Meta-Data Importer

            Wholeheartedly agree. A few people have recommended something called PageZephyr -- but it costs one hundred to several hundred dollars. I would really like to see Adobe get rolling and get this arguably simple thing added to InDesign; the ability to search for text within .indd files without *having* to keep PDFs of each one around would save me a lot of time (and more than a decent amount of drive space).

            • 3. Re: Spotlight Meta-Data Importer
              John Hawkinson Level 5

              InDesign files have XMP metadata already. Is this really not just a matter of configuration?

              • 4. Re: Spotlight Meta-Data Importer
                coreworks Level 1

                If the metadata is there, how does one get Spotlight to search within .indd files? If I remember the discussions I've read on the matter over the past half decade, it's up to Adobe to write one, and they haven't as of this writing (insofar as I know, unless it's part of CS5 -- I've not upgraded to CS5 yet).

                • 5. Re: Spotlight Meta-Data Importer
                  John Hawkinson Level 5

                  OK, so, I looked into this in a bit more detail.


                  It's up to someone to write, but it need not be Adobe.


                  Spotlight ships with ~20 importers, each of which declares a set of Uniform Type Identifiers (UTIs) upon which it operates. Each importer plugin gets called by Spotlight when a new file of its UTI type appears, and the plugin exports the metadata to Spotlight's metadata database.


                  It would appear that none of the standard spotlight plugins simply read the provided file and look for XMP data. Though probably some of them use some dispatch mechanism and then do so (like Image).


                  There are examples where people solve this problem for movie files by editing the Info.plist associated with the Quicktime mdimporter to add more UTIs. This does not work to add the INDD UTI to Image.mdimporter or Quicktime.mdimporter, but perhaps it's a bit close. It ought not be difficult to merge the sample XMP reading library from the XMP toolkit with the sample Spotlight plugin to make this work. But it is a bit of developer effort.


                  Also, more annoyingly, InDesign does not define a standard UTI for INDD documents, so unlike a Photoshop file that is com.adobe.photoshop-image, an InDesign file is, on my system, dyn.ah62d4rv4ge80w5xequ. This is circumventable, but annoying.
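
                  A minimal sketch of how an importer's Info.plist might both declare a UTI for .indd files and register the importer for it, generated here with Python's plistlib. The identifiers under com.example are hypothetical placeholders, and conforming to public.data is an assumption; the plist keys themselves are the standard Launch Services and UTI keys.

```python
import plistlib

# Hypothetical identifiers, for illustration only.
INDD_UTI = "com.example.indesign-document"

info = {
    "CFBundleIdentifier": "com.example.indd-importer",
    # Tells Spotlight which content types this importer handles.
    "CFBundleDocumentTypes": [
        {"CFBundleTypeRole": "MDImporter", "LSItemContentTypes": [INDD_UTI]}
    ],
    # Declares a UTI for the .indd extension, since none is provided
    # by the application itself (assumption: conforming to public.data).
    "UTImportedTypeDeclarations": [
        {
            "UTTypeIdentifier": INDD_UTI,
            "UTTypeConformsTo": ["public.data"],
            "UTTypeTagSpecification": {"public.filename-extension": ["indd"]},
        }
    ],
}

# Serialize to the XML property-list format used by Info.plist files.
plist_bytes = plistlib.dumps(info)
print(plist_bytes.decode("utf-8"))
```

                  Once a declaration like this is registered, files with that extension should report the declared identifier rather than a dyn.* one.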



                  • 6. Re: Spotlight Meta-Data Importer
                    John Hawkinson Level 5

                    OK, I was wrong.


                    Or if not technically wrong, fairly misleading.

                    I spent a while mucking around and have most of a Spotlight importer that reads XMP metadata from arbitrary files using Adobe's XMP toolkit, and can write to the Spotlight database, and it gets invoked at the right times, etc., etc. [It doesn't actually do the writing, but it's not much effort to make it do so.]


                    This addresses ascript_guy's desire from 2009:


                    For the Macintosh platform, I'd like Adobe to supply or make available a Spotlight Meta-Data Importer that would expose the XMP Meta-Data to Spotlight searches.

                    Unfortunately, this is not very useful, because the XMP metadata isn't very useful. As coreworks points out, the real utility comes from being able to search the text content of InDesign documents (though I'm not really sure how Spotlight deals with book-length stuff...). But the XMP metadata doesn't have that. It doesn't even have a proxy or summary or short piece of that.


                    Here's a sampling of the metadata from Blue Square.indd, a sample InDesign layout that ships with the XMP SDK:

                    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
                       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                          <rdf:Description rdf:about=""
                                   <rdf:li rdf:parseType="Resource">
                                      <stEvt:softwareAgent>Adobe InDesign 7.0</stEvt:softwareAgent>
                                   <rdf:li rdf:parseType="Resource">
                                      <stEvt:softwareAgent>Adobe InDesign 7.0</stEvt:softwareAgent>
                          <rdf:Description rdf:about=""
                             <xmp:CreatorTool>Adobe InDesign 7.0</xmp:CreatorTool>
                                   <rdf:li rdf:parseType="Resource">
                          <rdf:Description rdf:about=""
                                   <rdf:li xml:lang="x-default">Blue Square Test File - .indd</rdf:li>
                                   <rdf:li xml:lang="x-default">XMPFiles BlueSquare test file, created in InDesign 
                    CS2, saved as .indd and .pdf.</rdf:li>
                                   <rdf:li>Blue Square</rdf:li>
                                   <rdf:li>test file</rdf:li>
                          <rdf:Description rdf:about=""
                                   <rdf:li rdf:parseType="Resource">
                                   <rdf:li rdf:parseType="Resource">


                    So, if you happen to fill in the Author/Title/Keyword metadata in ID, then maybe that is useful. Or if the thumbnail image of the file is useful.
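
                    For anyone who wants to look at those fields without the XMP toolkit, here is a rough sketch of a different technique: scanning the raw bytes for the xpacket markers that wrap an XMP packet, then reading the title and keywords with a standard XML parser. It assumes the packet is stored uncompressed and UTF-8 encoded, that the first packet found is the document's own, and that the title and keywords live in dc:title and dc:subject. The sample bytes are synthetic, not a real .indd file.

```python
import re
import xml.etree.ElementTree as ET

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_xmp_packet(data):
    """Return the XML between the first pair of xpacket markers, or None."""
    match = PACKET_RE.search(data)
    return match.group(1).strip() if match else None


def title_and_keywords(packet):
    """Collect the rdf:li values under dc:title and dc:subject."""
    root = ET.fromstring(packet)
    titles = [li.text for li in root.findall(".//dc:title//rdf:li", NS)]
    keywords = [li.text for li in root.findall(".//dc:subject//rdf:li", NS)]
    return titles, keywords


# Synthetic stand-in for a file: binary bytes around a tiny XMP packet.
sample = (
    b"\x00\x01\x02"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about=""'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:title><rdf:Alt>"
    b'<rdf:li xml:lang="x-default">Blue Square Test File - .indd</rdf:li>'
    b"</rdf:Alt></dc:title>"
    b"<dc:subject><rdf:Bag>"
    b"<rdf:li>Blue Square</rdf:li><rdf:li>test file</rdf:li>"
    b"</rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b'<?xpacket end="w"?>'
    b"\xff\xfe"
)

packet = find_xmp_packet(sample)
titles, keywords = title_and_keywords(packet)
print(titles, keywords)
```

                    Byte scanning like this is cruder than the toolkit's file handlers, so treat it as a quick inspection aid rather than a replacement.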

                    Any of it could get stuffed into the spotlight data, but to what end?


                    Doing any better requires being able to parse the INDD file format and getting text out to give to spotlight. That is not an easy task.

                    Markzware specializes in tools that understand the insides of InDesign (and Quark) files, so it's not surprising that they have a product for this (PageZephyr). And really, $100 doesn't seem too much to pay.


                    So, blah.


                    I suppose one could take the thumbnail and make a QuickLook generator out of it.

                    If someone wants to pull XMP metadata out of some other kind of file that's more useful and would like me to finish the spotlight plugin, you should let me know. Otherwise I don't see bothering.


                    Oh, I suppose another option is to use the SDK or Scripting API to have InDesign pull out the text when it is saving a document. This is counter to the Apple-mandated Spotlight philosophy:


                    A Spotlight importer must run entirely without interaction. You should not attempt to present any user interface or expect that the window server is running.
                    You should not expect your application to be running when your metadata importer is called. Importers can be called at any time to extract metadata from a file. Your metadata importer should be able to extract the information without any assistance from the application that created the file.


                    I suppose we could break the rules, but even then it's annoying to do and would probably have bad performance.

                    • 7. Re: Spotlight Meta-Data Importer
                      Pickory Level 3



                      This is something I would like to look at.


                      How do you get your metadata into your documents?



                      • 8. Re: Spotlight Meta-Data Importer
                        Pickory Level 3

                        Oops, sorry John, I didn't see your last post.


                        I have been looking at the blue square stuff too.

                        • 9. Re: Spotlight Meta-Data Importer
                          John Hawkinson Level 5

                          Pickory, I can't tell if you have a question, or what it might be.

                          Can you be a bit more clear?

                          • 10. Re: Spotlight Meta-Data Importer
                            Pickory Level 3

                            Hello John,


                            My question was how you enter your metadata, apart from the very basic stuff.


                            You have already answered by pointing out the spotlight plugin would not be able to index the content of the document.

                            • 11. Re: Spotlight Meta-Data Importer
                              John Hawkinson Level 5

                              I wonder if the terminology is confusing.

                              The content of the document is not properly "metadata."

                              In InDesign, you can edit and view most of the metadata with File > File Info.

                              You can view the XMP metadata (different from InDesign metadata) that Spotlight has with "mdls filename" in the Terminal. Or just type "mdls " (with a space at the end) and drag an icon into the Terminal from the Finder, and hit return.

                              • 12. Re: Spotlight Meta-Data Importer
                                coreworks Level 1

                                I may be misrepresenting my ultimate goal -- most simply put, what I want is for Spotlight (Mac OS) to be able to search text within an InDesign (.indd) file. Currently Spotlight can only find a file if the searched text is in its title; e.g., if I'm looking for a résumé for Rob Zombie, and search for "Robert", I will not get a Spotlight return unless the name "Robert" is in the title of the file. However, if I've made a PDF of that InDesign document, Spotlight will find the text within the .pdf file.


                                So what I am pining for, according to what I've learned about Spotlight and its capabilities, is for Adobe to write a plugin for Spotlight to be able to search, I guess, text strings within a .indd file. Currently, the only way I've heard of to search within InDesign files is a $100 to $200 program called "PageZephyr".

                                • 13. Re: Spotlight Meta-Data Importer
                                  John Hawkinson Level 5

                                  coreworks, you were clear enough. It seemed like some others in this thread might have use cases that didn't require the full text.


                                  Oh, I suppose one option that might meet your needs but is really terrible would be to look through the InDesign document for anything that looks like text. This would give you a lot of false positives and some really ugly stuff in your Spotlight database.


                                  For an example, go to Terminal.app and type in


                                  strings -10 filename.indd


                                  Be prepared for many, many screenfuls of output (e.g. a test I just ran was 500 screenfuls). It certainly wouldn't be hard to tell Spotlight that was the full text of your document. It'd probably enable searching for Rob Zombie, but might have other negative consequences.
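
                                  For illustration, a rough Python approximation of what that strings command does: pull out every run of at least ten printable ASCII characters from the raw bytes. The exact character set that strings treats as printable is an assumption here (space through tilde, plus tab), and the sample bytes are made up.

```python
import re


def ascii_strings(data, min_len=10):
    """Return runs of printable ASCII (plus tab) at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for the contents of a binary file.
data = b"\x00\x01short\x00This is a long run of text\x02\xffabc"
found = ascii_strings(data)
print(found)
```

                                  Everything this returns, style names and other internal strings included, would end up in the index, which is where the false positives come from.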

                                  • 14. Re: Spotlight Meta-Data Importer
                                    John Hawkinson Level 5

                                    coreworks, you may be in luck!


                                    Doing any better requires being able to parse the INDD file format and getting text out to give to spotlight. That is not an easy task.

                                    It turns out it's actually nowhere near as bad as I thought.

                                    Maybe I'll have something to test tonight or tomorrow.

                                    • 15. Re: Spotlight Meta-Data Importer
                                      John Hawkinson Level 5



                                      ALPHA-QUALITY SOFTWARE ALERT


                                      Download http://web.mit.edu/jhawk/tmp/InDesignImporter-0.1alpha.dmg

                                      and install InDesignImporter.mdimporter into ~/Library/Spotlight. (Or the /Library version if you want to live dangerously.)


                                      I'm not sure if spotlight will automatically reindex old files.


                                      This is alpha-quality software. It's prototyped as a slow and rather painful perl script that walks through the INDD file looking for things that look like strings (generally they start with @-signs), and then outputting them to Spotlight. It will produce some false positives, perhaps things like the names of styles and fonts that are encoded in your InDesign document. It will probably miss some strings. It might even crash the Spotlight importer process (mdimport).


                                      You can force a file to be indexed by typing


                                      mdimport /Users/myname/path/to/file.indd


                                      and if you add -d 1


                                      mdimport -d 1 /Users/myname/path/to/file.indd


                                      it'll tell you which spotlight importer is being used, and -d 2 will show you the metadata it finds for the file.


                                      I didn't actually bother pulling out the XMP metadata, though that's "easy." I spent most of the time bashing on the full text part.


                                      Oh, it also screws up unicode characters. In part because it excludes them as part of its heuristic for what is a string and what is not, but that's kind of messed up. Anyhow, it outputs them as literal "U+2019" for a right apostrophe.
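
                                      As a sketch of one possible cleanup step (not something the importer does), literal markers of that form could be turned back into real characters after extraction. This assumes every marker is exactly "U+" followed by four uppercase hex digits, which is inferred only from the single "U+2019" example above.

```python
import re

# Assumed marker format: "U+" followed by exactly four uppercase hex digits.
CODEPOINT_RE = re.compile(r"U\+([0-9A-F]{4})")


def restore_codepoints(text):
    """Replace literal U+XXXX markers with the characters they name."""
    return CODEPOINT_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(restore_codepoints("Rob ZombieU+2019s resume"))
```

                                      Matching exactly four digits avoids swallowing a following letter such as "A" as a fifth hex digit, at the cost of ignoring characters outside the Basic Multilingual Plane.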


                                      Anyhow, let me know how it works for you. Oh, yeah, it's slow. Because it wasn't really written efficiently...


                                      Oh, and it only looks for CS5 [and CS5.5] files (type IDd7). That'd be easy to change, by editing the Info.plist file.


                                      Let me know how it works. Not really sure if there's much point in making it better...easy to do though.

                                      • 17. Re: Spotlight Meta-Data Importer
                                        Harbs. Level 6




                                        (Who's too swamped with work to look at this fascinating piece of software...)

                                        • 18. Re: Spotlight Meta-Data Importer
                                          coreworks Level 1

                                          Ditto what Harbs said -- thanks for doing this, but I am in the middle of a massive workflow (six separate engineering clients; working on a handful of standard forms proposals for each) and can't use something experimental at the moment. Bookmarked and will jump on it as soon as I get over this hump.

                                          • 19. Re: Spotlight Meta-Data Importer

                                            Tried out your Spotlight plugin.


                                            Using: mdimport -d 1 /path/to/file.indd


                                            I get: Segmentation fault


                                            Using: mdimport -d 2 /path/to/file.indd


                                            I get: (Info) Import: Import '/Volumes/path/to/file.indd' type 'edu.mit.jhawk.adobe.indesign-document' using '/Library/Spotlight/InDesignImporter.mdimporter'

                                            Segmentation fault

                                            File doesn't seem to get indexed, as Spotlight doesn't find text within the file.
                                            Using Mac OSX 10.6.7 and InDesign CS5 (7.0.4)
                                            • 20. Re: Spotlight Meta-Data Importer
                                              John Hawkinson Level 5

                                              Oh, finally, a tester! There seems to be some screwup where I can't seem to build a version that works on both 10.5 and 10.6, possibly relating to 64-bit, but possibly not. Which is your system? I'll build a version for it.


                                              Alpha quality...

                                              • 21. Re: Spotlight Meta-Data Importer
                                                John Hawkinson Level 5

                                                OK, I think it was just 10.6, not 64/32-bit. Try the version now; Get Info should call it 0.1c.


                                                Thanks for testing.

                                                • 22. Re: Spotlight Meta-Data Importer
                                                  RKSinNC2 Level 1

                                                  I tried the new version and it seems to work. If an InDesign file has extended text (sentences, paragraphs, etc.), it is usually indexed. With some files that only have a word or two per paragraph (I lay out business forms, so I have a lot of these), it seems the only things indexed are font and color swatch names.




                                                  • 23. Re: Spotlight Meta-Data Importer
                                                    John Hawkinson Level 5

                                                    OK...is that a big deal? I assume mdimport -d 2 shows that it is not finding the data you're referring to? If you want to send me the file I can look at fixing that...


                                                    I would certainly like to get rid of font and swatch names, but I don't know how to distinguish them from other strings in the file. Alas...

                                                    • 24. Re: Spotlight Meta-Data Importer
                                                      [Jongware] Most Valuable Participant

                                                      John Hawkinson wrote:


                                                      I would certainly like to get rid of font and swatch names, but I don't know how to distinguish them from other strings in the file. Alas...



                                                      About the only way is to parse the entire ID file header & Internal Object list, and filter out the plain text objects (which are scattered all over the file). The way to distinguish "text objects", by the way, from other objects such as color names, font lists, and spell check exceptions, is by comparing their object IDs to the ones defined in the SDK. And -- just to add to the phun -- there may be different definitions for different versions of ID.


                                                      At that level, it's most certainly not a trivial thing to write, I can tell you that.

                                                      • 25. Re: Spotlight Meta-Data Importer
                                                        RKSinNC2 Level 1

                                                        FYI, when I use the "mdimport -d 2" command in Terminal, it may only show some of the text that is being indexed. I've discovered that words not reported by using that command are still being indexed.


                                                        So your mdimporter is doing a pretty good job.



                                                        • 26. Re: Spotlight Meta-Data Importer
                                                          John Hawkinson Level 5

                                                          That is...not what I would expect. But my Spotlight importer can only hand the data off to Spotlight. Whether Spotlight tells you all of the data it uses is a different question that I don't have control over, as far as I know.


                                                          I got some tips about parsing the file format so I may be able to improve things. But not this week, I'm afraid.


                                                          But if you want to see what strings it is finding, you can run the embedded perl script directly.


                                                          ~/Library/Spotlight/InDesignImporter.mdimporter/Contents/Resources/idstrings.pl myfile.indd /dev/stdout


                                                          for instance.

                                                          • 27. Re: Spotlight Meta-Data Importer
                                                            darrenoia Level 1

                                                            John, a very belated response — apparently I installed this plugin a while ago and it worked so well I didn't even remember I had. Kudos for great work. Somehow it's stopped working for me but I'm trying to reinstall now.

                                                            • 28. Re: Spotlight Meta-Data Importer
                                                              KnightD Level 1

                                                              Sorry I'm late to the party! John I think you may see more interest in this script as PageZephyr, while it was a great piece of software for InDesign documents up to CC, is no longer supported and will not read files created using CC2014-CC2015.

                                                              I searched high and low for an alternative and finally stumbled upon this thread. There is demand I believe, but this being the only other possible solution I found is not visible enough. I'd certainly be willing to do some testing on this if you're up for seeing this work with CC2015.


                                                              I'm decent with ExtendScript, but I don't know Perl.


                                                              Right now my terminal output is:

                                                              mdimport[1267:193377] c++ exception of type ecpUnsupportedVersion. lineno = 145. in N3com9markzware9nmsReader12nmsInDesign317cInDesignDocTask3E. path:/Users/tetrisbox/Desktop/SpotlightTest/NewSpotlightTester.indd.

                                                              It looks like there is a conflict with having PageZephyr still installed.

                                                              • 29. Re: Spotlight Meta-Data Importer

                                                                Hello John,

                                                                an anachronistic question, I think (it is 2016)! 

                                                                Have you stopped development of your Spotlight plugin for InDesign? Or have you written an updated version for Yosemite (or higher) and InDesign CS6 (or CC2015)?

                                                                Best regards