Wholeheartedly agree. A few people have recommended something called PageZephyr, but it costs anywhere from a hundred to several hundred dollars. I would really like to see Adobe get moving and add this arguably simple feature to InDesign; being able to search for text within .indd files without *having* to keep a PDF of each one around would save me a lot of time (and more than a decent amount of drive space).
If the metadata is there, how does one get Spotlight to search within .indd files? If I remember correctly the discussions I've read on the matter over the past half decade, it's up to Adobe to write a Spotlight importer for them, and as of this writing they haven't (insofar as I know, unless it's part of CS5; I've not upgraded to CS5 yet).
OK, so, I looked into this in a bit more detail.
It's up to someone to write, but it need not be Adobe.
Spotlight ships with about 20 importers, each of which declares a set of Uniform Type Identifiers (UTIs) that it operates on. Spotlight calls an importer plugin when a new file of one of its UTIs appears, and the plugin exports that file's metadata to Spotlight's metadata database.
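To make that dispatch concrete, here is a toy Python model of the idea. It is not Apple's API (real importers are CFPlugIn bundles written in C or Objective-C, and real UTI matching also follows type conformance); the class `ToyIndexer` and its methods are hypothetical. It mainly shows why a file whose UTI no importer claims never gets indexed at all.

```python
from typing import Callable, Dict, List

# Hypothetical importer signature: takes a file path, returns metadata attributes.
Importer = Callable[[str], Dict[str, object]]


class ToyIndexer:
    """Toy model of UTI-based importer dispatch (ignores UTI conformance)."""

    def __init__(self) -> None:
        self._importers: Dict[str, Importer] = {}
        self.database: Dict[str, Dict[str, object]] = {}

    def register(self, utis: List[str], importer: Importer) -> None:
        # Each importer declares the set of UTIs it operates on.
        for uti in utis:
            self._importers[uti] = importer

    def file_appeared(self, path: str, uti: str) -> bool:
        # Look up an importer by exact UTI; without one, nothing is stored.
        importer = self._importers.get(uti)
        if importer is None:
            return False
        self.database[path] = importer(path)
        return True


if __name__ == "__main__":
    indexer = ToyIndexer()
    indexer.register(["com.adobe.pdf"], lambda p: {"kMDItemTitle": p})
    print(indexer.file_appeared("resume.pdf", "com.adobe.pdf"))             # True
    print(indexer.file_appeared("resume.indd", "dyn.ah62d4rv4ge80w5xequ"))  # False
```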
It appears that none of the standard Spotlight plugins simply reads the provided file and looks for XMP data, though some of them (like Image) probably use some dispatch mechanism and then do so.
There are examples of people solving this problem for movie files by editing the Info.plist of the QuickTime mdimporter to add more UTIs. Adding the INDD UTI to Image.mdimporter or QuickTime.mdimporter this way does not work, but perhaps it comes close. It ought not to be difficult to merge the sample XMP-reading library from the XMP Toolkit with the sample Spotlight plugin to make this work, but it is a bit of developer effort.
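The Info.plist trick amounts to appending a UTI to the importer's CFBundleDocumentTypes → LSItemContentTypes list. A minimal Python sketch using the standard plistlib module, assuming the importer's Info.plist follows the usual layout (a CFBundleDocumentTypes array whose first entry has the MDImporter role); the helper name is hypothetical. As noted above, this alone doesn't help for INDD files, since the stock Image and QuickTime importers can't parse them, and it's safer to experiment on a copy of an importer than on one inside /System.

```python
import plistlib


def add_uti_to_importer(plist_bytes: bytes, uti: str) -> bytes:
    """Return a copy of an mdimporter Info.plist with `uti` added (hypothetical helper)."""
    info = plistlib.loads(plist_bytes)
    doc_types = info.setdefault("CFBundleDocumentTypes", [])
    if not doc_types:
        # No document-type entry yet: create one for the importer role.
        doc_types.append({"CFBundleTypeRole": "MDImporter", "LSItemContentTypes": []})
    utis = doc_types[0].setdefault("LSItemContentTypes", [])
    if uti not in utis:  # keep the edit idempotent
        utis.append(uti)
    return plistlib.dumps(info)
```

Usage would be reading the bundle's Contents/Info.plist as bytes, passing it through the function, and writing the result back.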
Also, more annoyingly, InDesign does not define a standard UTI for INDD documents, so whereas a Photoshop file is com.adobe.photoshop-image, an InDesign file is, on my system, dyn.ah62d4rv4ge80w5xequ (a dynamic UTI the system generates for undeclared types). This can be worked around, but it's annoying.
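One way around the missing UTI is for the importer bundle itself to declare one, via a UTImportedTypeDeclarations entry in its Info.plist that maps the .indd extension (and the CS5 file type code IDd7 mentioned later in this thread) to an identifier of your choosing. A Python sketch with plistlib; the identifier com.example.indesign-document is a hypothetical placeholder, and the conformance list is an assumption.

```python
import plistlib

# Hypothetical identifier; pick your own reverse-DNS name.
INDD_UTI = "com.example.indesign-document"


def indd_type_declaration(identifier: str = INDD_UTI) -> dict:
    """Build one UTImportedTypeDeclarations entry for .indd files."""
    return {
        "UTTypeIdentifier": identifier,
        "UTTypeDescription": "Adobe InDesign document",
        "UTTypeConformsTo": ["public.data", "public.content"],  # assumed conformance
        "UTTypeTagSpecification": {
            "public.filename-extension": ["indd"],
            "com.apple.ostype": ["IDd7"],  # CS5/CS5.5 type code
        },
    }


def declare_indd_type(plist_bytes: bytes, identifier: str = INDD_UTI) -> bytes:
    """Add the declaration to an Info.plist unless one with this identifier exists."""
    info = plistlib.loads(plist_bytes)
    decls = info.setdefault("UTImportedTypeDeclarations", [])
    if not any(d.get("UTTypeIdentifier") == identifier for d in decls):
        decls.append(indd_type_declaration(identifier))
    return plistlib.dumps(info)
```

The importer's LSItemContentTypes would then list that same identifier instead of the dyn.* one.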
OK, I was wrong.
Or if not technically wrong, fairly misleading.
I spent a while mucking around and now have most of a Spotlight importer that reads XMP metadata from arbitrary files using Adobe's XMP Toolkit, can write to the Spotlight database, gets invoked at the right times, etc. [It doesn't actually do the writing yet, but it's not much effort to make it do so.]
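The importer described here uses Adobe's XMP Toolkit, which has format-aware file handlers. For illustration only, a much cruder Python sketch: scan the raw bytes for the first `<x:xmpmeta ...>...</x:xmpmeta>` element. It assumes the packet is stored uncompressed as UTF-8, and it ignores the `<?xpacket?>` wrapper and any additional packets; the function names are hypothetical.

```python
import re
from typing import Optional

# Non-greedy match from the opening x:xmpmeta tag to the first closing tag.
_XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def find_xmp_packet(data: bytes) -> Optional[str]:
    """Return the first embedded XMP element as text, or None if absent."""
    match = _XMP_RE.search(data)
    if match is None:
        return None
    return match.group(0).decode("utf-8", errors="replace")


def find_xmp_in_file(path: str) -> Optional[str]:
    with open(path, "rb") as fh:
        return find_xmp_packet(fh.read())
```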
This addresses ascript_guy's desire from 2009:
For the Macintosh platform, I'd like Adobe to supply or make available a Spotlight Meta-Data Importer that would expose the XMP Meta-Data to Spotlight searches.
Unfortunately, this is not very useful, because the XMP metadata isn't very useful. As coreworks points out, the real utility comes from being able to search the text content of InDesign documents (though I'm not really sure how Spotlight deals with book-length material). But the XMP metadata doesn't include that. It doesn't even include a proxy, summary, or short excerpt of it.
Here's a sampling of the metadata from Blue Square.indd, a sample InDesign layout that ships with the XMP SDK:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.1.2">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#">
      <xmpMM:InstanceID>xmp.iid:0E49B4AD072068118C1493EC1DFA0B63</xmpMM:InstanceID>
      <xmpMM:DocumentID>adobe:docid:indd:af62ddc6-213f-11da-ac18-fe020baf4f13</xmpMM:DocumentID>
      <xmpMM:OriginalDocumentID>adobe:docid:indd:af62ddc6-213f-11da-ac18-fe020baf4f13</xmpMM:OriginalDocumentID>
      <xmpMM:History>
        <rdf:Seq>
          <rdf:li rdf:parseType="Resource">
            <stEvt:action>saved</stEvt:action>
            <stEvt:instanceID>xmp.iid:0D49B4AD072068118C1493EC1DFA0B63</stEvt:instanceID>
            <stEvt:when>2011-05-25T06:51:10-04:00</stEvt:when>
            <stEvt:softwareAgent>Adobe InDesign 7.0</stEvt:softwareAgent>
            <stEvt:changed>/;/metadata</stEvt:changed>
          </rdf:li>
          <rdf:li rdf:parseType="Resource">
            <stEvt:action>saved</stEvt:action>
            <stEvt:instanceID>xmp.iid:0E49B4AD072068118C1493EC1DFA0B63</stEvt:instanceID>
            <stEvt:when>2011-05-25T06:51:10-04:00</stEvt:when>
            <stEvt:softwareAgent>Adobe InDesign 7.0</stEvt:softwareAgent>
            <stEvt:changed>/metadata</stEvt:changed>
          </rdf:li>
        </rdf:Seq>
      </xmpMM:History>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:xmpTPg="http://ns.adobe.com/xap/1.0/t/pg/" xmlns:xmpGImg="http://ns.adobe.com/xap/1.0/g/img/">
      <xmp:CreateDate>2005-09-07T14:40:37Z</xmp:CreateDate>
      <xmp:ModifyDate>2011-05-25T06:51:10-04:00</xmp:ModifyDate>
      <xmp:MetadataDate>2011-05-25T06:51:10-04:00</xmp:MetadataDate>
      <xmp:CreatorTool>Adobe InDesign 7.0</xmp:CreatorTool>
      <xmp:PageInfo>
        <rdf:Seq>
          <rdf:li rdf:parseType="Resource">
            <xmpTPg:PageNumber>1</xmpTPg:PageNumber>
            <xmpGImg:format>JPEG</xmpGImg:format>
            <xmpGImg:width>256</xmpGImg:width>
            <xmpGImg:height>256</xmpGImg:height>
            <xmpGImg:image>/9j/4AAQSkZJRgABAgEASABIAAD/7QAsUGhvdG9zaG9wIDMuMAA4QklNA+0AAA ...
</xmpGImg:image>
          </rdf:li>
        </rdf:Seq>
      </xmp:PageInfo>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>application/x-indesign</dc:format>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Blue Square Test File - .indd</rdf:li>
        </rdf:Alt>
      </dc:title>
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">XMPFiles BlueSquare test file, created in InDesign CS2, saved as .indd and .pdf.</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>XMP</rdf:li>
          <rdf:li>Blue Square</rdf:li>
          <rdf:li>test file</rdf:li>
          <rdf:li>InDesign</rdf:li>
          <rdf:li>.indd</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
    <rdf:Description rdf:about="" xmlns:xmpTPg="http://ns.adobe.com/xap/1.0/t/pg/" xmlns:xmpG="http://ns.adobe.com/xap/1.0/g/" xmlns:stFnt="http://ns.adobe.com/xap/1.0/sType/Font#">
      <xmpTPg:Colorants>
        <rdf:Seq>
          <rdf:li rdf:parseType="Resource">
            <xmpG:swatchName>Black</xmpG:swatchName>
            <xmpG:mode>CMYK</xmpG:mode>
            <xmpG:type>Process</xmpG:type>
            <xmpG:cyan>0</xmpG:cyan>
            <xmpG:magenta>0</xmpG:magenta>
            <xmpG:yellow>0</xmpG:yellow>
            <xmpG:black>100</xmpG:black>
          </rdf:li>
        </rdf:Seq>
      </xmpTPg:Colorants>
      <xmpTPg:Fonts>
        <rdf:Bag>
          <rdf:li rdf:parseType="Resource">
            <stFnt:fontName>Times-Roman</stFnt:fontName>
            <stFnt:fontFamily>Times</stFnt:fontFamily>
            <stFnt:fontFace>Regular</stFnt:fontFace>
            <stFnt:fontType>TrueType</stFnt:fontType>
            <stFnt:versionString>Times-Roman6.0d6e5</stFnt:versionString>
            <stFnt:composite>false</stFnt:composite>
            <stFnt:fontFileName>Times.dfont</stFnt:fontFileName>
          </rdf:li>
        </rdf:Bag>
      </xmpTPg:Fonts>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
So, if you happen to fill in the Author/Title/Keyword metadata in ID, then maybe that is useful. Or if the thumbnail image of the file is useful.
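If you do fill in File Info, those fields end up in the Dublin Core part of the packet shown above (title in dc:title; author and keywords, as far as I know, in dc:creator and dc:subject). A minimal Python sketch with xml.etree.ElementTree that pulls them out as plain strings, which an importer could then hand to Spotlight as, e.g., kMDItemTitle or kMDItemKeywords. The function name is hypothetical, and the input is assumed to be a well-formed packet.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dublin_core_fields(xmp: str) -> Dict[str, List[str]]:
    """Collect dc:title/creator/description/subject values from an XMP packet."""
    root = ET.fromstring(xmp)
    fields: Dict[str, List[str]] = {}
    for prop in ("title", "creator", "description", "subject"):
        # Values sit in rdf:li items inside an rdf:Alt, rdf:Seq or rdf:Bag.
        values = [
            li.text.strip()
            for li in root.iterfind(f".//dc:{prop}//rdf:li", NS)
            if li.text and li.text.strip()
        ]
        if values:
            fields[prop] = values
    return fields
```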
Any of it could be stuffed into the Spotlight database, but to what end?
Doing any better requires being able to parse the INDD file format and getting text out to give to spotlight. That is not an easy task.
Markzware specializes in tools that understand the insides of InDesign (and Quark) files, so it's not surprising that they have a product for this (PageZephyr). And really, $100 doesn't seem too much to pay.
I suppose one could take the thumbnail and make a Quick Look generator out of it.
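The thumbnail is the base64-encoded JPEG in xmpGImg:image. A Python sketch that decodes each page thumbnail to raw JPEG bytes, which is the part a Quick Look generator (itself a separate C/Objective-C plugin, not shown) would need. The function name is hypothetical, and it assumes the element text contains only base64 plus line breaks.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List

NS = {"xmpGImg": "http://ns.adobe.com/xap/1.0/g/img/"}
JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with the Start Of Image marker


def page_thumbnails(xmp: str) -> List[bytes]:
    """Decode every xmpGImg:image thumbnail in an XMP packet to JPEG bytes."""
    root = ET.fromstring(xmp)
    thumbs = []
    for node in root.iterfind(".//xmpGImg:image", NS):
        if not node.text:
            continue
        # With the default validate=False, b64decode skips line breaks and other
        # characters outside the base64 alphabet.
        data = base64.b64decode(node.text)
        if data.startswith(JPEG_SOI):
            thumbs.append(data)
    return thumbs
```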
If someone wants to pull XMP metadata out of some other kind of file where it's more useful, and would like me to finish the Spotlight plugin, let me know. Otherwise, I don't see the point in bothering.
Oh, I suppose another option is to use the SDK or scripting API to have InDesign extract the text when it saves a document. But this runs counter to Apple's mandated Spotlight philosophy:
A Spotlight importer must run entirely without interaction. You should not attempt to present any user interface or expect that the window server is running.
You should not expect your application to be running when your metadata importer is called. Importers can be called at any time to extract metadata from a file. Your metadata importer should be able to extract the information without any assistance from the application that created the file.
I suppose we could break the rules, but even then it's annoying to do and would probably have bad performance.
I wonder if the terminology is confusing.
The content of the document is not properly "metadata."
In InDesign, you can edit and view most of the metadata with File > File Info.
You can view the metadata that Spotlight has stored for a file (as distinct from the metadata InDesign keeps) by running "mdls filename" in the Terminal. Or just type "mdls " (with a space at the end), drag the file's icon from the Finder into the Terminal, and hit Return.
I may be misrepresenting my ultimate goal. Most simply put, what I want is for Spotlight (Mac OS) to be able to search the text within an InDesign (.indd) file. Currently, Spotlight can only find a file if the search text is in its title; e.g., if I'm looking for a résumé for Rob Zombie and search for "Robert", I will not get a Spotlight result unless the name "Robert" is in the title of the file. However, if I've made a PDF of that InDesign document, Spotlight will find the text within the .pdf file.
So, based on what I've learned about Spotlight and its capabilities, what I'm pining for is for Adobe to write a Spotlight plugin that can search text strings within an .indd file. Currently, the only way I've heard of to search within InDesign files is a $100 to $200 program called "PageZephyr".
coreworks, you were clear enough. It seemed like some others in this thread might have use cases that didn't require the full text.
Oh, I suppose one option that might meet your needs, but is really terrible, would be to look through the InDesign document for anything that looks like text. This would give you a lot of false positives and some really ugly stuff in your Spotlight database.
For an example, go to Terminal.app and type in
strings -10 filename.indd
Be prepared for many, many screenfuls of output (e.g., a test I just ran produced 500 screenfuls). It certainly wouldn't be hard to tell Spotlight that this was the full text of your document. It would probably enable searching for Rob Zombie, but might have other negative consequences.
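The same crude heuristic as `strings -10` (runs of at least ten printable ASCII characters) looks roughly like this in Python. It is a sketch, not the Perl code behind the importer; the usage line shows the kind of false positive mentioned above (a swatch-like name), and any non-ASCII character, such as the é in résumé, breaks a run.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 10) -> List[str]:
    """Return runs of >= min_len printable ASCII chars (tab, space..tilde)."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group(0).decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    blob = b"\x00\x07Rob Zombie resume\x00\x01short\x00SwatchBlack100\x00"
    print(printable_runs(blob))  # ['Rob Zombie resume', 'SwatchBlack100']
```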
ALPHA-QUALITY SOFTWARE ALERT
and install InDesignImporter.mdimporter into ~/Library/Spotlight. (Or the /Library version if you want to live dangerously.)
I'm not sure whether Spotlight will automatically reindex old files.
This is alpha-quality software. It's prototyped as a slow and rather painful Perl script that walks through the INDD file looking for things that look like strings (generally they start with @-signs) and then outputs them to Spotlight. It will produce some false positives, such as the names of styles and fonts encoded in your InDesign document. It will probably miss some strings. It might even crash the Spotlight importer process (mdimport).
You can force a file to be indexed by running mdimport on it. If you add -d 1, as in
mdimport -d 1 /Users/myname/path/to/file.indd
it'll tell you which Spotlight importer is being used, and -d 2 will show you the metadata it finds for the file.
I didn't actually bother pulling out the XMP metadata, though that's "easy." I spent most of the time bashing on the full text part.
Oh, it also mangles Unicode characters, partly because its heuristic for what is and isn't a string excludes them, which is kind of messed up. Anyhow, it outputs them literally, e.g. "U+2019" for a right apostrophe.
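If the literal escapes are a problem, they could be turned back into real characters before the text is handed to Spotlight. A Python sketch, assuming the escapes always look like `U+` followed by exactly four hex digits (Basic Multilingual Plane only), as in the U+2019 example; the helper name is hypothetical.

```python
import re

# Assumes exactly four hex digits, so "U+2019" followed by a letter such as
# "a" is not misread as a five-digit code point.
_ESCAPE = re.compile(r"U\+([0-9A-Fa-f]{4})")


def decode_unicode_escapes(text: str) -> str:
    """Replace literal U+XXXX escapes with the characters they name."""

    def to_char(match: "re.Match[str]") -> str:
        code_point = int(match.group(1), 16)
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)  # lone surrogates aren't characters; leave as-is
        return chr(code_point)

    return _ESCAPE.sub(to_char, text)
```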
Anyhow, let me know how it works for you. Oh, yeah, it's slow, because it wasn't really written efficiently...
Oh, and it only looks for CS5 [and CS5.5] files (type IDd7). That would be easy to change by editing the Info.plist file.
Let me know how it works. Not really sure if there's much point in making it better...easy to do though.
Tried out your Spotlight plugin.
Using: mdimport -d 1 /path/to/file.indd
I get: Segmentation fault
Using: mdimport -d 2 /path/to/file.indd
I get: (Info) Import: Import '/Volumes/path/to/file.indd' type 'edu.mit.jhawk.adobe.indesign-document' using '/Library/Spotlight/InDesignImporter.mdimporter'
Segmentation fault
File doesn't seem to get indexed, as Spotlight doesn't find text within the file.
Using Mac OS X 10.6.7 and InDesign CS5 (7.0.4).
Rodney
I tried the new version and it seems to work. If an InDesign file has extended text (sentences, paragraphs, etc.), it is usually indexed. With some files that only have a word or two per paragraph (I lay out business forms, so I have a lot of these), it seems the only things indexed are font and color swatch names.
OK...is that a big deal? I assume mdimport -d 2 shows that it is not finding the data you're referring to? If you want to send me the file I can look at fixing that...
I would certainly like to get rid of font and swatch names, but I don't know how to distinguish them from other strings in the file. Alas...
John Hawkinson wrote:
I would certainly like to get rid of font and swatch names, but I don't know how to distinguish them from other strings in the file. Alas...
About the only way is to parse the entire ID file header & Internal Object list, and filter out the plain text objects (which are scattered all over the file). The way to distinguish "text objects", by the way, from other objects such as color names, font lists, and spell check exceptions, is by comparing their object IDs to the ones defined in the SDK. And -- just to add to the phun -- there may be different definitions for different versions of ID.
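The filtering step described here can be sketched in Python, under loud assumptions: the file has already been parsed into (class ID, text) pairs, which is the genuinely hard part and is not shown, and the numeric IDs below are made-up placeholders, not real values from the InDesign SDK. The point is only the shape of the logic: a per-version table of text class IDs, and a refusal to guess when the version is unknown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class StoredObject:
    class_id: int  # object class ID read from the file (parsing not shown)
    text: str      # decoded string payload


# HYPOTHETICAL placeholder IDs. Real values come from the InDesign SDK and
# may differ between InDesign versions.
TEXT_CLASS_IDS: Dict[str, FrozenSet[int]] = {
    "CS5": frozenset({0x1001}),
    "CS5.5": frozenset({0x1001, 0x1002}),
}


def story_text(objects: Iterable[StoredObject], version: str) -> List[str]:
    """Keep only objects whose class ID marks them as document text."""
    wanted = TEXT_CLASS_IDS.get(version)
    if wanted is None:
        # Unknown version: better to fail than to index fonts and swatches.
        raise ValueError(f"no class-ID table for InDesign version {version!r}")
    return [obj.text for obj in objects if obj.class_id in wanted]
```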
At that level, it's most certainly not a trivial thing to write, I can tell you that.
That is...not what I would expect. But my Spotlight importer can only hand the data off to Spotlight. Whether Spotlight tells you about all of the data it uses is a different question, and as far as I know I have no control over it.
I got some tips about parsing the file format, so I may be able to improve things. But not this week, I'm afraid.
But if you want to see what strings it is finding, you can run the embedded Perl script directly:
~/Library/Spotlight/InDesignImporter.mdimporter/Contents/Resources/idstrings.pl myfile.indd /dev/stdout
Sorry I'm late to the party! John, I think you may see more interest in this script now, since PageZephyr, while it was a great piece of software for InDesign documents up to CC, is no longer supported and will not read files created with CC2014 or CC2015.
I searched high and low for an alternative and finally stumbled upon this thread. There is demand, I believe, but this thread, the only other possible solution I found, isn't very visible. I'd certainly be willing to do some testing if you're up for getting this to work with CC2015.
I'm decent with ExtendScript, but I don't know Perl.
Right now my terminal output is:
mdimport[1267:193377] c++ exception of type ecpUnsupportedVersion. lineno = 145. in N3com9markzware9nmsReader12nmsInDesign317cInDesignDocTask3E. path:/Users/tetrisbox/Desktop/SpotlightTest/NewSpotlightTester.indd.
It looks like there is a conflict with having PageZephyr still installed.