We are a small family-run typesetting business using InDesign. One of our customers has asked us to
provide XML files in addition to a PDF.
We have endeavoured to find out how to do this, but it appears to be extremely complex.
Any help would be greatly appreciated.
It certainly can be!
XML is a file format that can contain almost anything. And InDesign is nearly infinitely-configurable in the information it can export in XML.
So we need more information in order to help you. What kind of document is it, and what kind of information are they seeking in XML?
It's possible to tag almost anything in InDesign such that it gets emitted in the XML export. The question is how to do it efficiently, without wasting more of your time than necessary. XML is all about automation, but sadly you may need a fair bit of it before the XML is useful to your client.
Or it could be really easy.
Hopefully they told you a little more about the information they need? Otherwise you'll need to go back and ask them.
At this point, saying, "We want XML" is almost like saying, "We want a File." Not quite that bad, but close. It could mean anything!
Thanks John for your quick response.
The instructions our customer has given us are as follows:
The XML will conform to an enhanced version of the NLM book version 3.0 DTD (http://dtd.nlm.nih.gov/book/tag-library/3.0/index.html)
The typesetter can typeset in whatever application suits them best.
If there happens to be any maths then it should ideally be in MathML 2.0 format although graphics will be acceptable.
Tables should follow the OASIS exchange CALS model as outlined in http://dtd.nlm.nih.gov/options/OASIS/tag-library/19990315/index.html
We have been in the typesetting and publishing industry for more than 30 years and none of this means anything to us here. Hopefully it does to you!
Currently, our customer is converting PDFs to XML in-house and presumably their aim is to cut costs by foisting this onto the typesetter. This is understandable but we need to know if it can be done and how easily.
I'm a novice at XML, and my experience in typesetting goes back just twenty years, i.e., entirely post-lead. Still, I like to think I've picked up some notion of what constitutes good typography, and XML ain't it.
As I understand it, the whole point of XML is to separate form and content, the presentation of data from the data itself. The inevitable consequence is to enable automatic (some would say "robotic") formatting that is often antithetical to good typography. Take, for example, "The Book in the Renaissance," by Andrew Pettegree, a Christmas present from my daughter with nary a hyphen, and consequently "rivers of white." One might have expected better from a book about the period that established the standards for typography for the next half millennium. Maybe this was just a short-cut by Yale University Press to avoid stylistic differences between British and American English. But I can't help wondering whether the absence of hyphens was a symptom of valuing content over attractive form at a time when many e-book systems simply cannot handle hyphenation.
Those who value typography can move files from XML to InDesign to polish "form" and make a publication more attractive. One example was described recently by John W. Maxwell and Kathleen Fraser in "Traversing The Book of Mpub: an Agile, Web-first Publishing Model," Journal of Electronic Publishing 13.3 (December, 2010) -- check the section halfway through on "XHTML as a Gateway to XML-based Workflow". But one gets the impression that many of those interested in XML are really much more interested in the "data" than in how well it is presented.
Perhaps I should add that my perspective comes from dealing with scholarly publications on East Asian subjects, which means I regularly deal with CJK characters not yet added to Unicode and romanizations requiring diacritics that occur in very few fonts. In other words, although I used to appreciate the separation of form and content, these days I'm not so sure the distinction is so clear.
David W. Goodrich
Sorry for the delay. I had sort of hoped someone else may pop up.
David's comments are probably true, but they are not on-point here. XML is incredibly versatile, and David's comments are primarily about using XML as an input format to a typesetter. You can do this by importing XML into InDesign, or you can do this by using some other XML-based typesetting system, such as something based on XSL-FO. These are not what is being asked for here.
backhand1's client wants him to export a typeset job [book?] in both PDF and XML. This is presumably so that the XML can be used to drive other automation (indexing, web sites, etc., etc.), and not for typography. So the potential limitations (or more properly...awkwardnesses) of XML-based typography aren't pertinent here. (Perhaps move to another thread for that discussion?)
Backhand1, you haven't quite given us enough information about your project, in part because you might not have it. It would help to know what kind of documents are being typeset and what level of complexity is associated with them. I assume you are using InDesign CS5?
Anyhow, the first step to actually doing this is probably to read over Adobe's documentation on XML, just so you have a feel for it: Using InDesign CS5 / XML.
Next, you should download and unzip the NLM Book DTD (Document Type Definition). This is a (machine-readable) specification for what kinds of XML tags are legal according to the NLM.
Then, in InDesign, open up the Structure pane (Cmd-Opt-1 in CS5, or the funky "seatbelt" icon in the lower-left in CS3). From the flyout menu, choose Load DTD and select "book3.dtd."
Now InDesign knows about all the tags that the NLM Book DTD defines. That means stuff like article-title, access-date, etc.
Next, you need to associate all your content with the appropriate tags. You could do this by hand. That would be a huge amount of work and probably impractical.
Alternatively, you could do it by creating a mapping from paragraph and character styles to XML tags. For some kinds of documents, that is basically all you need to do. But for others, you might have to do a lot of extra work, either fixing up the tags by hand or fixing up the styles by hand.
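To make the style-to-tag idea concrete outside InDesign, here is a minimal Python sketch. The style names ("Chapter Head", "Body Text", "Body Text First") are made up for illustration; only the element names on the right (title, p, sec) come from the NLM Book DTD. It is not how InDesign does the mapping internally, just a picture of what the mapping amounts to.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph style names mapped to NLM element names.
# Several styles can map to the same tag.
STYLE_TO_TAG = {
    "Chapter Head": "title",
    "Body Text": "p",
    "Body Text First": "p",
}

def paragraphs_to_sec(paragraphs):
    """Build a <sec> element from (style_name, text) pairs."""
    sec = ET.Element("sec")
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            # An unmapped style is exactly the case that needs hand fixing.
            raise KeyError(f"no XML tag mapped for style {style!r}")
        ET.SubElement(sec, tag).text = text
    return sec

story = [
    ("Chapter Head", "History"),
    ("Body Text First", "Initially, GenBank was built ..."),
    ("Body Text", "In the early 1990s ..."),
]
xml_text = ET.tostring(paragraphs_to_sec(story), encoding="unicode")
print(xml_text)
```

Note that two different styles collapse into the same <p> tag here, and any style without a mapping stops the run. Those are the spots where the hand clean-up tends to come from.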
Thirdly, you could change the way you import data into InDesign, up to and including using an XML-based input format. Then you start to get into the issues David highlighted. But it may be enough just to be more rigorous in how styles are applied in Microsoft Word, etc. Maybe you're laying out files that the customer is supplying to you? If so, you could require the customer to submit the files to you with NLM Book DTD-compliant tagging?
There are probably other solutions you could use, too...
Hopefully this is at least a bit helpful...
I guess I did not explicitly state my underlying point that much of what used to be typesetting has become file conversion, instead distracting myself with the notion that these days the niches where one can apply traditional typographic skills seem to be decreasing. The trajectory of InDesign's development and the postings on this forum both point to the growing importance of electronic publication. I cited the recent Journal of Electronic Publishing piece in part because I found it a handy summary of some significant issues involved, but also because it links to a downloadable script for converting XHTML to IDML. I think that John and I agree that anyone needing to end up with XML should consider introducing XML early on.
Publishing continues its rapid rate of change, XML's role is only going to increase, and it would be wise to anticipate the consequences.
Further to backhand's query, I am trying to make ready InDesign journal templates for XML file production based on the NLM DTD. I have loaded the book3.dtd (as suggested here) but the elements in their list bear no resemblance to the paragraph, character and object styles we use. Can anyone translate them for me? When they say 'container for' do they mean textboxes (object styles)? How do you distinguish between paragraph and character styles in their list? How do you tag several different paragraph styles when NLM offers only one, 'body'? I could go on... Thanks in advance
Five: As I tried to suggest to backhand1, you're probably in for a world of hurt.
I have loaded the book3.dtd (as suggested here) but the elements in their list bear no resemblance to the paragraph, character and object styles we use. Can anyone translate them for me? When they say 'container for' do they mean textboxes (object styles)? How do you distinguish between paragraph and character styles in their list? How do you tag several different paragraph styles when NLM offers only one, 'body'?
I'm not sure where you're seeing "container" elements in the NLM DTD. Also, "body" doesn't really seem to equate to paragraph styles. For instance, on the page for the <body> element, http://dtd.nlm.nih.gov/book/tag-library/n-pc30.html, here's the example they give:
<book>
  <book-meta>...</book-meta>
  <book-front>...</book-front>
  <body>
    <book-part id="bid.2" book-part-type="chapter" book-part-number="1">
      <book-part-meta>...</book-part-meta>
      <body>
        <sec id="bid.3">
          <title>History</title>
          <p>Initially, GenBank was built and maintained at Los Alamos National Laboratory (<xref ref-type="kwd" rid="bid.41">LANL</xref>). In the early 1990s, this responsibility was awarded to NCBI through ...</p>
        </sec>
        <sec id="bid.4">
          <title>International Collaboration</title>
          <p>In the mid-1990s, the GenBank database became part of the International Nucleotide Sequence Database Collaboration with the EMBL database ...</p>
        </sec>
      </body>
      <back>...</back>
    </book-part>
  </body>
</book>
The baseline paragraph tag is <p>.
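If it helps to see the nesting, here is a small Python sketch that prints an outline of a cut-down version of that example (the SAMPLE string is my own abbreviation, not the full NLM sample). It suggests that <body> and <sec> are wrappers around other elements rather than anything like a paragraph style or a text frame.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the NLM <body> example; content shortened for illustration.
SAMPLE = (
    "<body><book-part><body><sec>"
    "<title>History</title><p>Initially ...</p>"
    "</sec></body></book-part></body>"
)

def outline(element, depth=0):
    """Return one line per element, indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(SAMPLE))
print("\n".join(lines))
```

Only the innermost elements (title and p) hold running text; everything above them is structure.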
How are you interested in changing the paragraph style? As a function of the <sec> tag, perhaps? If so, you'll need to transform the XML such that what is tagged with <p> is instead tagged with something else, on the basis of ancestor elements. That's something you could do with XSLT.
But please give a concrete example of what you're trying to do.
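For what it's worth, here is a rough sketch of that kind of ancestor-based retagging. It is written in Python with the standard library's ElementTree rather than XSLT, purely so it can be run without extra tools. The rule itself is an assumption for illustration: paragraphs inside a <sec> whose sec-type attribute is "intro" get renamed to a made-up <p-intro> tag, which is not part of the NLM DTD.

```python
import xml.etree.ElementTree as ET

SOURCE = """<body>
  <sec sec-type="intro"><title>History</title><p>First paragraph.</p></sec>
  <sec><title>Collaboration</title><p>Second paragraph.</p></sec>
</body>"""

def retag(element, ancestors=()):
    """Rename <p> to the hypothetical <p-intro> when an enclosing
    <sec> carries sec-type="intro"; leave every other element alone."""
    if element.tag == "p" and any(
        a.tag == "sec" and a.get("sec-type") == "intro" for a in ancestors
    ):
        element.tag = "p-intro"
    # ElementTree has no parent pointers, so pass the ancestor chain down.
    for child in element:
        retag(child, ancestors + (element,))

root = ET.fromstring(SOURCE)
retag(root)
tags = [el.tag for el in root.iter() if el.tag.startswith("p")]
print(tags)
```

The retagged file would no longer validate against the NLM DTD, so something like this only makes sense as an intermediate step, for example to give InDesign distinct tags to map to distinct paragraph styles.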