backhand1

XML File creation

Feb 26, 2011 2:33 AM

We are a small family-run typesetting business using InDesign. One of our customers has asked us to provide XML files in addition to a PDF.

We have endeavoured to find out how to do this, but it appears to be extremely complex.

Any help would be greatly appreciated.

 
Replies
  • John Hawkinson
    5,512 posts
    Jun 25, 2009
    Feb 26, 2011 2:39 AM   in reply to backhand1

    It certainly can be!

     

    XML is a file format that can contain almost anything. And InDesign is nearly infinitely configurable in the information it can export as XML.

     

    So we need more information in order to help you. What kind of document is it, and what kind of information are they seeking in XML?

     

    It's possible to tag almost anything in InDesign such that it gets emitted in the XML export. The question is how do you do it in an efficient way that doesn't waste more of your time than you need to. XML is all about automation, but sadly you may need a fair bit to get XML to be useful to your client.

     

    Or it could be really easy.

     

    Hopefully they told you a little more about the information they need? Otherwise you'll need to go back and ask them.

     

    At this point, saying, "We want XML" is almost like saying, "We want a File." Not quite that bad, but close. It could mean anything!

     
  • David W. Goodrich
    Feb 27, 2011 5:33 PM   in reply to backhand1

    I'm a novice at XML, and my experience in typesetting goes back just twenty years, i.e., entirely post-lead.  Still, I like to think I've picked up some notion of what constitutes good typography, and XML ain't it.

     

    As I understand it, the whole point of XML is to separate form and content, the presentation of data from the data itself.  The inevitable consequence is to enable automatic (some would say "robotic") formatting that is often antithetical to good typography.  Take, for example, "The Book in the Renaissance," by Andrew Pettegree, a Christmas present from my daughter with nary a hyphen, and consequently "rivers of white."  One might have expected better from a book about the period that established the standards for typography for the next half millennium.  Maybe this was just a short-cut by Yale University Press to avoid stylistic differences between British and American English.  But I can't help wondering whether the absence of hyphens was a symptom of valuing content over attractive form at a time when many e-book systems simply cannot handle hyphenation.

     

    Those who value typography can move files from XML to InDesign to polish "form" and make a publication more attractive.  One example was described recently by John W. Maxwell and Kathleen Fraser in "Traversing The Book of Mpub: an Agile, Web-first Publishing Model," Journal of Electronic Publishing 13.3 (December, 2010)  -- check the section halfway through on "XHTML as a Gateway to XML-based Workflow".  But one gets the impression that many of those interested in XML are really much more interested in the "data" than in how well it is presented.

     

    Perhaps I should add that my perspective comes from dealing with scholarly publications on East Asian subjects, which means I regularly deal with CJK characters not yet added to Unicode and romanizations requiring diacritics that occur in very few fonts.  In other words, although I used to appreciate the separation of form and content, these days I'm not so sure the distinction is so clear.

     

    David W. Goodrich

     
  • John Hawkinson
    Feb 28, 2011 12:00 AM   in reply to backhand1

    Sorry for the delay. I had sort of hoped someone else might pop up.

     

    David's comments are probably true, but they are not on-point here. XML is incredibly versatile, and David's comments are primarily about using XML as an input format to a typesetter. You can do this by importing XML into InDesign, or you can do this by using some other XML-based typesetting system, such as something based on XSL-FO. These are not what is being asked for here.

     

    backhand1's client wants him to export a typeset job [book?] in both PDF and XML. This is presumably so that the XML can be used to drive other automation (indexing, web sites, etc., etc.), and not for typography. So the potential limitations (or more properly...awkwardnesses) of XML-based typography aren't pertinent here. (Perhaps move to another thread for that discussion?)

     

    Backhand1, you haven't quite given us enough information about your project, in part because you might not have it. It would help to know what kind of documents are being typeset and what level of complexity is associated with them. I assume you are using InDesign CS5?

     

    Anyhow, though, the first step to actually doing this is probably to read over Adobe's documentation on XML, just so you have a feel for it: Using InDesign CS5 / XML.

     

    Next, you should download and unzip the NLM Book DTD (Document Type Definition). This is a (machine-readable) specification for what kinds of XML tags are legal according to the NLM.

     

    Then, in InDesign, open up the Structure pane (Cmd-Opt-1 in CS5, or the funky "seatbelt" icon in the lower-left in CS3). From the flyout menu, choose Load DTD and select "book3.dtd."

     

    Now InDesign knows about all the tags that the NLM Book DTD defines. That means stuff like article-title, access-date, etc.

     

    Next, you need to associate all your content with the appropriate tags. You could do this by hand. That would be a huge amount of work and probably impractical.

    Alternatively, you could create a mapping from paragraph and character styles to XML tags. For some kinds of documents, that is basically all you need to do. But for others, you might have to do a lot of extra work, either fixing up the tags by hand or fixing up the styles by hand.
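
    To make the style-to-tag idea concrete, here is a minimal Python sketch. It is not InDesign's own mapping feature, just an illustration of the principle: every paragraph style name is looked up in a table and the paragraph is emitted under the corresponding element. The style names ("Chapter Head", "Body Text", "Pull Quote") are hypothetical; only the element names sec, title and p come from the NLM Book DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph-style names mapped to NLM Book DTD element names.
STYLE_TO_TAG = {
    "Chapter Head": "title",
    "Body Text": "p",
    "Body Text First": "p",
}


def paragraphs_to_sec(paragraphs):
    """Wrap (style_name, text) pairs in a <sec>, tagging each one by its style.

    Returns the <sec> element and the list of style names with no mapping.
    """
    sec = ET.Element("sec")
    unmapped = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            # No rule for this style: report it instead of guessing a tag.
            unmapped.append(style)
            continue
        ET.SubElement(sec, tag).text = text
    return sec, unmapped


sample = [
    ("Chapter Head", "History"),
    ("Body Text First", "Initially, GenBank was built ..."),
    ("Pull Quote", "A style with no tag mapping"),
]
sec, unmapped = paragraphs_to_sec(sample)
xml_text = ET.tostring(sec, encoding="unicode")
print(xml_text)
print("Unmapped styles:", unmapped)
```

    Note that two different styles map to the same tag here, and that one style has no mapping at all. Those are the cases where the fix-up work mentioned above tends to come from.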


    Thirdly, you could change the way you import data into InDesign, up to and including using an XML-based input format. Then you start to get into the issues David highlighted. But it may be enough just to be more rigorous about how styles are applied in Microsoft Word, etc. Or maybe you're laying out files that the customer supplies to you? If so, you could require the customer to submit the files with NLM Book DTD-compliant tagging.

     

    There are probably other solutions you could use, too...

     

    Hopefully this is at least a bit helpful...

     
  • David W. Goodrich
    Feb 28, 2011 7:07 AM   in reply to John Hawkinson

    I guess I did not explicitly state my underlying point that much of what used to be typesetting has become file conversion, instead distracting myself with the notion that these days the niches where one can apply traditional typographic skills seem to be decreasing.  The trajectory of InDesign's development and the postings on this forum both point to the growing importance of electronic publication.  I cited the recent Journal of Electronic Publishing piece in part because I found it a handy summary of some significant issues involved, but also because it links to a downloadable script for converting XHTML to IDML.  I think that John and I agree that anyone needing to end up with XML should consider introducing XML early on.

     

    Publishing continues its rapid rate of change, XML's role is only going to increase, and it would be wise to anticipate the consequences.

     

    David

     
  • GrhmTFive
    Mar 8, 2012 10:12 AM   in reply to backhand1

    Further to backhand1's query, I am trying to make ready InDesign journal templates for XML file production based on the NLM DTD. I have loaded the book3.dtd (as suggested here) but the elements in their list bear no resemblance to the paragraph, character and object styles we use. Can anyone translate them for me? When they say 'container for' do they mean textboxes (object styles)? How do you distinguish between paragraph and character styles in their list? How do you tag several different paragraph styles when NLM offers only one, 'body'? I could go on... Thanks in advance

     
  • John Hawkinson
    Mar 10, 2012 7:36 PM   in reply to GrhmTFive

    Five: As I tried to suggest to backhand1, you're probably in for a world of hurt.

     

     

    I have loaded the book3.dtd (as suggested here) but the elements in their list bear no resemblance to the paragraph, character and object styles we use. Can anyone translate them for me? When they say 'container for' do they mean textboxes (object styles)? How do you distinguish between paragraph and character styles in their list? How do you tag several different paragraph styles when NLM offers only one, 'body'?

    I'm not sure where you're seeing "container" elements in the NLM DTD. Also, "body" doesn't really seem to equate to paragraph styles. For instance, on the tag library page for <body>, http://dtd.nlm.nih.gov/book/tag-library/n-pc30.html, here's the example they give:


    <book>
      <book-meta>...</book-meta>
      <book-front>...</book-front>
      <body>
        <book-part id="bid.2" book-part-type="chapter" book-part-number="1">
          <book-part-meta>...</book-part-meta>
          <body>
            <sec id="bid.3">
              <title>History</title>
              <p>Initially, GenBank was built and maintained at Los Alamos National 
              Laboratory (<xref ref-type="kwd" rid="bid.41">LANL</xref>). In the 
              early 1990s, this responsibility was awarded to NCBI through ...</p>
            </sec>
            <sec id="bid.4">
              <title>International Collaboration</title>
              <p>In the mid-1990s, the GenBank database became part of the International 
              Nucleotide Sequence Database Collaboration with the EMBL database ...</p>
            </sec>
          </body>
          <back>...</back>
        </book-part>
      </body>
    </book>
    

     

    The baseline paragraph tag is <p>.

    How are you interested in changing the paragraph style? As a function of the <sec> tag, perhaps? If so, you'll need to transform the XML such that what is tagged with <p> is instead tagged with something else, on the basis of ancestor elements. That's something you could do with XSLT.
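
    As a rough illustration of what "retag on the basis of ancestor elements" means, here is a minimal sketch. It uses Python's standard library rather than XSLT, purely so it can be run without an XSLT processor; in XSLT the same idea would be a template matching something like sec[@id='bid.4']//p. The tag name "p-collab" and the rule table are hypothetical, and "p-collab" is not an element in the NLM Book DTD.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the tag library example.
SOURCE = """<body>
<sec id="bid.3">
<title>History</title>
<p>Initially, GenBank was built ...</p>
</sec>
<sec id="bid.4">
<title>International Collaboration</title>
<p>In the mid-1990s ...</p>
</sec>
</body>"""

# Hypothetical rule table: paragraphs anywhere inside the <sec> with this id
# are renamed to the given tag. "p-collab" is an invented name.
RETAG_BY_SEC_ID = {"bid.4": "p-collab"}


def retag_paragraphs(root):
    """Rename <p> elements according to the id of an enclosing <sec>."""
    for sec in root.iter("sec"):
        new_tag = RETAG_BY_SEC_ID.get(sec.get("id"))
        if new_tag is None:
            continue
        # list() takes a snapshot so that renaming cannot disturb the iteration.
        for p in list(sec.iter("p")):
            p.tag = new_tag
    return root


root = retag_paragraphs(ET.fromstring(SOURCE))
tags = [(sec.get("id"), [child.tag for child in sec]) for sec in root.iter("sec")]
print(tags)
```

    The paragraphs in the first section stay as <p>, and only those under the second section are renamed. A real rule would more likely key on something like a sec-type attribute or nesting depth than on a specific id.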

     

    But please give a concrete example of what you're trying to do.

     