1 Reply Latest reply on Jul 28, 2016 1:05 PM by aliceell

    Special characters when exporting XML?


      I have what I thought was a simple workflow for updating an InDesign document.


      * Tag all the content in the InDesign document (not created by me).

      * Export it to XML.

      * Run a script to update the data in the various tags.

      * Re-import the XML with the updated data.
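
For reference, the "run a script" step can be sketched in Python roughly as follows. This is a minimal sketch, not the script actually used here: the `Root` wrapper, the replacement text, and the `update_tags` helper are hypothetical placeholders, and only the `aid` namespace URI is taken from the export shown further down.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the exported <Table> element.
AID_NS = "http://ns.adobe.com/AdobeInDesign/4.0/"
# Keep the "aid:" prefix on output instead of an auto-generated "ns0:".
ET.register_namespace("aid", AID_NS)

# Hypothetical stand-in for an exported file; real exports are much larger.
EXPORTED = (
    f'<Root xmlns:aid="{AID_NS}">'
    '<Header aid:table="cell">injury crashes</Header>'
    "</Root>"
)


def update_tags(xml_text, new_values):
    """Return xml_text with the text of every element named in new_values replaced."""
    root = ET.fromstring(xml_text)
    for tag, value in new_values.items():
        for element in root.iter(tag):
            element.text = value
    # encoding="unicode" makes tostring() return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


updated = update_tags(EXPORTED, {"Header": "Injury Crashes"})
print(updated)
```

Registering the namespace prefix matters because the `aid:` attributes carry the table structure; without it, ElementTree would still write valid XML but with a generated prefix.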


      But I'm running into a million problems. Most of them have to do with glyphs, formatting, and fonts. For instance, InDesign does not seem to export the XML in any consistent format. When I leave the "Remap break, whitespace, etc." option OFF, it exports everything as Unicode (?): em dashes become —, line breaks become â€``. When I leave it ON, the line breaks are blank! But the em dashes become remapped to —. (I don't think this is an issue with the XML program I'm using; I've tried three different ones, and they all show the same thing.)
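
A sequence starting with "â€" is what UTF-8 bytes typically look like when they are displayed as Windows-1252. So one possibility (an assumption, not something confirmed in this thread) is that the exported file is valid UTF-8 and the odd characters come from how it is being decoded. The sketch below only reproduces that effect for an em dash and a line separator; the helper name `misread_as_cp1252` is made up for illustration.

```python
# Show how two characters look when their UTF-8 bytes are misread as Windows-1252.
SAMPLES = {
    "EM DASH": "\u2014",
    "LINE SEPARATOR": "\u2028",
}


def misread_as_cp1252(text):
    """Encode text as UTF-8, then decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


for name, char in SAMPLES.items():
    garbled = misread_as_cp1252(char)
    # Print code points rather than glyphs so the result is unambiguous.
    print(name, [f"U+{ord(c):04X}" for c in garbled])
```

If the debris in the export matches these sequences, opening the file explicitly as UTF-8 may be all that is needed.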


      Can I change these weird unicode(?) characters to XML entities prior to export? Or is there a way to easily track down what character corresponds to which XML entity, so I can import the XML correctly?
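
For tracking down which character corresponds to which XML entity, something like the following Python sketch may help. It lists every character outside printable ASCII with its code point, its Unicode name, and the numeric character reference that could be written in the XML. The sample string and the `describe_special_chars` helper are invented for illustration; in practice the text would be read from the exported file.

```python
import unicodedata


def describe_special_chars(text):
    """List each character outside printable ASCII as (code point, name, XML reference)."""
    rows = []
    for char in sorted(set(text)):
        code = ord(char)
        if 0x20 <= code < 0x7F:
            continue  # plain printable ASCII needs no entity
        # Control characters such as line feed have no Unicode name, so supply a default.
        name = unicodedata.name(char, "(unnamed control character)")
        rows.append((f"U+{code:04X}", name, f"&#x{code:X};"))
    return rows


# Invented sample containing a line separator, an em dash, and a line feed.
sample = "Property \u2028Damage \u2014 Only\n"
for row in describe_special_chars(sample):
    print(*row)
```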


      BTW...how have people done this in the past? Do people usually create the XML from scratch rather than relying on Adobe's non-robust export/import process? I'm really fed up with this and would definitely appreciate any advice.


      Example: I'm trying to export this table to XML, but am having issues with formatting the "Property Damage Only" section.



      How it looks when the "Remap break, whitespace, etc." option is left on when exporting the XML:


                  <Table xmlns:aid="http://ns.adobe.com/AdobeInDesign/4.0/" aid:table="table" aid:trows="13" aid:tcols="9">


                      <Header aid:table="cell" aid:crows="1" aid:ccols="3">injury crashes</Header>

                      <Header aid:table="cell" aid:crows="2" aid:ccols="1" aid:ccolwidth="51.643412859366634">property damage   only</Header>



      With the "Remap break, whitespace, etc." option off:


                  <Table xmlns:aid="http://ns.adobe.com/AdobeInDesign/4.0/" aid:table="table" aid:trows="5" aid:tcols="5">


                      <Header aid:table="cell" aid:crows="1" aid:ccols="1" aid:ccolwidth="58.99999999999996">injury crashes</Header>

                      <Header aid:table="cell" aid:crows="1" aid:ccols="1" aid:ccolwidth="58.99999999999996">Property 


        • 1. Re: Special characters when exporting XML?
          aliceell Level 1

          Another example:


          Looking at the table in the image above and cross-referencing with this sheet, it seems like there is a space and a Forced Line Break glyph after each line in the 'Property Damage Only' column.


          Aha, I thought. I can replace the weird characters in my XML template with the proper XML entity for a forced line break. I thought it was &#10;, as was suggested by this document. Its code point also matched the Unicode value that appeared in the Info panel in the above image: 0xA.



          <Header aid:table="cell" aid:crows="1" aid:ccols="1" aid:ccolwidth="58.99999999999996">Property 




          <Header aid:table="cell" aid:crows="1" aid:ccols="1" aid:ccolwidth="58.99999999999996">Property &#10;Damage &#10;Only</Header>


          When I imported it, though, the line-feed glyph suddenly became a paragraph glyph:


          The formatting is wrong, and even the Unicode value got changed. What gives?
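
One way to rule out the XML file itself is to check what a parser actually sees in the edited line, independent of InDesign. The Python sketch below parses the `Header` element from above and reports the non-printable characters in its text; the `invisible_chars` helper is hypothetical. The second call uses `&#x2028;` (line separator) instead of `&#10;`. Whether InDesign maps that to a forced line break on import is an untested assumption here, offered only as something to try.

```python
import xml.etree.ElementTree as ET

AID_NS = "http://ns.adobe.com/AdobeInDesign/4.0/"


def invisible_chars(cell_text):
    """Parse a Header cell containing cell_text and list its non-printable code points."""
    snippet = f'<Header xmlns:aid="{AID_NS}" aid:table="cell">{cell_text}</Header>'
    element = ET.fromstring(snippet)
    # str.isprintable() is False for line feed and line separator, True for a space.
    return [f"U+{ord(c):04X}" for c in element.text if not c.isprintable()]


# The line as edited in the post: the parser delivers plain line feeds.
print(invisible_chars("Property &#10;Damage &#10;Only"))

# Variant to try (assumption): line separators instead of line feeds.
print(invisible_chars("Property &#x2028;Damage &#x2028;Only"))
```

If the first call reports U+000A, the file is carrying a line feed as intended, and the change to a paragraph break is happening during import rather than in the XML.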