Some more info on this:
I've found this article which suggests it could be related to copy/paste in InDesign:
I haven't been able to replicate the problem with copy/paste though.
Also, this article, which suggests there is an import issue with IDML:
We don't import IDML though, only export it. The 2nd article is different in that the <XMLElement> is inside the <ParagraphStyleRange> this time and wrapped around one of the <CharacterStyleRange> tags.
I'm wondering if anyone knows how/why that <XMLElement> might be showing up and what are some causes? I'm deserializing the IDML (server-side) into a class object, and that will be extremely difficult if <XMLElement> can just show up anywhere in the XML of a Story, etc.
This is totally normal and expected. The XMLElement elements show the xml structure of the document and can appear at several different levels in the Story_.xml files. It shouldn't be that hard to remove them (and other xml-related elements) via a preprocessing transform, but the easiest way would probably be to untag everything in the document before you export it to idml.
ought to do the trick, provided none of the affected page items or layers are locked.
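For the preprocessing route mentioned above, here is a minimal server-side sketch in Python. It is not the snippet from this post and does not use InDesign scripting. It assumes the story XML has already been pulled out of the IDML package, and the list of structure-related element names (XMLElement, XMLAttribute, XMLComment, XMLInstruction) is an assumption that should be checked against your own exported files. The function name strip_xml_structure and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: wrappers to unwrap and structure-only elements to drop entirely.
UNWRAP = {"XMLElement"}
DROP = {"XMLAttribute", "XMLComment", "XMLInstruction"}

def strip_xml_structure(parent):
    """Unwrap XMLElement wrappers in place, promoting their children."""
    new_children = []
    for child in list(parent):
        strip_xml_structure(child)  # clean the subtree first
        if child.tag in DROP:
            continue  # structure-only element, discard it
        if child.tag in UNWRAP:
            new_children.extend(list(child))  # keep the wrapped content
        else:
            new_children.append(child)
    parent[:] = new_children

# Hypothetical, heavily simplified story fragment for illustration only.
SAMPLE = """<Story Self="u1">
  <XMLElement Self="di2" MarkupTag="XMLTag/Story">
    <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Body">
      <XMLElement Self="di2i3" MarkupTag="XMLTag/Bold">
        <XMLAttribute Self="di2i3a" Name="a" Value="b"/>
        <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
          <Content>Hello</Content>
        </CharacterStyleRange>
      </XMLElement>
    </ParagraphStyleRange>
  </XMLElement>
</Story>"""

story = ET.fromstring(SAMPLE)
strip_xml_structure(story)
print([el.tag for el in story.iter()])
# ['Story', 'ParagraphStyleRange', 'CharacterStyleRange', 'Content']
```

After the transform, the ParagraphStyleRange sits directly under Story again, which is the shape a rigid deserializer would expect. Whitespace between the elements is not tidied up, so re-serialized output may look uneven.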
Thanks for the reply!
If I were to use the code snippet you provided, would it simply remove the <XMLElement> tags? Or what exactly does untag() do?
Page items can be associated with an element in the xml structure of the document by tagging them in the UI or calling markup() on them via script. Calling untag() on an xmlElement will remove that association and remove that part of the xml structure (and all of its children) but should not otherwise affect the page items.
So, referring to this thread: http://forums.adobe.com/thread/656468
The <XMLElement> tag is wrapped around the <CharacterStyleRange> tag. If I were to call untag() before exporting the IDML, does that change the document?
My concern is that I will potentially be parsing the IDML and writing a script based on (or referring to) a paragraph/character style that could have been once nested in an <XMLElement>. Since our export script doesn't save the document when exporting, the document will still technically be "tagged". When the script is run on a document that is still "tagged" would that cause problems?
Hopefully this makes sense and thanks again!
We are talking about two different xml structures: the xml structure in the InDesign document (what you can see by going to View=>Structure=>Show Structure), and the structure of the idml documents. Removing a document xmlElement will remove its children as well, but, while it will affect the structure of your exported idml somewhat, it won't remove all children of the XMLElement elements in the idml. The two things don't map in that fashion. I hope I'm making sense.
You shouldn't have a problem referring to styles or any other piece of the document. The only way (that comes to mind) that an untagged document will differ from its tagged ancestor is in the character counts of the stories, because the 0xFEFF characters that hold the tags will be gone.
While I don't have a full understanding of your workflow, I don't foresee a problem with removing the document xml structure if you're not using it.
That makes sense. I just want to make sure of one thing. As I've tested, exporting IDML with the code snippet you sent removes the <XMLElement> tags from the IDML while keeping their child tags. If I open that new IDML and save it as .indd, do I still get the same document?
And by "same document" I just mean that if a <CharacterStyleRange> tag were nested in an <XMLElement> before the export, the <CharacterStyleRange> would still exist after calling untag().
Thanks so much for the help!!
Not to be obtuse, but why don't you fix your XML parser? If you're using XPath, then instead of looking for "ParagraphStyleRange" you can just look for "//ParagraphStyleRange".
Thanks for the idea, but we aren't using an XML parser or XPATH. Currently, we are deserializing the XML into a class object using an XMLSerializer. I could define the class differently, but since we aren't (and won't be) using the <XMLElement> anyway, I don't see the point.
"I could define the class differently, but since we aren't (and won't be) using the <XMLElement> anyway, I don't see the point."
The point is you're making assumptions about how InDesign formats the IDML file, and those assumptions are inconsistent with the specification/reality. InDesign might insert other tags besides XMLElement, or otherwise move around some tags, perhaps in a minor revision or if you make other kinds of changes to the INDD file.
By defining your parser in a rigid fashion, you both violate the specification and leave yourself open to all kinds of hurt and instability in the future if something changes.
I would say it is better to do it properly.
Well, that may be partially true. However, I don't think I would say we are defining our parser in a "rigid" fashion. We're keeping it as simple as we need. We don't need to include other elements or tags/values if we don't use them. The process this is used for is fairly simple. But thanks for the warning.
Sure, but to put this in English, you are looking for "a ParagraphStyleRange tag that is a direct child of a Story tag." That is wrong. You should be looking for "a ParagraphStyleRange that is a child or a grandchild or an n-th-grand-child of a Story tag."
You never know what ID might decide to put in there...
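To make the child-versus-descendant difference concrete, here is a small Python sketch using the standard library's ElementTree. It only illustrates the lookup idea and is not your XMLSerializer setup. The sample markup is a made-up, simplified story fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified story fragment: the ParagraphStyleRange is
# wrapped in an XMLElement instead of sitting directly under Story.
SAMPLE = """<Story Self="u1">
  <XMLElement Self="di2" MarkupTag="XMLTag/Story">
    <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Body">
      <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Bold">
        <Content>Hello</Content>
      </CharacterStyleRange>
    </ParagraphStyleRange>
  </XMLElement>
</Story>"""

story = ET.fromstring(SAMPLE)

# Direct children only: misses anything wrapped in another element.
direct = story.findall("ParagraphStyleRange")

# Any depth below Story: finds the range regardless of wrappers.
nested = story.findall(".//ParagraphStyleRange")

print(len(direct), len(nested))  # 0 1

# Collect the style names the same way, at any depth inside each paragraph range.
styles = [
    (p.get("AppliedParagraphStyle"), c.get("AppliedCharacterStyle"))
    for p in nested
    for c in p.iter("CharacterStyleRange")
]
print(styles)  # [('ParagraphStyle/Body', 'CharacterStyle/Bold')]
```

The descendant lookup keeps working whether or not the document was untagged before export, which is the robustness being argued for here.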
That makes sense.
I guess I was just under the impression that by using untag(), and thus eliminating the <XMLElement> tags, I wouldn't have to look for "a ParagraphStyleRange that is a child or a grandchild or an n-th-grand-child of a Story tag," because the ParagraphStyleRange tags would then always be directly under the Story tags.
That's really what I want to avoid, because it will make the class object that we deserialize the XML to way more complicated than it needs to be, especially if we aren't using the <XMLElement> tags.
It may be going against the exact IDML specification, but we really only need a few of the font/style/size values out of the ParagraphStyleRange and CharacterStyleRange elements, so I'd like to strip out unneeded elements/tags (such as XMLElement) to make it easier to obtain what I do need.
Hopefully all of that makes sense.
Do you see any issues with this?
I don't see any specific issues. I would just worry that it might break at some future time.
I'm not familiar with XMLSerializer so I don't know how easy it is to do this "right." I would suspect it's not too tough, though glancing at http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx doesn't give me any obvious ideas, so *shrug*.
You can always fix it if it breaks.