13 Replies Latest reply on Mar 13, 2014 10:07 AM by JEngland

    Im/Export XML Error Break, White Space, Special Characters

    JEngland

      Hi Everyone,

       

      I'm running into an issue when conducting an XML Merge.

       

      The situation:

      - Document has been structured using tags so that it could be exported to XML

      - The XML document was then translated by professional translators into Russian from English

      - All XML tags in English/Russian version are the same, no XML has changed

      - XML Merge is conducted and all content is placed into the InDesign document, and both the English and Russian content have the proper tags selected

      - The Russian content, however, does not include any special characters such as line breaks or spaces. This causes the Paragraph Styles to be wrong, and essentially the whole document is formatted incorrectly

       

      The Question:

      - How can you include special/hidden characters such as line breaks into the original XML Export? I keep getting the error "Content contains characters that can not be encoded"

      - If line breaks are included and are inserted into XML, when I conduct the XML Merge will InDesign recognize these codes/characters that are between the XML Tags?

       

      Thanks for your help,

       

      Jon

        • 1. Re: Im/Export XML Error Break, White Space, Special Characters
          MW Design Level 4

          Jon,

           

          A snippet of the code might be helpful.

           

          In general, the XML should be formatted how you desire to import it, line breaks included. An XML file is simply a text file.
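To illustrate the point that a line break is just text inside the XML, here is a minimal Python sketch. The sample string is invented for illustration (only the tag names are borrowed from this thread). It shows that a standard XML parser keeps a line break inside an element as part of that element's text. How InDesign treats the break on import is a separate matter and depends on its import settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names taken from this thread, content invented.
SAMPLE = (
    "<Root><Story>"
    "<Arial10>First paragraph\n</Arial10>"
    "<Arial10>Second paragraph</Arial10>"
    "</Story></Root>"
)

root = ET.fromstring(SAMPLE)

# Collect the raw text of each paragraph-level element.
paragraphs = [el.text for el in root.iter("Arial10")]

for text in paragraphs:
    # repr() makes the trailing line break visible.
    print(repr(text))
```

The first element's text ends with a line break and the second does not, which is exactly the kind of difference that gets lost when a translation tool strips whitespace.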

           

          Mike

          • 2. Re: Im/Export XML Error Break, White Space, Special Characters
            JEngland Level 1

            Hi Mike,

             

            Thanks for the response

             

            XML of both English and Russian versions are in my drop box here: https://db.tt/mqlfkthN

            EDIT: Also added shortened versions of the actual InDesign files in EN and RU  so you can see the difference in how XML imports in both

             

            The English version was created in InDesign, which explains why the original source document has proper line breaks etc.

             

            Are you suggesting that the formatting already put in the document does not transfer into the XML and that we need to modify the XML code separately after to ensure a smooth import? If so, what type of code or symbol can I put between XML tags to indicate space or break?

             

            Thanks,

             

            Jon

            • 3. Re: Im/Export XML Error Break, White Space, Special Characters
              MW Design Level 4

              Hi Jon,

               

              Unfortunately, I don't use CC so I cannot open the ID files from CC unless they are exported as IDML.

               

              No matter the source of the XML, I rarely receive it in a form that ID likes. Without seeing what you have in ID as regards formatting, I would generally reformat the XML using a text editor (UltraEdit or NotePad++) something like this:

               

              <Root>

              <Story>

              <Arial10>13/02/2014
</Arial10>

              <Arial10>Страница </Arial10>

              </Story>

              <Story>

              <ArialBold12>4 Эксплуатация машины</ArialBold12>

              </Story>

              <Story>

              <ArialBold12Right>Эксплуатация машины 4</ArialBold12Right>

              </Story>

               

              However, depending upon the layout, some nodes may need to be moved adjacent to others, etc. In those cases, I use an XML editor and run XSLT files on them that I edit to suit.
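The manual reformatting shown above could probably be scripted. The following is a rough Python sketch, not what Mike actually ran: it assumes every child of a Story element is one paragraph, and it appends a line break to every child except the last one in each Story. The function name `add_paragraph_breaks` and the sample content are made up for illustration.

```python
import xml.etree.ElementTree as ET

def add_paragraph_breaks(story):
    """Append a line break to every child of `story` except the last.

    Assumption: each direct child of a Story is one paragraph.
    """
    children = list(story)
    for el in children[:-1]:
        if len(el):
            # Element has inline children: the trailing text lives in
            # the tail of its last child.
            last = el[-1]
            if not (last.tail or "").endswith("\n"):
                last.tail = (last.tail or "") + "\n"
        elif not (el.text or "").endswith("\n"):
            el.text = (el.text or "") + "\n"

# Hypothetical sample modelled on the structure quoted in this thread.
SAMPLE = (
    "<Root>"
    "<Story><Arial10>13/02/2014</Arial10><Arial10>Page </Arial10></Story>"
    "<Story><ArialBold12>4 Operating the machine</ArialBold12></Story>"
    "</Root>"
)

root = ET.fromstring(SAMPLE)
for story in root.iter("Story"):
    add_paragraph_breaks(story)

result = ET.tostring(root, encoding="unicode")
print(result)
```

Whether "every child except the last" is the right rule depends on how the document was tagged, so it is worth checking the output against a page that is known to be correct before running it over a whole manual.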

               

              The XML, while valid XML, is arranged in a way I likely wouldn't, but it obviously works for your purposes. Without being able to open the ID files, I have no sure way of knowing what nodes, if any, would be moved, etc.

               

              Take care, Mike

              • 4. Re: Im/Export XML Error Break, White Space, Special Characters
                JEngland Level 1

                Which version do you use? If I uploaded the correct version, would you take a quick look at the layout before sending me off in the right direction to research?

                • 5. Re: Im/Export XML Error Break, White Space, Special Characters
                  MW Design Level 4

                  Hi Jon,

                   

                  CS6 for the most part. Unless it was changed in CC, there should be an export option for IDML. I would be happy to take a quick look before dashing out for part of the day.

                   

                  Mike

                  • 6. Re: Im/Export XML Error Break, White Space, Special Characters
                    JEngland Level 1

                    Hi Mike,

                     

                    Uploaded IDML files https://db.tt/mqlfkthN

                     

                    The easiest way to see the issue is to compare page 4-2 of the English and Russian versions and look near the top of the page at the big blob of text. (There are only three pages total.)

                     

                    One last note: the XML was created from InDesign, we did not write it ourselves. You would think letting them generate the XML would lead to smooth importing later on.

                     

                    Thanks!

                     

                    Jon

                    • 7. Re: Im/Export XML Error Break, White Space, Special Characters
                      MW Design Level 4

                      Heh, heh. Don't ya just hate it when things go wrong?

                       

                      capture-001235.png

                      • 8. Re: Im/Export XML Error Break, White Space, Special Characters
                        MW Design Level 4

                        Oops. Gotta be smarter when choosing which file to open. Be back in a minute or 10...

                        • 9. Re: Im/Export XML Error Break, White Space, Special Characters
                          MW Design Level 4

                          Hi Jon, I have to dash out for a while.

                           

                          One thing that appears to be happening is that the tags are not being mapped properly to the styles. The other is the line breaks. For both of these issues I have to look a bit deeper into the causes, which will be later today.

                           

                          I am assuming that the problem manifests itself best at the top of this page, which here now looks to be corrected... would that be right?

                           

                          capture-001236.png

                          • 10. Re: Im/Export XML Error Break, White Space, Special Characters
                            JEngland Level 1

                            Yes, that is how it is supposed to look.

                             

                            As you noticed, the line breaks in the Russian document are not consistent with the English document. When you added them in, somehow, the tags were matched with the appropriate Paragraph Style.

                             

                            It's very interesting how that works.. I'm hoping to find a solution that I can somewhat automate because the whole manual is fairly long
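One way to start automating this might be to locate where the line breaks differ before fixing anything. The Python sketch below assumes, as stated earlier in the thread, that both files have identical tag structure. It walks the two trees in parallel and reports every element whose line-break count differs. The names `break_count` and `find_mismatches` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def break_count(el):
    """Count line breaks in an element's own text and its children's tails."""
    total = (el.text or "").count("\n")
    for child in el:
        total += (child.tail or "").count("\n")
    return total

def find_mismatches(source_root, target_root):
    """Return (index, tag, source_breaks, target_breaks) for each difference.

    Assumption: both trees have the same elements in the same order.
    """
    mismatches = []
    pairs = zip(source_root.iter(), target_root.iter())
    for index, (src, tgt) in enumerate(pairs):
        if src.tag != tgt.tag:
            raise ValueError(f"Tag structure differs at element {index}")
        if break_count(src) != break_count(tgt):
            mismatches.append((index, src.tag, break_count(src), break_count(tgt)))
    return mismatches

# Hypothetical samples: the English text keeps its break, the Russian lost it.
EN = "<Root><Story><Arial10>Date\n</Arial10><Arial10>Page </Arial10></Story></Root>"
RU = "<Root><Story><Arial10>Дата</Arial10><Arial10>Страница </Arial10></Story></Root>"

mismatches = find_mismatches(ET.fromstring(EN), ET.fromstring(RU))
for item in mismatches:
    print(item)
```

Note that `zip` stops at the shorter tree, so a file with missing elements at the end would need a separate length check.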

                             

                            Thanks for any time you can put towards this

                            • 11. Re: Im/Export XML Error Break, White Space, Special Characters
                              MW Design Level 4

                              Hello Jon,

                               

                              OK. I sat down this morning and looked at what you started with (the EN version). Unfortunately, ID's XML export really isn't made for round-tripping, which is what I think is going on, correct?

                               

                              I don't do translation work like a couple others here do. A good search here would probably be instructive as I don't know the tools they use in conjunction with ID.

                               

                              I do know that for round-tripping, exporting as ID tagged text works a whole lot better. So if I click into your text frame and export as tagged text, change some relevant text from the English to Russian, then place the tagged text file, the file is formatted properly.

                               

                              The screen shot is from a fresh ID file in which I have placed my edited tagged text file.

                               

                              capture-001237.png

                               

                              A Tagged Text file is just that: a text file that any text editor capable of opening and editing UTF-16 text files can handle. I suspect people who do translation work use different software. I have used tagged text that comes out of a database (MS Access) and it works (nearly) perfectly every time.
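Because a tagged text file is plain UTF-16 text, the text swap described above can be scripted in principle. The Python sketch below rests on several assumptions: the sample content is invented, the `TRANSLATIONS` lookup table is hypothetical, the file is assumed to decode with Python's `utf-16` codec, and escaped angle brackets in the text are not handled. It replaces only the text between tags and leaves every tag untouched.

```python
import re

# Hypothetical lookup table from source text to translated text.
TRANSLATIONS = {"Operating the machine": "Эксплуатация машины"}

# The capturing group makes split() return the tags as well as the text.
TAG_PATTERN = re.compile(r"(<[^>]*>)")

def translate_tagged_text(content, translations):
    """Swap text runs found in `translations`; leave tags untouched."""
    parts = TAG_PATTERN.split(content)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions are the tags captured by the pattern.
            out.append(part)
            continue
        # Look up the text without its line endings, then restore them.
        core = part.strip("\r\n")
        if core in translations:
            part = part.replace(core, translations[core], 1)
        out.append(part)
    return "".join(out)

# Invented sample resembling a small tagged text export.
SAMPLE = "<UNICODE-WIN>\r\n<ParaStyle:ArialBold12>Operating the machine\r\n"

translated = translate_tagged_text(SAMPLE, TRANSLATIONS)

# Round-trip through UTF-16 bytes, as would happen when saving the file.
encoded = translated.encode("utf-16")
round_tripped = encoded.decode("utf-16")
print(repr(round_tripped))
```

When reading or writing a real file, opening it with `encoding="utf-16"` and `newline=""` should keep the original line endings intact, though the exact byte order InDesign expects is something to verify on a test file first.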

                               

                              Mike

                              • 12. Re: Im/Export XML Error Break, White Space, Special Characters
                                Joel Cherney Adobe Community Professional & MVP

                                I do know that for round-tripping, exporting as ID tagged text works a whole lot better. So if I click into your text frame and export as tagged text, change some relevant text from the English to Russian, then place the tagged text file, the file is formatted properly.

                                It certainly works better than it does on InDesign-exported XML! Most of the translation environment tools work best on IDML - there are IDML-specific filters for all of the leading industry toolsets - but this export-translate-and-reimport workflow is clumsy when you're relying on XML export from ID. I mean, that may be valid XML you're exporting, but it looks like the tags are specifying formats, not content types, which says to me that the person doing the tagging in ID doesn't really get what XML export is for. When you're exporting IDML you're getting a much better version of that, to be honest.

                                 

                                All in all, Mike is 100% correct here - if you need to roundtrip (export, re-jigger content, re-import) then you'd be better off with Tagged Text. I kind of wonder what the Russian translation provider asked for.

                                • 13. Re: Im/Export XML Error Break, White Space, Special Characters
                                  JEngland Level 1

                                  Very helpful Mike, thank you!

                                   

                                  Joel: We are the provider of translations. In this situation the relatively new client supplied XML only; after we dug into what they were using the XML for, they supplied an .indd file. Apparently they have simply followed this process in the past. They say it usually doesn't turn out this messy, and they go through manually to proof/edit each page. I just thought there must be a better way to roundtrip, which now I know there is. This should make their lives drastically easier.