6 Replies Latest reply on Mar 26, 2010 3:34 AM by Ian Proudfoot

    Extract the xml with greek letters as entity

    007Raja Level 1

      Hi All,

       

      I need to export XML from InDesign CS3, but the Greek letters come out as ASCII values. I need them as entities.

      Can anyone please advise how to proceed?

       

      Thanks

      Rajasekar

        • 1. Re: Extract the xml with greek letters as entity
          [Jongware] Most Valuable Participant

          A little clarification might help.

           

          Do you mean you see an "α" in your document but in the XML you get an "a"? In that case, the XML export is working fine and it's your document that is "wrong". You have a font override on the "a", and that font doesn't have a real alpha in the correct Unicode position but in the place of an "a" instead. Switch to a Unicode font for the Greek symbols; you will then have to use the actual Greek Unicode characters wherever you want Greek.

          • 2. Re: Extract the xml with greek letters as entity
            007Raja Level 1

            Thanks for your reply,

             

            I will check and get back to you.

             

             

            Rajasekar

            • 3. Re: Extract the xml with greek letters as entity
              007Raja Level 1

              hi

               

              I used the Unicode entity "&#x3B1;" in the XML and got the required output in InDesign as "α".

              When I extract the XML from InDesign CS3, I need the Unicode entity "&#x3B1;" instead of the literal "α".

               

              How can I get this? Do I need to write a script for it?

              Can you please guide me?

               

               

               

              Thanks in advance

              Rajasekar

              • 4. Re: Extract the xml with greek letters as entity
                Ian Proudfoot Level 3

                Rajasekar,

                 

                Perhaps this is a job for post-processing the XML with XSLT. Although it would be a tedious job with XSLT 1.0, I believe that this sort of problem is best handled in the XML environment.

                 

                 

                It's also worth considering whether you really need to do this at all... I have worked on several XML projects where the initial requirement was to output XML that included entity references or numeric character references. In all but two cases it was found that this was not necessary as ultimately the XML has the same meaning with or without the use of entities. In the two cases where the conversion was required it was due to poor Unicode handling by downstream software.

                 

                Regards

                Ian

                • 5. Re: Extract the xml with greek letters as entity
                  [Jongware] Most Valuable Participant
                  .. ultimately the XML has the same meaning with or without the use of entities ..

                   

                  Yes, that's what I immediately thought. Thanks for confirming!

                   

                  Raj, the "entity" &#x3B1; is, for all intents and purposes, the same as a literal alpha. The fact that it has been converted to an ASCII-only notation is for storage purposes only. Any XML processing software that is not completely brain-dead should be aware of this, and if it is not, it's not really compliant with the XML standard.

                   

                  Ian:

                  Perhaps this is a job for post-processing the XML with XSLT. ...

                   

                  I don't think so. If the XSLT processor is "not entirely brain-dead" (in the sense defined above), it will never ever see the sequence of characters '&', '#', 'x', and so on. It will see an alpha, wonder what all the fuss is about, then output a new XML file -- containing "&#x3B1;" if it's writing to an ASCII-only format, or a literal alpha if it's writing 16-bit Unicode or, when asked, UTF-8.
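
A minimal Python sketch of this behaviour, using the standard library's `xml.etree.ElementTree` rather than an XSLT processor. The sample document and the element name `p` are assumptions made up for illustration. It suggests that the parser resolves the reference to a real alpha, and that the output encoding alone decides whether a reference or a literal is written. Note that this particular serializer writes the decimal form `&#945;`, which is equivalent to the hexadecimal `&#x3B1;`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the element name "p" is an assumption.
source = "<p>&#x3B1;-particle</p>"

root = ET.fromstring(source)

# After parsing, the text holds a real alpha, not the characters '&', '#', 'x'.
parsed_text = root.text

# Serializing to an ASCII-only encoding forces a numeric character reference.
ascii_bytes = ET.tostring(root, encoding="us-ascii")

# Serializing to UTF-8 writes the literal character instead.
utf8_bytes = ET.tostring(root, encoding="utf-8")

print(repr(parsed_text))
print(ascii_bytes)
print(utf8_bytes.decode("utf-8"))
```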

                  • 6. Re: Extract the xml with greek letters as entity
                    Ian Proudfoot Level 3

                    Jongware:

                    My suggestion was based on the implication that entities had to be produced even though they are not really needed. If an XML processor 'knows' it is working with UTF-8, for example, it will output literal text. The XSLT would then have to transform the literals into entities... A total waste of effort, but unfortunately a 'common' request.
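
If the conversion really is required downstream, the post-processing step does not have to be XSLT. Below is a rough Python sketch, not a tested InDesign workflow: the function name `to_hex_references` and the sample string are hypothetical, and it assumes non-ASCII characters occur only in text and attribute values, since character references are not allowed in element or attribute names.

```python
import re


def to_hex_references(xml_text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return re.sub(
        r"[^\x00-\x7F]",
        lambda match: "&#x{:X};".format(ord(match.group())),
        xml_text,
    )


# Hypothetical exported fragment containing a literal alpha and beta.
exported = "<p>\u03b1 and \u03b2</p>"
converted = to_hex_references(exported)
print(converted)
```

The result is pure ASCII, so it can be saved in any ASCII-compatible encoding without changing the meaning of the XML.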

                     

                    Ian