3 Replies Latest reply on Nov 26, 2007 5:17 AM by Tal_CS

    XMLParser & unicode characters

    Tal_CS
      Hello,

      I'm struggling with this problem: parsing an imported XML string that contains characters from a language other than English (Hebrew, for example; see the example XML below).

      I tried putting the string in the message area, and the UTF-8 encoding makes the Hebrew characters look really bad (something like "^a©~aa|").
      When I try to write the Hebrew string to the screen, it shows question marks ("?????") instead of the text.

      I found one way to get Hebrew text into Director: retrieving one string at a time without parsing it as XML. That's not a good solution, though, since the XML could contain many <elements>.

      Does anyone know how to parse this kind of XML file correctly? Am I doing something wrong here?
      The XML declares UTF-8 and is saved with UTF-8 encoding (using Visual Studio 2003), with or without a signature (BOM).

      Desperately yours,
      Tal

        • 1. Re: XMLParser & unicode characters
          Tal_CS
          Solution found (at last!):
          If you want to include text in a non-Western language, don't spend too much time on UTF-8 encoding like I did. Just declare ISO-8859-1 on the first line of the XML document ("<?xml version="1.0" encoding="ISO-8859-1" ?>") and save the document in the default encoding (ANSI). Oddly, it does the trick for other languages too. Strange5050 was right after all (see LiveDocs).
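Why would declaring ISO-8859-1 on an ANSI file work for Hebrew? One plausible explanation, which the thread itself does not confirm: assuming Director MX 2004 keeps strings as raw 8-bit bytes and displays them using the system's ANSI code page (Windows-1255 on Hebrew Windows), an ISO-8859-1 declaration makes the parser pass every byte through unchanged. The Python sketch below only models that byte round trip; it is not Director code.

```python
hebrew = "שלום"  # "shalom"; any Hebrew text works the same way

# What an editor saving as "ANSI" on Hebrew Windows writes: one byte per letter.
ansi_bytes = hebrew.encode("cp1255")

# A parser told the file is ISO-8859-1 accepts every byte 0x00-0xFF and maps it
# to the code point with the same value, so nothing is rejected or altered.
parsed = ansi_bytes.decode("latin-1")

# Turning the parsed string back into 8-bit text yields the original bytes...
stored = parsed.encode("latin-1")
assert stored == ansi_bytes

# ...which display as Hebrew again when interpreted with the Hebrew code page.
print(stored.decode("cp1255"))

# By contrast, UTF-8 uses two bytes per Hebrew letter, so the same text read
# as 8-bit Latin-1 turns into twice as many unrelated symbols.
utf8_as_latin1 = hebrew.encode("utf-8").decode("latin-1")
print(len(hebrew), len(utf8_as_latin1))  # 4 8
```

The last two lines show the kind of garbling described in the original question: each Hebrew letter becomes a pair of Latin-1 symbols.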
          • 2. Re: XMLParser & unicode characters
            the real POTMO
            Director does not support UTF-8 in the current version (MX 2004, 10.1.something).
            In the next version (Director 11), Adobe has promised that UTF-8 will be fully supported.

            There is still light at the end of the tunnel. The XML parser supports UTF-8, so you can read UTF-8 files, but you cannot handle the strings after parsing.
            After parsing, Director can only handle ISO-8859-1 (that is, ASCII plus the Western European accented characters; in fonts it is called Western or Latin-1).
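This limitation would also explain the "?????" from the original question. The sketch below assumes Director converts each parsed Unicode character to Latin-1 and substitutes "?" for anything it cannot map, the way Python's codec behaves with errors="replace"; Director's actual internal conversion is not documented in this thread.

```python
text = "שלום"

# Latin-1 has no slots for Hebrew letters, so with errors="replace"
# every unmappable character becomes a question mark.
print(text.encode("latin-1", errors="replace"))  # b'????'

# Western Latin text survives the same conversion untouched.
print("café".encode("latin-1", errors="replace"))  # b'caf\xe9'
```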

            I'm not that familiar with Hebrew, but I guess it doesn't have more than 128 characters.
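That guess holds for the alphabet itself: Hebrew has 22 letters plus 5 final forms, and ISO-8859-8 (the standard 8-bit Hebrew code page, not something Director uses) fits all of them into the upper half of a single byte. A quick check in Python:

```python
# ISO-8859-8 places the whole Hebrew alphabet in bytes 0xE0-0xFA.
letters = bytes(range(0xE0, 0xFB)).decode("iso-8859-8")

print(len(letters))                               # 27
print(hex(ord(letters[0])), hex(ord(letters[-1])))  # 0x5d0 0x5ea (alef .. tav)
```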

            But this is the way to go if you can't wait for Director 11. I struggled with this for a long time when I was making a multilanguage platform (Russian, Greek, Czech, and so on).

            First, make a new ISO-8859-1 font in which the "A" glyph is redrawn as "ו", and so on.
            Now you have a conversion map.
            Then search and replace all your Hebrew XML externally, in software that supports UTF-8:
            replace every "ה" with "b", and so on.

            Import your "abc-hebrew" XML into Director and display it with your custom ISO-8859-1 font.
            TADA!!
            Done!

            It's a bit of work, but that's the easiest way to do it.
            Remember to save your "Hebrew" XML in ISO-8859-1, not UTF-8.
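The search-and-replace and save steps above could be scripted instead of done by hand. This is a minimal Python sketch, not the poster's tool: CONVERSION_MAP, convert_xml and convert_file are hypothetical names, and the map holds only the two pairs from the post, so you would extend it to match the glyph slots you chose in your font.

```python
import re

# Hypothetical conversion map: only the two pairs from the post are filled in.
# Each Hebrew letter maps to the Latin-1 character whose glyph the custom
# font redraws as that letter.
CONVERSION_MAP = {
    "ו": "A",
    "ה": "b",
    # ... add one entry per Hebrew letter (including final forms)
}
_TABLE = str.maketrans(CONVERSION_MAP)

# Matches encoding="UTF-8" or encoding='utf-8' in the XML declaration.
_DECLARATION = re.compile(r'encoding=(["\'])utf-8\1', re.IGNORECASE)


def convert_xml(xml: str) -> bytes:
    """Map Hebrew letters to font slots and return ISO-8859-1 bytes."""
    text = xml.translate(_TABLE)
    # The output is no longer UTF-8, so the declaration has to say so.
    text = _DECLARATION.sub(r"encoding=\1ISO-8859-1\1", text, count=1)
    # Raises UnicodeEncodeError if an unmapped non-Latin-1 character remains,
    # which is a quick way to spot letters missing from the map.
    return text.encode("latin-1")


def convert_file(src: str, dst: str) -> None:
    # "utf-8-sig" also strips a leading BOM (Visual Studio's "signature").
    with open(src, encoding="utf-8-sig") as f:
        xml = f.read()
    with open(dst, "wb") as f:
        f.write(convert_xml(xml))
```

For example, convert_xml('<t>הו</t>') returns b'<t>bA</t>'. One trade-off of this approach: any genuine Latin "A" or "b" in text shown with the custom font will also render as a Hebrew letter, so choose slots your content doesn't otherwise need.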
            • 3. Re: XMLParser & unicode characters
              Tal_CS
              Thank you, the real POTMO. Apparently we wrote our replies at the same time, so your answer is also the answer. You're right about the ISO-8859-1 (Latin-1) encoding and parsing, but I think I managed to import the texts without doing any conversions. It just... worked :)