3 Replies Latest reply on May 31, 2016 4:51 AM by KlausFriese

    String encoding for HTTP Post

    KlausFriese Level 1

      Hi,

       

      I'm working with HTTP GET and POST at the moment; I have a server with an Apache-based REST interface. Getting and posting data is no problem for 'normal' strings and characters, but I can't send characters such as the German umlauts (Ä Ü Ö) or Japanese or Chinese characters. The encoding for the Socket connection is UTF-8, and the POST command also contains a Content-Type line with charset UTF-8. The server expects UTF-8, so I need my data in this encoding.

      When I send strings from InDesign now, I always get a 'Bad Request' error from my server. It looks like the strings in ESTK are not UTF-8. How can I create UTF-8 strings? I can't find any encoding conversion methods.

       

      Thanks

      Klaus

        • 1. Re: String encoding for HTTP Post
          S Hopkins Adobe Community Professional

          You can get the Unicode number for your character by using InDesign's Glyph panel. Select your character and open the Glyphs panel (Type > Glyphs). Place your cursor over the character to reveal its information.

           

          Hope this is what you were looking for.

          • 2. Re: String encoding for HTTP Post
            Marc Autret Level 4

            Hi Klaus,

             

            The question is, what do you mean by "when I send strings from InDesign…"?

             

            There are two levels to consider:

             

            1. The InDesign DOM uses Text entities instead of plain 'strings', so you first have to convert those texts into strings. Usually myText.contents is OK, but special characters may alter the result in single-character texts. The usual trick is to use myText.texts[0].contents instead, so that special characters are automatically converted into an actual string.

             

            It should be noted, however, that InDesign's character mapping is not fully Unicode compliant; details are in the Indiscripts article "InDesign CS4/CS5 Special Characters [Update]".

             

            2. Now, in ExtendScript (JS), strings are UTF-16. A single character (one 16-bit code unit, two bytes) covers U+0000 to U+FFFF, and two characters (a surrogate pair, 4 bytes) are needed to address code points U+10000 to U+10FFFF. Things do not work this way in UTF-8, where U+0000…U+007F take 1 byte, U+0080…U+07FF take 2 bytes, etc. Hence you need a UTF-16-to-UTF-8 converter in order to send your server what it expects. GitHub is your friend!
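
            To illustrate the byte ranges above, here is a minimal sketch of such a converter. The function name toUtf8Bytes is hypothetical (it is not an ExtendScript built-in), and the code sticks to old-style JavaScript so that it should also run in ExtendScript. Replacing unpaired surrogates with U+FFFD is an assumption of this sketch. How the resulting bytes are written to the connection is left out.

```javascript
// Hypothetical helper, not part of ExtendScript:
// converts a JS (UTF-16) string into an array of UTF-8 byte values (0..255).
function toUtf8Bytes(str) {
    var out = [];
    for (var i = 0; i < str.length; i++) {
        var cp = str.charCodeAt(i);

        // Combine a high/low surrogate pair into one code point (U+10000..U+10FFFF).
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.length) {
            var lo = str.charCodeAt(i + 1);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++; // the low surrogate has been consumed
            }
        }

        // Assumption of this sketch: unpaired surrogates become U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }

        if (cp < 0x80) {                 // U+0000..U+007F: 1 byte
            out.push(cp);
        } else if (cp < 0x800) {         // U+0080..U+07FF: 2 bytes
            out.push(0xC0 | (cp >> 6),
                     0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // U+0800..U+FFFF: 3 bytes
            out.push(0xE0 | (cp >> 12),
                     0x80 | ((cp >> 6) & 0x3F),
                     0x80 | (cp & 0x3F));
        } else {                         // U+10000..U+10FFFF: 4 bytes
            out.push(0xF0 | (cp >> 18),
                     0x80 | ((cp >> 12) & 0x3F),
                     0x80 | ((cp >> 6) & 0x3F),
                     0x80 | (cp & 0x3F));
        }
    }
    return out;
}

var bytes = toUtf8Bytes("\u00FC"); // "ü" gives [0xC3, 0xBC]
```

            The array length is also the number of bytes the string will occupy on the wire, which can differ from str.length.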

             

            @+

            Marc

            • 3. Re: String encoding for HTTP Post
              KlausFriese Level 1

              OK, it took a while to figure this out. Maybe someone else has the same problem and is looking for a solution, so I will try to explain:

               

              I downloaded the Tomcat source files to find out how Tomcat handles my data. I debugged deep into the code until I reached the CoyoteInputStream, where the data from InDesign is read. As far as I understood it, the stream is read as bytes, but the characters are counted. The umlauts are encoded as two bytes (other special characters may take more), but each is counted as one character. For each transferred umlaut, the number of bytes is 1 higher than the number of characters.

              At some point the stream is cut to the number of characters.

              So these are 13 bytes and 12 characters:

              {"name":"ü"}

              The byte stream is cut to 12 bytes, the last byte is lost, and this is what arrives in my server application:

              {"name":"ü"

              The closing brace is missing, and that throws the exception.

               

              My simple solution/workaround: I'm adding multiple spaces. Tomcat is still cutting off the last characters, but now there are only spaces to cut off. And with the spaces it worked.
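
              A possible alternative to padding, sketched under an assumption the thread does not confirm: that the cut happens because the Content-Length header carries the character count instead of the UTF-8 byte count. The names utf8ByteLength and buildPost are hypothetical, the host and path are placeholders, and application/json is assumed only because the example body is JSON.

```javascript
// Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
function utf8ByteLength(str) {
    var n = 0;
    for (var i = 0; i < str.length; i++) {
        var c = str.charCodeAt(i);
        if (c < 0x80) {
            n += 1;                       // U+0000..U+007F
        } else if (c < 0x800) {
            n += 2;                       // U+0080..U+07FF (e.g. umlauts)
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < str.length &&
                   str.charCodeAt(i + 1) >= 0xDC00 &&
                   str.charCodeAt(i + 1) <= 0xDFFF) {
            n += 4;                       // surrogate pair: U+10000..U+10FFFF
            i++;                          // skip the low surrogate
        } else {
            n += 3;                       // rest of the BMP
        }
    }
    return n;
}

// Hypothetical request builder; Content-Length is the byte count, not body.length.
function buildPost(host, path, body) {
    return "POST " + path + " HTTP/1.1\r\n" +
           "Host: " + host + "\r\n" +
           "Content-Type: application/json; charset=UTF-8\r\n" + // assumed type
           "Content-Length: " + utf8ByteLength(body) + "\r\n" +
           "Connection: close\r\n" +
           "\r\n" +
           body;
}

var body = '{"name":"\u00FC"}';       // 12 characters
var byteCount = utf8ByteLength(body); // 13 bytes in UTF-8
```

              With the example from above, the header would then announce 13 bytes instead of 12, so the closing brace would no longer be cut off if the assumption holds.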