3 Replies Latest reply: Feb 23, 2012 12:50 PM by Dirk Becker

    UTF-8 string from PMString

    sameervijaykar

      Hi,

       

      I am trying to convert a PMString object into a UTF-8 char* string for use with POSIX functions on a Mac. The PMString will contain multibyte characters (Chinese, Japanese, etc.). I receive it from a ScriptData object, which only provides a PMString or a WideString.

       

      Using PMString.GrabCString() causes the multibyte characters to appear as code values (<4E00>) in the char* string. I've explored a couple of ways to convert a wide string (wchar_t*) into a UTF-8 char* on a Mac. However, the basic problem seems to be that PMString stores its text internally as UTF-16, while wchar_t is 32 bits wide on Mac. As a result, when I call methods like GrabUTF16Buffer, GrabWString or even GetWChar_tString, I seem to get a corrupt wchar_t* string with UTF-16 code units stuffed into a 32-bit wide character array. I can't reconstruct the original text from that wchar_t* string using PMString(wchar_t*), or even with an explicit cast as PMString(UTF16Char*, numBytes).

       

      To summarize: starting with a PMString, how do I convert its contents into a valid UTF-8 char* string, given that it will contain multibyte characters? Thank you for your time.
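
      For reference, this is roughly the conversion I am after, written as a minimal sketch in plain C++ with no InDesign SDK calls. It assumes I can somehow get a pointer to the UTF-16 code units and their count out of the PMString; that step is not shown, and the names Utf16ToUtf8 and AppendUtf8 are my own, not SDK functions. The SDK's UTF-16 character type may also need a cast to std::uint16_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as 1 to 4 UTF-8 bytes.
static void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert `len` UTF-16 code units to a UTF-8 string.
// Assumption: `src` points at native-endian UTF-16 data taken from the PMString.
std::string Utf16ToUtf8(const std::uint16_t* src, std::size_t len)
{
    std::string out;
    out.reserve(len * 3); // a BMP character needs at most 3 UTF-8 bytes
    for (std::size_t i = 0; i < len; ++i) {
        std::uint32_t cp = src[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate: substitute the replacement character
        }
        AppendUtf8(out, cp);
    }
    return out;
}
```

      The result's c_str() could then be passed to POSIX functions. Working on 16-bit code units directly avoids going through wchar_t at all, which is where the 32-bit mismatch seems to bite.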