3 Replies Latest reply on Feb 23, 2012 12:50 PM by Dirk Becker

    UTF-8 string from PMString




      I am trying to convert a PMString object into a UTF-8 char* string for use with POSIX functions on a Mac. The PMString will contain multibyte text such as Chinese or Japanese. I receive it from a ScriptData object, which only hands back a PMString or a WideString.


      Using PMString.GrabCString() causes the multibyte characters to appear as code values (<4E00>) in the resulting char* string. I've explored a couple of ways to convert a wide-character string (wchar_t*) into a UTF-8 char* on a Mac. However, the basic problem seems to be that PMString stores its text internally as UTF-16, while wchar_t is 32 bits wide on a Mac. As a result, when I call methods like GrabUTF16Buffer, GrabWString or even GetWChar_tString, I seem to get a corrupt wchar_t* string with UTF-16 code units stuffed into a 32-bit wide-character array. I can't reconstruct the original text from that wchar_t* string using PMString(wchar_t*), or even with an explicit typecast as PMString(UTF16Char*, numBytes).


      To summarize: starting with a PMString, how do I convert its contents into a valid UTF-8 char* string, knowing it will contain multibyte characters? Thank you for your time.
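
      For reference, the UTF-16 to UTF-8 step on its own can be sketched without any InDesign SDK calls. This is only a minimal sketch under assumptions: `utf16_to_utf8` is a hypothetical helper name, not an SDK function, and it assumes the 16-bit code units have already been copied out of the PMString into a `std::u16string` (treated as 16-bit units, not as wchar_t). How best to obtain those units from the SDK is the open part of the question and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the InDesign SDK): encodes a sequence of
// UTF-16 code units as UTF-8. Unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate if present.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // Low surrogate without a preceding high surrogate.
        }

        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: 11110xxx followed by three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

      The returned `std::string` owns the bytes, so `result.c_str()` can be passed to POSIX functions for as long as the string is alive. Working on 16-bit units throughout avoids the 32-bit wchar_t mismatch described above.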