May 17, 2012 3:31 AM by djeyewater

    Regex with strings that contain non-Latin chars

    djeyewater

      I am having difficulty with a regex when testing for words that contain non-Latin characters (specifically Japanese; I haven't tested other scripts).

      My code:

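      keyword = StringUtil.trim(keyword);
      //if(keywords.indexOf(keyword) == -1)  // old check, matched substrings too
      // Match the keyword at a word boundary, followed by optional whitespace and ";"
      regex = new RegExp("\\b" + keyword + "\\s*;", "i");
      if (!regex.test(keywords))
      {
          Alert.show('"' + keywords + '" does not contain "' + keyword + '"');
          keywords += keyword + "; ";
      }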

      Where keyword is

      日本国

      and keywords is

      Chion-in; 知恩院; Lily Pond; Bridge; 納骨堂; Nōkotsu-dō; Asia; Japan; 日本国; Nihon-koku; Kansai region; 関西地方; Kansai-chihō; Kyoto Prefecture; 京都府; Kyōto-fu; Kyoto; Higashiyama-ku; 東山区; Places; 

      When the function runs, it alerts that keywords does not contain keyword, even though it does:

      "Chion-in; 知恩院; Lily Pond; Bridge; 納骨堂; Nōkotsu-dō; Asia; Japan; 日本国; Nihon-koku; Kansai region; 関西地方; Kansai-chihō; Kyoto Prefecture; 京都府; Kyōto-fu; Kyoto; Higashiyama-ku; 東山区; Places; " does not contain "日本国"
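
      The failure can likely be reproduced outside Flex. The sketch below is TypeScript rather than ActionScript, on the assumption that both follow the ECMAScript rules for \b: a word boundary exists only between a "word character" ([A-Za-z0-9_]) and a non-word character. In "; 日本国;" the characters on both sides of the position just before 日 are non-word characters, so \b cannot match there. The function name matchesWithWordBoundary is made up for illustration.

```typescript
// Sample data copied from the post (note the trailing "; ").
const keywords =
  "Chion-in; 知恩院; Lily Pond; Bridge; 納骨堂; Nōkotsu-dō; Asia; Japan; 日本国; " +
  "Nihon-koku; Kansai region; 関西地方; Kansai-chihō; Kyoto Prefecture; 京都府; " +
  "Kyōto-fu; Kyoto; Higashiyama-ku; 東山区; Places; ";

// Hypothetical helper using the same pattern as the post: \b + keyword + \s*;
function matchesWithWordBoundary(keyword: string): boolean {
  return new RegExp("\\b" + keyword + "\\s*;", "i").test(keywords);
}

console.log(/\w/.test("日"));                  // false: CJK is not a \w character
console.log(matchesWithWordBoundary("Japan")); // true: space -> "J" is a boundary
console.log(matchesWithWordBoundary("日本国")); // false: space -> "日" is not
```

      A Latin keyword such as "Japan" only passes because "J" is a word character preceded by a space.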

      Previously I used indexOf, which doesn't have this problem, but I can't use it because it also matches substrings, not just whole keywords.
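
      One way around both problems is to skip regular expressions entirely: split the keyword list on ";", trim each entry, and compare whole entries. This sketch is TypeScript and is assumed to port to ActionScript 3 with little change; containsKeyword is a hypothetical helper name, and lowercasing both sides only approximates the original "i" flag.

```typescript
// Sample data copied from the post (note the trailing "; ").
const keywords =
  "Chion-in; 知恩院; Lily Pond; Bridge; 納骨堂; Nōkotsu-dō; Asia; Japan; 日本国; " +
  "Nihon-koku; Kansai region; 関西地方; Kansai-chihō; Kyoto Prefecture; 京都府; " +
  "Kyōto-fu; Kyoto; Higashiyama-ku; 東山区; Places; ";

// Hypothetical helper: true if keyword equals one of the ";"-separated
// entries, ignoring surrounding whitespace and (approximately) case.
function containsKeyword(list: string, keyword: string): boolean {
  const target = keyword.trim().toLowerCase();
  if (target === "") return false; // the trailing "; " yields an empty entry
  return list
    .split(";")
    .map((entry) => entry.trim().toLowerCase())
    .some((entry) => entry === target);
}

console.log(keywords.indexOf("Prefecture") !== -1);   // true: substring match
console.log(containsKeyword(keywords, "Prefecture")); // false: not a whole entry
console.log(containsKeyword(keywords, "日本国"));      // true
```

      Because no pattern is built from the keyword, characters such as "+" or "." in a keyword need no escaping.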

      Is this a problem with my regex, or is there a flag I need to add to enable Unicode support?
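
      As far as I know, ActionScript 3's RegExp has no Unicode flag that would change this, and even in modern JavaScript the u flag does not make \b treat Japanese characters as word characters. A minimal regex-based sketch (TypeScript again, with hypothetical helper names escapeRegExp and containsKeywordRegex) replaces \b with the list's own delimiter: the keyword must start the string or follow a ";", and be followed by optional whitespace and ";". It also escapes the keyword, since the original code inserts it into the pattern unescaped.

```typescript
// Sample data copied from the post (note the trailing "; ").
const keywords =
  "Chion-in; 知恩院; Lily Pond; Bridge; 納骨堂; Nōkotsu-dō; Asia; Japan; 日本国; " +
  "Nihon-koku; Kansai region; 関西地方; Kansai-chihō; Kyoto Prefecture; 京都府; " +
  "Kyōto-fu; Kyoto; Higashiyama-ku; 東山区; Places; ";

// Escape RegExp metacharacters so the keyword is matched literally
// (unescaped, a keyword like "C++" would be an invalid pattern).
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: the keyword must start the string or follow ";",
// then be followed by optional whitespace and ";". No \b is involved.
function containsKeywordRegex(list: string, keyword: string): boolean {
  const pattern = new RegExp(
    "(?:^|;)\\s*" + escapeRegExp(keyword.trim()) + "\\s*;",
    "i"
  );
  return pattern.test(list);
}

console.log(containsKeywordRegex(keywords, "日本国"));     // true
console.log(containsKeywordRegex(keywords, "Prefecture")); // false
```

      The delimiter-based pattern works the same for any script, because it never asks which characters count as "word" characters.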

      Thanks

      Dave