arvid terzibaschian

Stripping all non-letters & non-numbers from a string (unicode)

Jan 11, 2011 11:48 AM

Hi there!

 

I am converting a bit of code from a Java base and I am really stuck at a point where I need to strip all non-letters (Unicode) from a string. What I am looking for is some kind of Flex support that mimics regexp classes such as:

\p{L} or \p{Letter}: any kind of letter from any language (see http://www.regular-expressions.info/unicode.html)

and then use String.replace(/[^\p{L}]/g,"") or something similar (negated, so that everything except the letters is removed).

 

 

If there is no such regexp facility, I could still loop through the string character by character and check each character's Unicode properties with something like "isLetter()", as Java and several other languages provide.

 

Just to make clear what I am searching for, I will give a short example:

If we take a unicode string containing "this is a unicode [@@Русский@@] multilangual string containing some cyrillic letters and //]] 1234 numbers"

it should strip out the @@, the brackets and the //, so basically everything BUT the letters.

 

If anyone knows a decent solution I would appreciate any help!

 
Replies
  • Jan 11, 2011 1:58 PM   in reply to arvid terzibaschian

    The ActionScript RegExp class doesn't do what you want?

     
  • Jan 11, 2011 3:26 PM   in reply to arvid terzibaschian

    Did you try using \uXXXX format?
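
    To illustrate the \uXXXX suggestion, here is a minimal sketch that uses Unicode escapes inside a negated character class. The ranges are an assumption chosen for this example (ASCII letters, digits, spaces and the basic Cyrillic block U+0400-U+04FF); they are not a substitute for a real \p{L}.

    ```actionscript
    // Sketch: remove everything that is not an ASCII letter, a digit,
    // a space, or a character in the basic Cyrillic block.
    // The set of ranges is an assumption made for this example only.
    var unwanted:RegExp = /[^A-Za-z0-9 \u0400-\u04FF]/g;

    var input:String = "unicode [@@Русский@@] string //]] 1234";
    var cleaned:String = input.replace(unwanted, "");

    trace(cleaned); // unicode Русский string  1234
    ```

    Every additional script you need to keep means adding its range to the class by hand, which is the main drawback compared with a Unicode property such as \p{L}.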

     
  • Jan 12, 2011 12:09 PM   in reply to arvid terzibaschian

    I don't know how hard that would be.  Wouldn't they tend to fall into ranges?  Feel free to file an enhancement request for us to upgrade RegExp to do what you want.

     
  • Jan 12, 2011 9:58 PM   in reply to arvid terzibaschian

    Bugs.adobe.com/jira

     
  • Jan 24, 2011 2:06 PM   in reply to arvid terzibaschian

    Please file a bug at bugs.adobe.com/jira

     
  • May 2, 2012 4:56 AM   in reply to arvid terzibaschian

    I'm also trying to match all unicode letters and numbers.

    Have you figured out a solution?

     
  • May 2, 2012 7:01 AM   in reply to arvid terzibaschian

    Thanks for the workaround, it works. Since I only have short strings it's ok to loop through them, but I can't imagine how long it would take on a long text.

    Here's my code if it's of any help to someone:

     

        private function cleanup(input:String):String {
            var output:String = "";
            var charCode:uint;
            // Copy only the characters whose code falls inside one of the kept ranges.
            for (var k:int = 0; k < input.length; k++) {
                charCode = input.charCodeAt(k);
                // Note: the three wide ranges are approximations. They also let through
                // some non-letters, e.g. U+00D7 (multiplication sign) and U+2000-U+206F
                // (general punctuation).
                if ((charCode >= 0x00C0 && charCode <= 0xD7FF) ||
                    (charCode >= 0xF900 && charCode <= 0xFDCF) ||
                    (charCode >= 0xFDF0 && charCode <= 0xFFEF) ||
                    (charCode >= 0x0041 && charCode <= 0x005A) || // A-Z
                    (charCode >= 0x0061 && charCode <= 0x007A) || // a-z
                    (charCode >= 0x0030 && charCode <= 0x0039)    // 0-9
                ) {
                    output = output + input.charAt(k);
                }
            }
            return output;
        }

     

    I really wish I could do that with a regex...
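
    For what it's worth, the same ranges can probably be written as a single negated character class, since the ActionScript RegExp class accepts \uXXXX escapes. This is only a sketch under that assumption: stripWithRegExp is a hypothetical name, and the ranges are copied from cleanup() above, so it inherits the same approximations.

    ```actionscript
    // Sketch: regex equivalent of the loop in cleanup().
    // Keeps 0-9, A-Z, a-z and the three wide ranges; removes everything else.
    // stripWithRegExp is a hypothetical helper name, not part of any library.
    function stripWithRegExp(input:String):String {
        var unwanted:RegExp = /[^0-9A-Za-z\u00C0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]/g;
        return input.replace(unwanted, "");
    }

    trace(stripWithRegExp("[@@Русский@@] //]] 1234")); // Русский1234
    ```

    Spaces are not in any of the kept ranges, so they are removed as well, just as in the loop version.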

     