2 Replies Latest reply on Mar 5, 2011 8:51 AM by Marc Autret

    [JS CS4] Strange RegExp behaviour

    Marc Autret Level 4

      Hi all,

       

      I'm trying to understand how the String.search() method behaves with a very simple pattern.

       

      The code:

       

      alert( "abc".search( /[^a-f0-9]/ ) );

       

      displays 3 (!) while the code:

       

      alert( "abc".search( /[^abcdef0123456789]/ ) );

       

      displays -1 (as expected).

       

      I thought that the negated class [^a-f0-9] was strictly equivalent to [^abcdef0123456789]. Is that not the case?

       

      In any case, the first search() returns 3, which equals the length of "abc" and is therefore an out-of-range index. That seems weird.

       

      Did I miss something?

       

      (NB: tested on ID CS4 Win.)
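For anyone who wants to check the equivalence on their own engine, here is a minimal sketch that compares the two classes on every UTF-16 code unit. The variable names are only illustrative. In an engine that follows the ECMAScript rules, no mismatch should be found and both searches should return -1; the values reported above for CS4 are those from this thread, not something this snippet can reproduce elsewhere.

```javascript
// Compare the range-based class with its spelled-out form, one character at a time.
var rangeClass = /[^a-f0-9]/;
var explicitClass = /[^abcdef0123456789]/;

var mismatches = [];
for (var code = 0; code <= 0xFFFF; code++) {
    var ch = String.fromCharCode(code);
    // Record any code unit on which the two classes disagree.
    if (rangeClass.test(ch) !== explicitClass.test(ch)) {
        mismatches.push(code);
    }
}

// The two searches from the post, kept in variables so they can be inspected.
var searchRange = "abc".search(rangeClass);
var searchExplicit = "abc".search(explicitClass);
```

If mismatches stays empty but searchRange differs from searchExplicit, the problem is likely in how the engine handles the end of the string rather than in the class itself.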

        • 1. Re: [JS CS4] Strange RegExp behaviour
          Dave Saunders Level 4

          I get the same result on my Mac with ESTK CS4 targeted.

           

          As you say, weird.

           

          Dave

          • 2. Re: [JS CS4] Strange RegExp behaviour
            Marc Autret Level 4

            For your information:

             

            The bug I mentioned above has been fixed in CS5:

             

            alert( "abc".search( /[^a-f0-9]/ ) );     // CS4 => 3 -- CS5 => -1
            

             

            While running further tests with IndexMatic, I found that the CS4 issue generally appears under the following conditions:

             

            1) the pattern uses a Negated Character Class [^...]

            2) the Negated Character Class is based on several ranges or entities, such as [^a-z0-9_].

             

            Description of the bug: the CS4 regex engine behaves as if the parsed string had a final void character, and that ghost character is regarded as matching the Negated Character Class. For example we observe the following result:

             

            alert( /foo[^a-z0-9]/.test("foo") ); // CS4 => true (!) -- CS5 => false (ok)
            

             

            while the expected behavior is returned in these additional tests:

             

            alert( /foo[^a-z0-9]/.test("foo@") ); // CS4 & CS5 => true (ok)
            alert( /foo[^a-z0-9]/.test("foo5") ); // CS4 & CS5 => false (ok)
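The tests above can be folded into a small runtime check, so that a script can decide whether it needs a workaround. This is only a sketch: hasNegatedClassBug is a hypothetical name, and it probes just the two cases reported in this thread, so a false result does not prove that the engine is free of related issues.

```javascript
// Hypothetical helper (not part of any Adobe API): returns true when the
// engine shows the symptoms described in this thread.
function hasNegatedClassBug() {
    // Symptom 1: a negated multi-range class matches at the end of the string.
    var matchesGhostChar = /foo[^a-z0-9]/.test("foo");
    // Symptom 2: search() reports a match at an out-of-range index.
    var outOfRangeIndex = "abc".search(/[^a-f0-9]/) !== -1;
    return matchesGhostChar || outOfRangeIndex;
}

var engineIsAffected = hasNegatedClassBug();
```

According to the results quoted above, engineIsAffected would be true on CS4 and false on CS5.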
            

             

            Workaround: rather than using a Negated Character Class, it is usually possible to use a Negative Lookahead on the complementary class when we don't need to retrieve the matches. In this specific case [^a-z0-9] is quite similar to (?![a-z0-9]), which works as expected:

             

            alert( /foo(?![a-z0-9])/.test("foo5") ); // CS4 & CS5 => false (ok)
            alert( /foo(?![a-z0-9])/.test("foo@") ); // CS4 & CS5 => true (ok)
            alert( /foo(?![a-z0-9])/.test("foo") ); // CS4 & CS5 => true (ok)
            

             

            The last line shows that the negative lookahead also succeeds at the end of the string, because it does not consume any character. This is the main difference from a negated character class, which must match an actual character. So we can improve our workaround by adding a dot after the lookahead, in order to 'consume' the required character. Hence here is the final solution I suggest:

             

             

            alert( /foo(?![a-z0-9])./.test("foo5") ); // CS4 & CS5 => false (ok)
            alert( /foo(?![a-z0-9])./.test("foo@") ); // CS4 & CS5 => true (ok)
            alert( /foo(?![a-z0-9])./.test("foo") ); // CS4 & CS5 => false (ok)
            

             

            And:

             

            alert( "foo5".match(/foo(?![a-z0-9])./) ); // => null
            alert( "foo@".match(/foo(?![a-z0-9])./) ); // => foo@
            alert( "foo".match(/foo(?![a-z0-9])./) );  // => null
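One caveat with the dot: in JavaScript it does not match line terminators, whereas a negated class such as [^a-z0-9] does. The sketch below wraps the workaround in a hypothetical helper, negatedClassWorkaround, with an option to consume any character through [\s\S] instead of the dot. I have not verified that [\s\S] is safe from the CS4 issue, so treat that variant as an assumption to test on your own setup. The helper also assumes that classBody is already valid inside brackets (for example, any "]" is escaped).

```javascript
// Hypothetical helper: returns the lookahead-based substitute for [^classBody].
// matchNewlines = true uses [\s\S] so that line terminators are consumed too.
function negatedClassWorkaround(classBody, matchNewlines) {
    var consumer = matchNewlines ? "[\\s\\S]" : ".";
    return "(?![" + classBody + "])" + consumer;
}

var dotVersion = new RegExp("foo" + negatedClassWorkaround("a-z0-9", false));
var anyVersion = new RegExp("foo" + negatedClassWorkaround("a-z0-9", true));

// Behaviour on a trailing newline, compared with the plain negated class.
var classOnNewline = /foo[^a-z0-9]/.test("foo\n"); // the negated class accepts "\n"
var dotOnNewline = dotVersion.test("foo\n");       // the dot refuses "\n"
var anyOnNewline = anyVersion.test("foo\n");       // [\s\S] accepts "\n"

// The three cases from the thread, using the [\s\S] variant.
var anyOnEnd = anyVersion.test("foo");
var anyOnDigit = anyVersion.test("foo5");
var anyOnSymbol = anyVersion.test("foo@");
```

If the text being searched never contains line breaks after the match, the simpler dot version shown above is enough.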
            

             

            @+

            Marc