inspiria_si

Regular Expressions in CS5.5 - something is wrong

Nov 3, 2011 3:45 AM

Tags: #regularexpression #regexp

Hello Everybody,

 

Please correct me if I am wrong, but I think I have found a serious problem with regular expressions in InDesign CS5.5 (and possibly in other CS5.5 applications).

 

Let's start with a simple example:

-------------

var range = "a-a,a,a-a,a";

var regEx = /(a+-a+|a+)(,(a+-a+|a+))*/;

alert( "Match:" +regEx.test(range)+"\nLeftContext: "+RegExp.leftContext+"\nRightContext: "+RegExp.rightContext );

-------------

I expected the match to be true and both the left and the right context to be empty. That is what InDesign CS3 returns, but not CS5.5.

 

In CS5.5 it seems that only the first "a-a" is matched and the rest is returned as the rightContext. This looks like a big change (if not a parsing error in the RegExp engine).
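
For comparison, here is a minimal sketch of the result expected from a standards-conforming engine. It derives both contexts from the exec() result instead of reading the RegExp statics. matchContexts is a hypothetical helper name used only for illustration; it is not part of ExtendScript.

```javascript
// Hypothetical helper (illustration only): computes the matched text and the
// left/right contexts from exec(), without touching RegExp.leftContext or
// RegExp.rightContext.
function matchContexts(regEx, text) {
    var m = regEx.exec(text);
    if (m === null) {
        return null;
    }
    return {
        matched: m[0],
        left: text.substring(0, m.index),
        right: text.substring(m.index + m[0].length)
    };
}

var range = "a-a,a,a-a,a";
var regEx = /(a+-a+|a+)(,(a+-a+|a+))*/;

// Expected in a conforming engine: matched equals the whole string,
// and both left and right are empty strings.
var result = matchContexts(regEx, range);
```

If result.right comes back as ",a,a-a,a", the engine stopped after the first "a-a", which is the CS5.5 behaviour described above.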

 

Please correct me if I am wrong.

 

The second example - how to freeze ID CS5.5:

-------------

var range = "a-a,a,a-a,a";

var regEx = /(a+-a+|a+)(,(a+-a+|a+)){8,}/;

alert( "Match:" +regEx.test(range)+"\nLeftContext: "+RegExp.leftContext+"\nRightContext: "+RegExp.rightContext );

--------------

As you can see, it differs only in using {8,} instead of *.

Run it in CS5.5 and you will see that ID hangs (in CS3, of course, it runs flawlessly).
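
A quick sanity check on what the answer should be, as a sketch: the string holds four comma-separated items, so the ",item" group can repeat at most three times and {8,} can never be satisfied. A conforming engine should therefore return false. The names repetitionsAvailable and matched are illustrative only.

```javascript
var range = "a-a,a,a-a,a";
var regEx = /(a+-a+|a+)(,(a+-a+|a+)){8,}/;

// Four comma-separated items leave only three ",item" repetitions.
var repetitionsAvailable = range.split(",").length - 1;

// {8,} needs at least eight repetitions, so no match is possible here.
var matched = regEx.test(range);
```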

 

The third example - how to freeze ID CS5.5 in one line (I posted it earlier in the Photoshop forum because a similar problem had been reported there):

 

---------------

alert((/(n|s)? /gmi).test('s') );

---------------

As you can guess, it freezes CS5.5 (CS3 passes the test).
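
For reference, the expected answer here is simply false: the pattern requires a space character and 's' does not contain one. A minimal sketch, assuming a standards-conforming engine (with the g flag, a failed test() also leaves lastIndex at 0); the variable names are illustrative only.

```javascript
var regEx = /(n|s)? /gmi;

// The optional group may match "s" or nothing, but the mandatory space
// that follows it is never found, so the test fails.
var matched = regEx.test('s');

// After a failed match on a global regex, lastIndex is reset to 0.
var lastIndexAfter = regEx.lastIndex;
```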

 

 

Please tell me whether I am doing something wrong or whether this is a problem on Adobe's side.

 

Best regards,

Daniel Brylak

 
Replies
  • Nov 3, 2011 8:10 AM   in reply to inspiria_si

    Hi Daniel,

     

    Thanks for sharing. Really annoying indeed.

    Just to complete your diagnosis, what you describe about CS5.5 is the same in CS5, while CS4 behaves like CS3.

     

     

    var range = "a-a,a,a-a,a";    // same test string as in Daniel's first example
    var regEx = /(a+-a+|a+)(,(a+-a+|a+))*/;
     
    alert([
        "Match:" +regEx.test(range),
        "LeftContext: "+RegExp.leftContext.toSource(),        // => CS3/4: EMPTY -- CS5+: EMPTY
        "RightContext: "+RegExp.rightContext.toSource()        // => CS3/4: EMPTY -- CS5+: ",a,a-a,a"
        ].join('\r'));
    

     

    So there is a serious implementation problem in the RegExp object as of ExtendScript CS5.

     

    I don't think it's related to greediness. By default, JS RegExp quantifiers are greedy, and /a*/ still captures "aaaaaa" entirely in CS5+.

    By the way, you can make any quantifier non-greedy by adding ? after the quantifier, e.g.: /a*?/, /a+?/, etc.
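
    To make the greedy/non-greedy difference concrete, here is a minimal sketch of what each form returns in a standards-conforming engine. The variable names are illustrative only.

```javascript
var text = "aaaaaa";

// Greedy: takes as many characters as possible.
var greedyStar = /a*/.exec(text)[0];   // "aaaaaa"

// Non-greedy star: zero repetitions are enough, so the match is empty.
var lazyStar = /a*?/.exec(text)[0];    // ""

// Non-greedy plus: needs at least one repetition and stops there.
var lazyPlus = /a+?/.exec(text)[0];    // "a"
```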

     

    I guess that Adobe ExtendScript has a generic issue in updating the RegExp.lastIndex property in certain contexts (see http://forums.adobe.com/message/3719879#3719879), which could explain several bugs such as the Negative Class bug (see http://forums.adobe.com/message/3510078#3510078) or the problems you are mentioning today.

     

    @+

    Marc

     
  • Nov 3, 2011 10:05 PM   in reply to inspiria_si

    I am not sure whether this is related to the same problem.

    Please run this sample code.

    var str = "2011-11-03";
    //var str = "11-03";
    //var str = "03";
    var regex = /^((\d+)-)?((\d+)-)?(\d+)$/;
    
    $.writeln("Match : " + regex.test(str));
    $.writeln("RegExp.$1 : " + RegExp.$1);
    $.writeln("RegExp.$2 : " + RegExp.$2);
    $.writeln("RegExp.$3 : " + RegExp.$3 );
    $.writeln("RegExp.$4 : " + RegExp.$4 );
    $.writeln("RegExp.$5 : " + RegExp.$5 );
    

    I get:

    Match : true
    RegExp.$1 : 2011-
    RegExp.$2 : 2011
    RegExp.$3 : 11-
    RegExp.$4 : 11
    RegExp.$5 : 03
    

    That is correct!

    But if str = "11-03", then:

    Match : true
    RegExp.$1 : 11-
    RegExp.$2 : 11
    RegExp.$3 : 
    RegExp.$4 : 03
    RegExp.$5 : 03
    

    Why did RegExp.$4 capture "03"?

    And if str = "03", then:

    Match : true
    RegExp.$1 : 
    RegExp.$2 : 03
    RegExp.$3 : 
    RegExp.$4 : 03
    RegExp.$5 : 03
    

    Here RegExp.$2 (and again RegExp.$4) captured "03".
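
    For comparison, a minimal sketch of the captures a standards-conforming engine should produce: a group that takes no part in the final match is left undefined, so an optional group that was tried and then abandoned should not keep "03". The helper name captureList and the "(unset)" label are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper (illustration only): returns capture groups 1..5 as an
// array, labelling groups that did not take part in the match as "(unset)".
function captureList(str) {
    var regex = /^((\d+)-)?((\d+)-)?(\d+)$/;
    var m = regex.exec(str);
    var out = [];
    var i;
    if (m === null) {
        return null;
    }
    for (i = 1; i < m.length; i++) {
        out.push(typeof m[i] === "undefined" ? "(unset)" : m[i]);
    }
    return out;
}

var full = captureList("2011-11-03");  // 2011-, 2011, 11-, 11, 03
var twoParts = captureList("11-03");   // 11-, 11, (unset), (unset), 03
var onePart = captureList("03");       // (unset) four times, then 03
```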

     

    I have reported this on the Adobe site.

     
  • Nov 4, 2011 1:50 PM   in reply to seuzo

    Daniel, Seuzo,

     

    Just posted a basic ScriptUI dialog to make regex-testing easier:

    http://www.indiscripts.com/post/2011/11/extendscript-regexp-tester

     

    @+

    Marc

     