4 Replies Latest reply on Nov 23, 2014 6:12 PM by Marc Autret

    Some regexp inside ExtendScript freeze InDesign

    eutheneia

      Hello everybody!

       

      I have found that certain regexes freeze InDesign when executed inside ExtendScript.

       

      For example:

       

           if (m = line.match(/^sort\s+((?:a(?:lphabetically)?|n(?:umerically)?)|r(?:everse)?){1,2}$/i)) { ...


      (meaning that my parser accepts lines in one of these forms:


           sort alphabetically reverse

           sort numerically

           sort ar

           sort reverse numerically


      and so on...).

       

      Another example.

      If I use this, everything is working:

       

           m = line.match(/corner\s+((?:\+|\-)?\d+)(?:\s|$)/);


      but if I add the part \d*\.? before \d+:

       

           m = line.match(/corner\s+((?:\+|\-)?\d*\.?\d+)(?:\s|$)/);

       

      when executing, InDesign freezes.

       

       

      What do you think? Could there be a bug in ExtendScript's regexp engine? Or am I making a mistake that I cannot see? :-(

       

      Do you get the same failure when you run these?

      I am working with InDesign CS6, CC, and CC 2014.

       

       

      Many thanks ...

       

       

      Roberto


        • 1. Re: Some regexp inside ExtendScript freeze InDesign
          pixxxel schubser Level 5

          Hi eutheneia,

           

          what is this?

          ^sort\s+((?:a(?:lphabetically)?|n(?:umerically)?)|r(?:everse)?){1,2}


          IMHO this should be better:

          sort\s+(((alphabet|numer)ically|reverse)\s)*


          And please show us some examples of what you hope to find with your second grep.

          • 2. Re: Re: Some regexp inside ExtendScript freeze InDesign
            eutheneia Level 1

            /^sort\s+((?:a(?:lphabetically)?|n(?:umerically)?)|r(?:everse)?){1,2}$/


            can match "sort " followed by any one of these:

             

            alphabetically

            numerically

            alphabeticallyreverse

            numericallyreverse

            reversealphabetically

            reversenumerically


            but also their abbreviations:


            a

            n

            ar

            nr

            ra

            rn


            Instead,

            /corner\s+((?:\+|\-)?\d*\.?\d+)(?:\s|$)/


            matches "corner " followed by an integer or decimal number with an optional sign, followed by whitespace or the end of the string.
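For comparison, here is a minimal sketch of the same check that keeps the regex trivial and validates the number in plain code. The names `isDecimal` and `parseCorner` are hypothetical (not from this thread), the code only looks at the first "corner" in the line, and it has not been tested inside InDesign.

```javascript
// Hypothetical helper: true if s has the shape [+-]?\d*\.?\d+,
// e.g. "12", "-3.5" or ".5"; false for "1.", ".", "" or "12abc".
function isDecimal(s) {
    var i = (s.charAt(0) === "+" || s.charAt(0) === "-") ? 1 : 0;
    var dots = 0, lastIsDigit = false, c;
    for (; i < s.length; i++) {
        c = s.charAt(i);
        if (c >= "0" && c <= "9") {
            lastIsDigit = true;
        } else if (c === "." && dots === 0) {
            dots = 1;            // at most one decimal point
            lastIsDigit = false; // a digit must follow it
        } else {
            return false;
        }
    }
    return lastIsDigit;
}

// Hypothetical helper: returns the number after "corner", or null.
function parseCorner(line) {
    // Single, non-nested quantifiers only: grab the next non-space token.
    var m = line.match(/corner\s+(\S+)/);
    if (!m || !isDecimal(m[1])) return null;
    return parseFloat(m[1]);
}
```

The idea is that the regex only locates the token, and the part that needed optional sub-patterns is handled by an ordinary loop.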




            But I was asking something different: simply why it freezes InDesign, and whether it fails everywhere (on every machine) or whether I should suspect that my installation is bad.








            • 3. Re: Re: Re: Some regexp inside ExtendScript freeze InDesign
              pixxxel schubser Level 5

              IMHO you are using the wrong syntax.

              This regex does not work in InDesign.

               

              Please try something like this:

              corner\s(?=[+-]?\d+(\.\d+)?(\s|$)?)

              to find e.g.

              corner 1

              or

              corner 12

              or

              corner 123.45

               

               

              Have fun

              ;)

              • 4. Re: Re: Re: Some regexp inside ExtendScript freeze InDesign
                Marc Autret Level 4

                Hi eutheneia,

                But I was asking something different: simply why it freezes InDesign, and whether it fails everywhere (on every machine) or whether I should suspect that my installation is bad.

                 

                Greedy quantifiers like +, *, or {m,n} cause issues in InDesign—in particular, in CS6 and later—when mixed with optional sub-patterns. For instance, a very simple way to create an infinite loop in CS6-CC is:

                 

                alert( /(aA?|bB?)+$/.test("bx") ); // CS4: FALSE ; CS6-CC: FREEEEZE!
                

                 

                or even:

                 

                alert( /(a|bB?)+$/.test("bx") );   // CS4: FALSE ; CS6-CC: FREEEEZE!
                

                 

                Those "explosive quantifiers" are not properly addressed due to various backtracking bugs which Adobe devs don't seem to be aware of. Another fact is that ExtendScript CS6-CC doesn't interpret the scheme /((A|B)|C)/ the right way:

                 

                alert( /^((a|b)|c)+$/.test("ac") ); // CC: FALSE!
                alert( /^((a|b)|c)+$/.test("ca") ); // CC: TRUE
                
                // Interestingly:
                
                
                alert( /^(c|(a|b))+$/.test("ac") ); // CC: TRUE
                alert( /^(c|(a|b))+$/.test("ca") ); // CC: TRUE
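As a baseline, the sketch below lists what a standards-conforming JavaScript engine returns for the patterns above. Running it outside InDesign is my assumption (any ordinary JavaScript engine will do); the first two cases should not be run in CS6-CC, since they are the ones reported to freeze. The names `expected` and `runChecks` are hypothetical.

```javascript
// Each entry: [regex, input, result required by ECMAScript semantics].
var expected = [
    [/(aA?|bB?)+$/,  "bx", false],
    [/(a|bB?)+$/,    "bx", false],
    [/^((a|b)|c)+$/, "ac", true],
    [/^((a|b)|c)+$/, "ca", true],
    [/^(c|(a|b))+$/, "ac", true],
    [/^(c|(a|b))+$/, "ca", true]
];

// Returns a description of every case whose actual result differs.
function runChecks(cases) {
    var failures = [], i, actual;
    for (i = 0; i < cases.length; i++) {
        actual = cases[i][0].test(cases[i][1]);
        if (actual !== cases[i][2]) {
            failures.push(cases[i][0] + " on \"" + cases[i][1] + "\" gave " + actual);
        }
    }
    return failures;
}
```

In a conforming engine `runChecks(expected)` returns an empty array; any entry it reports points at an engine-specific deviation.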
                
                

                 

                But, in fact, /((A|B)|C)/ is equivalent to /(A|B|C)/ —isn't it? So your original pattern is (although without your consent!) very similar to:

                 

                /^sort\s+(a(?:lphabetically)?|n(?:umerically)?|r(?:everse)?){1,2}$/i
                

                 

                which does not solve the problem, but makes things easier to follow.

                 

                [In addition, the repeated capturing group (...){1,2} will only capture the last iteration—which is probably not what you expect.]
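A small sketch of that capturing behaviour, using a simplified pattern of my own rather than the original one. The results in the comments are those of a standards-conforming engine; the two-group variant is only one possible workaround and has not been tested against the InDesign bugs described in this thread.

```javascript
// A repeated group keeps only its last iteration.
var repeated = /^(a|b){1,2}$/.exec("ab");
// repeated[0] is "ab" (whole match), repeated[1] is "b" (last repetition only)

// One way to keep both parts: write the group twice instead of repeating it.
var both = /^(a|b)(a|b)?$/.exec("ab");
// both[1] is "a", both[2] is "b"
```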

                 

                In your case, the best way to avoid explosive quantifiers (and those good old InDesign bugs!) is to “flatten” the alternatives:

                 

                (a|alphabetically|n|numerically|r|reverse){1,2}
                

                 

                At this point freezing and bad results are gone, but your pattern still doesn't address the original goal.

                 

                For all these reasons, and probably a few more, I think you should change your approach. Regexes are based on (in)finite state automata and should be used for clean pattern matching, not for if-then-else parsing ;-)

                IMHO you should use rules, progressive steps, and divide your syntactic problem into smaller pieces.
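One way to read "rules, progressive steps, smaller pieces" is sketched below: split the line into words, then check each word with plain comparisons. `parseSortLine` and the shape of its return value are hypothetical. Rejecting duplicates such as "sort a n", expanding two-letter forms like "ar", and tolerating leading whitespace are my assumptions about the intended syntax. The sketch has not been tested inside InDesign.

```javascript
// Hypothetical helper. Returns { order: "alphabetical" | "numerical" | null,
// reverse: true | false }, or null when the line is not a valid sort command.
function parseSortLine(line) {
    var raw = line.split(/\s+/), tokens = [], i, t;
    for (i = 0; i < raw.length; i++) {
        if (raw[i] !== "") tokens.push(raw[i].toLowerCase());
    }
    // Step 1: keyword plus one or two options.
    if (tokens.length < 2 || tokens.length > 3 || tokens[0] !== "sort") return null;
    // Step 2: expand a two-letter abbreviation such as "ar" into "a", "r".
    if (tokens.length === 2 && tokens[1].length === 2) {
        tokens = ["sort", tokens[1].charAt(0), tokens[1].charAt(1)];
    }
    // Step 3: interpret each option, refusing duplicates and unknown words.
    var result = { order: null, reverse: false };
    for (i = 1; i < tokens.length; i++) {
        t = tokens[i];
        if (t === "a" || t === "alphabetically") {
            if (result.order) return null;
            result.order = "alphabetical";
        } else if (t === "n" || t === "numerically") {
            if (result.order) return null;
            result.order = "numerical";
        } else if (t === "r" || t === "reverse") {
            if (result.reverse) return null;
            result.reverse = true;
        } else {
            return null;
        }
    }
    return result;
}
```

For example, `parseSortLine("sort reverse numerically")` yields an object with `order` set to "numerical" and `reverse` set to true, while `parseSortLine("sort a n")` yields null. The only regex left is the whitespace split, which has a single quantifier and no optional sub-pattern.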

                 

                @+

                Marc