2 Replies Latest reply: Oct 23, 2009 4:12 PM by Laubender

    match(/\w/g) does NOT match findGrep() with  .findWhat = "\\w"

    Laubender CommunityMVP

      Hello, everybody!

       

      As you might know, \w is not always \w: GREP searches run by different means can interpret it differently. I learned the hard way that I cannot rely on match() and findGrep() giving identical results. The difference in technique seems clear to me: match() searches a string in memory, independent of the InDesign-specific GREP search.

       

      Please correct me if I'm wrong, but shouldn't both methods yield the same results using the same GREP pattern? Using the dot as pattern does.

       

      Is it because of a "bug" in ExtendScript?

       

      To illustrate the problem, I wrote a script that opens a new document with a text frame labeled "ToSearch" containing many characters from the German keyboard. That text is then searched by findGrep() and match() alike. The results are written to new text frames on the page, labeled accordingly.

       

      The script searches for the following patterns:
      \w
      [0-9A-Za-zÀ-ü]

       

      In the case of [0-9A-Za-zÀ-ü] there is no difference in "scope" of the GREP pattern.
      \w is a different beast: German umlauts are not found, and so on.
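
The comparison described above can be sketched roughly as follows. This is not the attached FindGrep_vs_match.jsx, only a minimal illustration under a few assumptions: the active document has at least one story, and the helper names matchChars, findGrepChars and onlyIn are made up for this example. It is written in ES3 style so that it should run in ExtendScript; the InDesign part is guarded so the helpers can also be tried in any other JavaScript engine.

```javascript
// Sketch only: not the attached FindGrep_vs_match.jsx.

// Characters matched by the JavaScript engine's own regular expressions.
function matchChars(text, pattern) {
    return text.match(pattern) || [];
}

// Characters matched by InDesign's GREP search (InDesign only).
// `target` is any object offering findGrep(), e.g. a Story or TextFrame.
function findGrepChars(target, grepString) {
    app.findGrepPreferences = NothingEnum.nothing;   // reset earlier settings
    app.findGrepPreferences.findWhat = grepString;
    var found = target.findGrep();
    var chars = [];
    for (var i = 0; i < found.length; i++) {
        chars.push(String(found[i].contents));
    }
    app.findGrepPreferences = NothingEnum.nothing;
    return chars;
}

// Entries of `a` that do not occur in `b` (hypothetical helper).
function onlyIn(a, b) {
    var lookup = {}, result = [], i;
    for (i = 0; i < b.length; i++) { lookup["c" + b[i]] = true; }
    for (i = 0; i < a.length; i++) {
        if (!lookup["c" + a[i]]) { result.push(a[i]); }
    }
    return result;
}

// Runs only inside InDesign with a document open.
if (typeof app !== "undefined" && app.documents && app.documents.length > 0) {
    var story = app.activeDocument.stories[0];
    var viaMatch = matchChars(String(story.contents), /\w/g);
    var viaGrep = findGrepChars(story, "\\w");
    $.writeln("findGrep only: " + onlyIn(viaGrep, viaMatch).join(" "));
    $.writeln("match only: " + onlyIn(viaMatch, viaGrep).join(" "));
}
```

Listing the characters that only one of the two methods reports makes the difference in "scope" visible at a glance, and the same two lists can be compared between CS3 and CS4.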

       

      But see for yourself. I tested this script with InDesign CS3 (ESTK 2) and InDesign CS4 (ESTK). The results are inconsistent between CS3 and CS4: CS4 shows more find results than CS3. E.g. "ß"  was not found by CS3 when using findGrep().

       

      Is there any documentation on how Adobe implemented core JavaScript GREP in ExtendScript, so that I can be sure of the specific "scope" of a GREP wildcard?

       

      Hope you can help,

      Uwe Laubender

       

      Attached Script: FindGrep_vs_match.jsx

        • 1. Re: match(/\w/g) does NOT match findGrep() with  .findWhat = "\\w"
          [Jongware] CommunityMVP

          I think the difference you see is not one within InDesign itself, but rather the difference between the GREP built into JavaScript (which is not written by Adobe) and the InDesign implementation. JavaScript is not run by InDesign itself; rather, InDesign delegates that task to the system at hand and provides a list of new objects, methods, and properties as an interface to its own internal data.

           

          FindGrep can, for example, find text with formatting, and uses an "extended" set of meta-characters (the ones using a tilde prefix). The built-in JS GREP can only work with simple strings.

           

          Using the dot as pattern does.

           

          The GREP syntax is by no means defined anywhere as strictly as, say, a programming language is. Sure, there are conventions that most search engines that call themselves GREP-aware follow, and the period is one of the basics. In the case of the different definitions of "\w", these may come courtesy of Adobe's knowledge of Unicode. It's not a big surprise that your set "[0-9A-Za-zÀ-ü]" does work -- all that takes is a Unicode-aware GREP implementation.

           

          E.g. "ß"  was not found by CS3 when using findGrep().

           

          Yeah -- some programmer must have learned something! You can treat "\w" as a very long "[a-zA-ZáíóúñÑ ... fi ff ffl]" list, where occasionally characters are added or removed. But, again, that would only work with Adobe's implementation inside InDesign, and would not alter the local JS system.
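
The "very long list" idea can also be turned around: if you spell out the character class yourself instead of relying on "\w", its scope no longer depends on which engine interprets the shorthand. The following is only a sketch. The name WORD_CLASS and the choice of ranges (ASCII plus the Latin-1 letters) are assumptions for illustration, and it has not been verified here that InDesign's findGrep() treats these ranges identically, although the "[0-9A-Za-zÀ-ü]" test above suggests it may.

```javascript
// Hypothetical explicit "word character" class: ASCII letters, digits,
// underscore, plus the Latin-1 letters. The sub-ranges skip U+00D7 (×)
// and U+00F7 (÷), which a single À-ÿ range would include.
var WORD_CLASS = "[0-9A-Za-z_À-ÖØ-öø-ÿ]";

var sample = "Straße × Größe ÷ 2_b";
var found = sample.match(new RegExp(WORD_CLASS, "g")) || [];

// found.join("") is "StraßeGröße2_b": ß and ö are included,
// the spaces and the × and ÷ signs are not.
```

Note that the original set "[0-9A-Za-zÀ-ü]" also contains × and ÷, because both code points lie between À and ü.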

          • 2. Re: match(/\w/g) does NOT match findGrep() with  .findWhat = "\\w"
            Laubender CommunityMVP

            Hi, Jongware!

            Thank you for the clarification.

             

            Interesting to see how Adobe implements GREP functionality with core JavaScript. Here is the same example again, but this time with no InDesign involved:

            var myArray1 = new Array("1,2,3,4,5,6,7,8,9,0,ß,q,w,e,r,t,z,u,i,o,p,ü,a,s,d,f,g,h,j,k,l,ö,ä,y,x,c,v,b,n,m,Q, W,E,R,T,Z,U,I,O,P,Ü,A,S,D,F,G,H,J,K,L,Ö,Ä,Y,X,C,V,B,N,M,_,œ,æ,ç,µ,Á,Û,Ø,Å,Í,Ï,Ì,Ó,ˆ,Œ,Æ,Ù, Ç");
            myArray1.join().match(/\w/g)

             

            InDesign CS3 ExtendScript Toolkit 2 does not get the "ß".

            InDesign CS4 ExtendScript Toolkit does.

            An improvement. Obviously Adobe is in command here, altering GREP inside their core JavaScript implementation in ExtendScript.

             

            Another try with JavaScript inside HTML:

            <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
            <html>
            <head>
                  <title>Test_match(/\w/g)</title>
            </head>
            <body>
            </body>
            <script type="text/javascript" language="javascript">
              //Here comes the code:
                var myArray1 = new Array("1,2,3,4,5,6,7,8,9,0,ß,q,w,e,r,t,z,u,i,o,p,ü,a,s,d,f,g,h,j,k,l,ö,ä,y,x,c,v,b,n,m,Q, W,E,R,T,Z,U,I,O,P,Ü,A,S,D,F,G,H,J,K,L,Ö,Ä,Y,X,C,V,B,N,M,_,œ,æ,ç,µ,Á,Û,Ø,Å,Í,Ï,Ì,Ó,ˆ,Œ,Æ,Ù, Ç");
                var myString = myArray1.join().match(/\w/g);
                alert(myString);
            </script>
            </html>

            Safari 4.0.3, Firefox 3.5.3 and Opera 9.2.7 (all on Mac OS X 10.5.8) together with Adobe Dreamweaver CS4 (LiveView) do not get the "ß".
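
As far as I can tell, the browser result is what the ECMAScript specification asks for: it defines \w as exactly the characters a-z, A-Z, 0-9 and the underscore. A minimal sketch with a shortened version of the test string (the variable names are only for illustration) shows that /\w/g and the explicit ASCII class agree in a spec-conforming engine:

```javascript
// Shortened sample of the test string used above.
var sample = "1,0,ß,q,ü,Ö,_,œ,Ç";

var viaW = sample.match(/\w/g) || [];                // "1", "0", "q", "_"
var viaAscii = sample.match(/[A-Za-z0-9_]/g) || [];  // the same four characters

// true in an engine that follows the ECMAScript definition of \w
var same = viaW.join("") === viaAscii.join("");

// Everything that is neither \w nor a comma: "ß", "ü", "Ö", "œ", "Ç"
var missed = sample.match(/[^\w,]/g) || [];
```

Judging by the ExtendScript results above, the CS4 toolkit goes beyond that definition for at least the "ß".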

             

            Thank you again for commenting,

             

            Uwe Laubender