
match(/\w/g) does NOT match findGrep() with .findWhat = "\\w"

Oct 23, 2009 9:14 AM

Hello, everybody!

As you might know, \w is not always \w when it comes to GREP searches by different means. I learned the hard way that I cannot be sure match() and findGrep() will give identical results. The difference in technique seems clear to me: match() searches a string in memory, independently of InDesign's own GREP search.

Please correct me if I'm wrong, but shouldn't both methods yield the same results for the same GREP pattern? Using the dot as pattern does.

Is it because of a "bug" in ExtendScript?

To illustrate the problem, I wrote a script that opens a new document with a text frame labeled "ToSearch" containing many characters from the German keyboard. That text is then searched with both findGrep() and match(), and the results are written to new text frames on the page, labeled accordingly.

The script searches for the following patterns:
\w
[0-9A-Za-zÀ-ü]

With [0-9A-Za-zÀ-ü], both methods agree on the "scope" of the pattern.
\w is a different beast: German umlauts, for example, are not found by every method, and so on.
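To make the match() side concrete, here is a minimal sketch that runs in any standard JavaScript engine (e.g. Node.js); it does not call InDesign's findGrep() at all. The sample string is made up for illustration. Core JavaScript defines \w as [A-Za-z0-9_] only, so match() skips umlauts and "ß", while the explicit class catches them. Assuming ExtendScript follows the ECMAScript definition here, ESTK should behave the same way.

```javascript
// Runs in a standard JavaScript engine (e.g. Node.js). In ESTK, replace
// console.log() with $.writeln().
// Core JavaScript defines \w as [A-Za-z0-9_]: ASCII letters, digits, underscore.
var sample = "Größe 42 Äpfel_über";

// \w skips ö, ß, Ä and ü because they lie outside the ASCII range.
var wordChars = sample.match(/\w/g);

// The explicit class adds U+00C0 (À) through U+00FC (ü), which covers the
// German umlauts and ß. Note that this range also contains × (U+00D7) and
// ÷ (U+00F7), and that the class does NOT include the underscore.
var latinChars = sample.match(/[0-9A-Za-zÀ-ü]/g);

console.log(wordChars.join(""));  // "Gre42pfel_ber"
console.log(latinChars.join("")); // "Größe42Äpfelüber"
```

Comparing these two outputs against what findGrep() returns for the same text in your test document shows exactly which characters each side treats as "word" characters.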

But see for yourself. I tested this script with InDesign CS3 (ESTK 2) and InDesign CS4 (ESTK). The results are also inconsistent between CS3 and CS4: CS4 finds more characters than CS3. E.g. "ß" was not found by CS3 when using findGrep().

Is there any documentation on how Adobe implemented core JavaScript GREP in ExtendScript, so I can be sure of the exact "scope" of a GREP wildcard?

Hope you can help,

Uwe Laubender

Attached Script: FindGrep_vs_match.jsx

Replies
    Oct 23, 2009 12:49 PM   in reply to Laubender

    I think the difference you see is not one within InDesign itself, but rather the difference between JavaScript's built-in GREP (which is not written by Adobe) and the InDesign implementation. InDesign does not run JavaScript itself; rather, it delegates that task to the scripting system at hand and provides a set of new objects, methods, and properties as an interface to its own internal data.

    findGrep() can, for example, find text by its formatting, and it supports an extended set of metacharacters (the ones with a tilde prefix). The built-in JavaScript GREP works only on plain strings.

    Using the dot as pattern does.

    GREP syntax is nowhere defined as strictly as, say, a programming language. Sure, there are conventions that most GREP-aware search engines follow, and the period is one of the basics. The different definitions of "\w" may come courtesy of Adobe's knowledge of Unicode. It's no big surprise that your set "[0-9A-Za-zÀ-ü]" works in both: all that takes is a Unicode-aware GREP implementation.

    E.g. "ß"  was not found by CS3 when using findGrep().

    Yeah, some programmer must have learned something! You can think of InDesign's "\w" as a very long "[a-zA-ZáíóúñÑ ... fi ff ffl]" list, to which characters are occasionally added or from which they are removed. But, again, that applies only to Adobe's implementation inside InDesign and does not alter the local JavaScript system.
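If you want something closer to a Unicode-aware "\w" on the plain-JavaScript side without maintaining such a list by hand, newer engines offer Unicode property escapes. A minimal sketch, assuming an ES2018+ engine such as current Node.js; it will not run in ExtendScript, and the character set is only a rough approximation, not Adobe's actual list:

```javascript
// Requires an ES2018+ engine (e.g. current Node.js). ExtendScript is based on
// ES3 and supports neither the u flag nor \p{...} property escapes.
var sample = "Größe 42 Äpfel_über €";

// Letters (\p{L}) and decimal digits (\p{Nd}) from all of Unicode, plus "_":
// a rough stand-in for a Unicode-aware \w. This is NOT InDesign's list.
var unicodeWord = /[\p{L}\p{Nd}_]/gu;

var matched = sample.match(unicodeWord);
console.log(matched.join("")); // "Größe42Äpfel_über"
```

Spaces and the euro sign (a currency symbol, not a letter) are skipped, while umlauts and "ß" are kept, which is the behaviour the explicit [0-9A-Za-zÀ-ü] class only approximates for one block of Latin characters.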
