That should work.
Make sure your findWhat string properly escapes the leading backslash:
app.findGrepPreferences.findWhat = "\\+(?=[^>]*?<)";
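To illustrate the escaping point, here is a minimal sketch in plain JavaScript (no InDesign objects involved; the variable names are hypothetical). In a standard JavaScript string literal, a single backslash before + is dropped, so the pattern text loses its escape unless the backslash is doubled.

```javascript
// Hypothetical variable names; plain JavaScript, no InDesign DOM needed.
var escaped = "\\+(?=[^>]*?<)";   // pattern text received: \+(?=[^>]*?<)
var unescaped = "\+(?=[^>]*?<)";  // pattern text received: +(?=[^>]*?<)

// The doubled backslash survives as one literal backslash...
var escapedStartsWithBackslash = (escaped.charAt(0) === "\\");
// ...while the single backslash disappears from the string.
var unescapedStartsWithPlus = (unescaped.charAt(0) === "+");
```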
Thanks for replying
I am not using this GREP expression to search within the InDesign document; I am using it to search within a string in my JSX. Sample code is below.
Running the code on Extendscript Toolkit engine or InDesign CC 2015 engine gives the same result.
var input = "<ab+c>he+ll+o<de+f>"
This returns null
An interesting thing is that if I use the input "<ab+c>he+ll+o+<de+f>"
it does find the + after o
Something seems to be missing.
And it finds the 4 "+"!
I am trying to find just the + signs inside the word hello, i.e. no + enclosed between < and > should be matched. On that basis, I see that your results are not correct.
In your case, I suspect the non-greedy operator ? used after the * quantifier MAY NOT WORK in a lookahead sequence. We must check this though. (There are many bugs related to quantifiers and assertions in ExtendScript regular expressions.)
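One way to check that suspicion is to run the same lookahead with and without the non-greedy operator and compare the match positions. The following is only a sketch: matchIndexes is a hypothetical helper, and it relies on exec and lastIndex, which may themselves behave differently in ExtendScript.

```javascript
// Hypothetical helper: collect the start index of every match of a global regex.
function matchIndexes(re, str) {
    var found = [], m;
    re.lastIndex = 0;
    while ((m = re.exec(str)) !== null) {
        found.push(m.index);
        if (m[0].length === 0) { re.lastIndex++; } // guard against empty matches
    }
    return found;
}

var input = "<ab+c>he+ll+o<de+f>";
var lazyHits = matchIndexes(/\+(?=[^>]*?<)/g, input);   // non-greedy *? in the lookahead
var greedyHits = matchIndexes(/\+(?=[^>]*<)/g, input);  // greedy * in the lookahead
```

In a standards-compliant JavaScript engine both arrays come out as [8, 11], i.e. the two + signs of he+ll+o. If the two results differ when run in ExtendScript, that would point to the non-greedy operator as the culprit.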
Due to the way ExtendScript RegExp unexpectedly works with the non-greedy operator, you may try this regex instead:
EDIT: But this won't work if you can have multiple '+' in a tag :-/
In case you have patterns of this form, AA+BB+CC<DD+EE+FF>GG+HH+II<JJ+KK>LL+MM etc., I think you need both a negative and a positive lookahead to only capture '+' outside of the tags.
Then try this:
Thank you Marc, this seems to be doing the trick. However, I have a few quick questions.
- What are the limitations (bugs) of regex in Extendscript, is it documented somewhere? If this is something you learned from experience, what in your opinion should be avoided?
- The regex you gave, seems to be a bit intimidating to me at first look. It takes me a fair bit of time to come up with a regex. I will try to understand it; if I fail, I will cry for help. I hope that won't be a great inconvenience.
Thanks a lot Marc and Obi
> What are the limitations of regex in Extendscript, is it documented somewhere?
It's hard to summarize, and I don't think an exhaustive report of ExtendScript RegExp issues has been published. Very basically, we know from experience that we can encounter backtracking problems with quantifiers. These may involve the lastIndex property, the greedy vs. non-greedy suffix operators (*?, +?, etc.), and/or lookahead assertions. Some facts (among many others) have been discussed here:
> The regex you gave, seems to be a bit intimidating to me at first look. (…)
Yes, it's not an easy one, and I just realized that I complicated it unnecessarily. A simplified form, /.\+(?![^<>]*>)/g, would probably work as well.
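As a rough sketch of how the negative-lookahead-only approach behaves, the snippet below replaces every matched + with a # marker so the hits are visible. Note that it omits the leading . from the form quoted above, on the assumption that only the plus sign itself should be matched (with the dot, each match would also include the character before the plus). The variable names are hypothetical, and the results shown are those of a standard JavaScript engine.

```javascript
// Assumption: match the plus sign alone, rejected if a ">" closes the tag ahead of it.
var outsidePlus = /\+(?![^<>]*>)/g;

var input = "<ab+c>he+ll+o<de+f>";
var longer = "AA+BB+CC<DD+EE+FF>GG+HH+II<JJ+KK>LL+MM";

// Replace each matched + with # to show which ones were hit.
var markedInput = input.replace(outsidePlus, "#");
var markedLonger = longer.replace(outsidePlus, "#");
// markedInput  is "<ab+c>he#ll#o<de+f>"
// markedLonger is "AA#BB#CC<DD+EE+FF>GG#HH#II<JJ+KK>LL#MM"
```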
Anyway, let's try to explain my reasoning with a picture:
First of all, the regex looks for a plus sign \+, then it needs to satisfy two lookahead assertions to validate that match. Keep in mind that an assertion is not supposed to consume further characters in the string (that is, the inner index of the RegExp engine doesn't move during these validation steps), but the way assertions are designed deeply impacts how the whole regex works.
• In red, the NEGATIVE LOOKAHEAD (NL) assertion, (?!pattern), means that, from the current point, the embedded pattern MUST FAIL. In other words, if that pattern is found, then the condition is not satisfied and the plus sign under consideration won't be captured as a match (regardless of the other assertion, which won't be tested at all). Otherwise, the condition is satisfied and the other assertion is tested.
• In green, the POSITIVE LOOKAHEAD (PL) assertion, (?=pattern), means that, from the same current point (since the index has not moved), the embedded pattern MUST SUCCEED. If that pattern is found, all is fine and the plus sign under consideration is definitely a match. Otherwise, it is ignored.
So those two assertions work as a logical AND: (NL pattern must be KO) AND (PL pattern must be OK.)
Why do we need a negative pattern? To prevent any plus sign nested in a <…> tag from being validated. This is done by testing the pattern [^<>]*>, which means non-markup sign (zero or more times) then a closing mark. This pattern can only succeed from within a tag as it needs to find a form "XXX>" where X is neither a "<" nor a ">".
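The negative pattern can also be tried in isolation. The sketch below applies [^<>]*> to the text that follows a given plus sign; the ^ anchor is an addition that imitates the fixed starting point of a lookahead, and insideTag is a hypothetical helper name.

```javascript
// Hypothetical helper: does the text right after the plus at plusIndex
// look like "XXX>" (no "<" or ">" before the closing mark)?
function insideTag(str, plusIndex) {
    var rest = str.substring(plusIndex + 1);
    return /^[^<>]*>/.test(rest); // ^ pins the test to the current point
}

var input = "<ab+c>he+ll+o<de+f>";
// The + signs sit at indexes 3, 8, 11 and 16.
var checks = [
    insideTag(input, 3),   // true:  "c>" follows, so this + is within <ab+c>
    insideTag(input, 8),   // false: "ll+o<" follows, a "<" comes before any ">"
    insideTag(input, 11),  // false: "o<" follows
    insideTag(input, 16)   // true:  "f>" follows, so this + is within <de+f>
];
```

A true result is exactly the case where the negative lookahead rejects the plus sign.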
Why should we need a positive pattern? We shouldn't, in fact! In your case, indeed, the previous condition is sufficient to assert that the plus sign under consideration is outside any tag. That's it :-) However, as I had no hint about extra conditions or constraints that your input string may undergo (I don't know what you are actually doing!), I found it safer to positively define the pattern in which the plus sign is expected. So I used the PL as something of a reinforcement.
What does the PL require? It looks for the pattern [^><+]*(?:[+<]|$), which is a complicated syntax for just saying "XXXY", where X is neither ">" nor "<" nor "+", and Y is either "+", or "<", or the end of the string ($). This explicitly describes any suffix string that must follow the plus sign under consideration. But, as already said, this positive lookahead is useless here since the negative assertion seems to cover all the issues.
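To see that the positive lookahead adds nothing on inputs like these, the two assertions can be assembled from the pieces described above and compared against the negative lookahead alone. This is a sketch under assumptions: the combined regex is put together here from the NL and PL patterns quoted in the explanation, the variable and function names are hypothetical, and the behaviour is that of a standard JavaScript engine.

```javascript
var NL = "(?![^<>]*>)";             // negative lookahead: no "XXX>" ahead
var PL = "(?=[^><+]*(?:[+<]|$))";   // positive lookahead: "XXXY" ahead

// Hypothetical helper: mark every matched + with # using the given lookaheads.
function markPlus(lookaheads, str) {
    var re = new RegExp("\\+" + lookaheads, "g");
    return str.replace(re, "#");
}

var samples = [
    "<ab+c>he+ll+o<de+f>",
    "<ab+c>he+ll+o+<de+f>",
    "AA+BB+CC<DD+EE+FF>GG+HH+II<JJ+KK>LL+MM"
];

var withBoth = [], withNegativeOnly = [];
for (var i = 0; i < samples.length; i++) {
    withBoth.push(markPlus(NL + PL, samples[i]));
    withNegativeOnly.push(markPlus(NL, samples[i]));
}
// Both arrays hold the same strings: only the + signs outside <...> become #.
```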
I also made a GIF to illustrate how the regex dynamically works:
Might be of some use.