15 Replies Latest reply on Sep 20, 2018 7:21 PM by Jean-Claude Tremblay

# Grep - find a character (or characters) within a word

I have two questions - both related to each other.

1. Supposing I want to find the r character in the selection of words below but ONLY the final r in each of the words. That is, the r must be followed by one of the patterns ai, as, a, ons, ez, ont.

travaillerai

travailleras

travaillera

travaillerons

travaillerez

travailleront

2. I want to find the xyz below as long as it's preceded by blah and followed by blahblah, as in this string:

blahxyzlahblah

Can anyone tell me the GREP expressions that would allow me to do both of these tasks?

• ###### 1. Re: Grep - find a character (or characters) within a word

1: r(?=(ai|as|a|ons|ez|ont)\b)

2: blah\Kxyz(?=lahblah)
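For anyone who wants to try these two expressions outside InDesign, here is a minimal sketch in Python. It assumes Python's built-in `re` module, which does not support `\K`, so the second expression is rewritten with a lookbehind, `(?<=blah)`. The variable names are made up for illustration.

```python
import re

# Expression 1: an r followed by one of the endings and then a word boundary.
final_r = re.compile(r"r(?=(ai|as|a|ons|ez|ont)\b)")

verbs = ["travaillerai", "travailleras", "travaillera",
         "travaillerons", "travaillerez", "travailleront"]

# Index of the matched r in each verb; the first r (in "tra...") is skipped
# because no listed ending followed by a word boundary comes after it.
positions = [final_r.search(v).start() for v in verbs]

# Expression 2: Python's re has no \K, so a lookbehind stands in for "blah\K".
xyz_between = re.compile(r"(?<=blah)xyz(?=lahblah)")
hit = xyz_between.search("blahxyzlahblah")

print(positions, hit.span())
```

In both cases only the r (or the xyz) itself is part of the match, which is what matters when the match is used to apply formatting.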

• ###### 2. Re: Grep - find a character (or characters) within a word

That’s great. Thank you.

However, I should have been clearer with regard to my second question. I would like to find the xyz when it appears anywhere within a string; in other words, it could have any number of random characters before it and any number of random characters after it, e.g.

joexyzbloggs

tarzanxyzofthejungle

What expression would allow for these circumstances?

• ###### 3. Re: Grep - find a character (or characters) within a word

[\u\l]\Kxyz(?=[\u\l])

That is, xyz preceded and followed by a letter; [\u\l] stands for an upper- or lower-case letter.
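A rough Python equivalent, for testing outside InDesign. This is a sketch under two assumptions: Python's `re` has neither `\K` nor the `\u` and `\l` shorthands, so a lookbehind replaces `\K`, and `[^\W\d_]` is used as a stand-in for "any letter" (it also covers accented letters).

```python
import re

# xyz with a letter immediately before it and a letter immediately after it.
LETTER = r"[^\W\d_]"  # word character that is neither a digit nor an underscore
inner_xyz = re.compile(rf"(?<={LETTER})xyz(?={LETTER})")

def find_xyz(word):
    """Return the start index of an inner xyz, or None if there is none."""
    m = inner_xyz.search(word)
    return m.start() if m else None

for word in ["joexyzbloggs", "tarzanxyzofthejungle", "xyzbloggs", "joexyz"]:
    print(word, find_xyz(word))
```

The last two words are there to show that an xyz at the very start or very end of a word is not matched.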

• ###### 4. Re: Grep - find a character (or characters) within a word

Same principle, just another way of writing the expression (question 1):

r(?=(a[is]?|ez|on[st])\b)
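A quick way to convince yourself that the compact form matches the same things as the spelled-out alternation is to run both over a word list and compare. A minimal sketch in Python's `re` follows; the extra words `travailler` and `travaillerait` are added here as negative examples and are not from the original list.

```python
import re

long_form = re.compile(r"r(?=(ai|as|a|ons|ez|ont)\b)")
short_form = re.compile(r"r(?=(a[is]?|ez|on[st])\b)")

words = ["travaillerai", "travailleras", "travaillera",
         "travaillerons", "travaillerez", "travailleront",
         "travailler", "travaillerait"]  # last two should not match

def spans(pattern, word):
    """All (start, end) positions where the pattern matches in word."""
    return [m.span() for m in pattern.finditer(word)]

# True when both expressions agree on every word in the list.
same = all(spans(long_form, w) == spans(short_form, w) for w in words)
print(same)
```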

• ###### 5. Re: Grep - find a character (or characters) within a word

That's excellent. Thank you!

• ###### 6. Re: Grep - find a character (or characters) within a word

Another valid approach for the second question, using similar variations on the theme and a capture group on XYZ:

(?:[[:alpha:]]+)(xyz|XYZ)(?:[[:alpha:]]+)

(?i)(?:[[:alpha:]]+)(xyz)(?:[[:alpha:]]+)

(?:\w+)(xyz|XYZ)(?:\w+)

(?i)(?:\w+)(xyz)(?:\w+)

(?i)(?:[a-z]+)(xyz)(?:[a-z]+)

P.S. One may wish to remove the ?: non-capturing group modifier in order to use capture groups on the text that is not xyz.

Is there any chance that xyz may appear more than once in the string, in any combination of position patterns?

tarzanxyzofthexyzjungle

tarzanxyzofthejunglexyz

tarzanxyzofthexyzjunglexyz

xyztarzanxyzofthexyzjunglexyz
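The repeated-xyz case is where the capture-group and lookaround approaches start to differ. The sketch below uses Python's `re` as a test bed, with two assumptions: POSIX classes such as `[[:alpha:]]` are not supported there, so `[a-z]` with `(?i)` is used, and a lookbehind replaces `\K`.

```python
import re

text = "tarzanxyzofthexyzjungle"

# Capture-group version: the greedy [a-z]+ in front backtracks from the end,
# so a single search captures the LAST xyz that still has letters after it.
grouped = re.compile(r"(?i)(?:[a-z]+)(xyz)(?:[a-z]+)")
captured_at = grouped.search(text).start(1)

# Lookaround version: only xyz itself is consumed, so every xyz that sits
# between two letters is reported.
lookaround = re.compile(r"(?i)(?<=[a-z])xyz(?=[a-z])")
found_at = [m.start() for m in lookaround.finditer(text)]

print(captured_at, found_at)
```

With the capture-group form the whole word is the match and xyz is only group 1, so it finds at most one xyz per word; the lookaround form matches each inner xyz on its own.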

• ###### 7. Re: Grep - find a character (or characters) within a word

Thank you very much, Stephen, for this.

No, for the purposes to which I'm putting these expressions, the pattern in question would never appear more than once in a particular word.

By the way, I need the expressions to highlight quirky/anomalous spellings/endings in a very large table of French verbs.

• ###### 8. Re: Grep - find a character (or characters) within a word

Can anyone point me to a good source where I could learn about GREP expressions? I happened upon an Adobe page but it was less than exhaustive.

• ###### 9. Re: Grep - find a character (or characters) within a word

Sure, in no particular order…

Online testers:

https://regexr.com/

Tutorials:

https://regexone.com/

https://regexone.com/lesson/introduction_abcs

And you can just Google, or look at YouTube or study professional courses on Lynda.com or LinkedIn Learning etc.

• ###### 10. Re: Grep - find a character (or characters) within a word

Pixxxel's and Stephen's alternatives (a[is]? instead of a|as|ai, and/or using ?:) make perfect sense but don't do much for readability. That's not criticism, simply an observation. Usually, optimising a grep expression involves adding things that make it less readable, and very often the way you end up writing a grep expression is a compromise between efficiency and readability.
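Outside InDesign, one way to ease that compromise is a commented, free-spacing pattern. The sketch below uses Python's `re.VERBOSE` flag purely as an illustration; it is not a claim about what InDesign's GREP field accepts.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
final_r = re.compile(r"""
    r                   # the r we want to highlight
    (?=                 # ...but only if it is followed by
        (?: a[is]?      #   a, ai or as
          | ez          #   ez
          | on[st]      #   ons or ont
        )
        \b              #   and then the end of the word
    )
""", re.VERBOSE)

print(final_r.search("travaillerons").start())
```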

InDesign's Grep feature uses a fairly standard regular-expression engine, but has some idiosyncrasies not covered by the items in Stephen's excellent list. There's a PDF on using Grep in InDesign, which was published by O'Reilly, but it's not available at the moment. It will be published in a new edition next month by CreativePro (https://indesignsecrets.com/).

• ###### 11. Re: Grep - find a character (or characters) within a word

Hi Peter, I’ll post some InDesign links later. As you say, the regex is pretty generic; however, there are different flavours in different software.

InDesign’s regular expression fields are very limiting and not fun to work in, so I spend very little time actually building regular expressions directly in InDesign. This has been a problem: I did not know of inline modifiers such as (?i) for case-insensitive matching, because I was used to adding the case-insensitive “flag” at the end of the expression in a regex tester interface, and that option does not exist in InDesign. So when I found out about (?i) it was a revelation. If I had spent more time using InDesign to create regular expressions, this would not have been “hidden”.
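The difference between an external flag and the inline (?i) modifier can be illustrated in Python's `re`, which offers both. This is only a sketch; the sample strings are made up.

```python
import re

# Inline modifier: the case-insensitivity travels with the expression itself,
# which is what you need when there is nowhere to set a separate flag.
inline = re.compile(r"(?i)xyz")

# External flag: the same setting supplied from outside the expression,
# as a regex tester's flag box would do.
flagged = re.compile(r"xyz", re.IGNORECASE)

samples = ["tarzanXYZjungle", "tarzanxyzjungle", "tarzanXyZjungle"]
inline_hits = [inline.search(s).group() for s in samples]
flagged_hits = [flagged.search(s).group() for s in samples]

print(inline_hits == flagged_hits)
```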

I much prefer to use another tool, such as the online testers previously linked. They have more work area, offer syntax highlighting, tool tips etc., and provide a much nicer environment to work in. For these reasons, I am less concerned with the “human readable” nature of a regular expression once it has been built in these tools and pasted into InDesign.

As you say, sometimes there are differences and what works in the tester may not always work in InDesign.

Some people like to do crosswords or sudoku; I like regex. What I love about regular expressions is that, all things being equal, there are so many “correct” answers to the same question. Yes, some are more verbose than others and some can be beautiful in their simplicity and conciseness. Some regular expressions are “loose” and may break in unforeseen edge-case uses; some are “tight” and bulletproof. You really have to know your own data and test with different variations.

• ###### 13. Re: Grep - find a character (or characters) within a word

> InDesign’s regular expression fields are very limiting and not fun to work in

Quite. That's why I did a script to make life with long Grep expressions easier:

You can add new lines and indents. And it highlights all matches in a text while you type/change a Grep expression. Comes in handy every now and then.

• ###### 14. Re: Grep - find a character (or characters) within a word

A couple of years ago I investigated InDesign's GREP in some detail, and a list of common dialects' working and not-working command codes is at the bottom of my InDesign GREP Help page. Basically InDesign uses the open source boost regex library so that is a good starting point if you want to get down to the itty bitty gritty details. Adobe added their own tilde codes, which may explain some oddities where they don't really work well (from memory).

My list is not updated since CS5 so the very useful code "\K" is missing. I see I don't even include it as 'not working' so it might have been an addition to a newer version of boost. (And a hint to that is that it's also not mentioned in Adobe's own list on Find/Change text in InDesign: Metacharacters for searching; they didn't know that either. So there just may be more hidden features waiting to be found.)

I really like doing crosswords and sudokus and GREP!

• ###### 15. Re: Grep - find a character (or characters) within a word

Hughanagle, you can also join this Facebook group on GREP: https://www.facebook.com/groups/TreasuresofGrep/