I have two related questions.
1. Suppose I want to find the r character in each of the words below, but ONLY the final r in each word. That is, the r must be followed by one of the endings ai, as, a, ons, ez, ont.
travaillerai
travailleras
travaillera
travaillerons
travaillerez
travailleront
2. I want to find the xyz below as long as it's preceded by blah and followed by blahblah, as in this string:
blahxyzlahblah
Can anyone tell me the GREP expressions that would allow me to do both of these tasks?
Thanks in advance.
1: r(?=(ai|as|a|ons|ez|ont)\b)
2: blah\Kxyz(?=lahblah)
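For anyone who wants to try these two expressions outside InDesign, here is a minimal Python sketch. It assumes Python's re module is a fair stand-in for this purpose: it handles the lookahead and \b in the same way, but it does not support \K, so the second pattern uses the lookbehind (?<=blah) in its place. The names FINAL_R, XYZ_BETWEEN and match_positions are made up for illustration.

```python
import re

# Question 1: an r immediately followed by one of the endings,
# with a word boundary (\b) right after the ending.
FINAL_R = re.compile(r"r(?=(ai|as|a|ons|ez|ont)\b)")

# Question 2: Python's re has no \K, so (?<=blah) stands in for blah\K.
XYZ_BETWEEN = re.compile(r"(?<=blah)xyz(?=lahblah)")

WORDS = [
    "travaillerai",
    "travailleras",
    "travaillera",
    "travaillerons",
    "travaillerez",
    "travailleront",
]


def match_positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


for word in WORDS:
    # Only the r before the ending is reported, not the r in "tra".
    print(word, match_positions(FINAL_R, word))

print(match_positions(XYZ_BETWEEN, "blahxyzlahblah"))
```

In every word of the list only the second r (index 9) is found, because the first r is followed by "av", which is not one of the endings.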
That’s great. Thank you.
However, I should have been clearer with regard to my second question. I would like to find the xyz when it appears anywhere within a string. In other words, it could have any number of random characters before it and any number of random characters after it, e.g.
joexyzbloggs
tarzanxyzofthejungle
What expression would allow for these circumstances?
Thank you in advance.
[\u\l]\Kxyz(?=[\u\l])
That is, xyz preceded and followed by a letter-- [\u\l] stands for upper- or lower-case letter.
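A rough way to check this idea outside InDesign, sketched in Python under a couple of assumptions: Python's re has neither \K nor the [\u\l] shorthand, so a lookbehind replaces \K and the idiom [^\W\d_] (word characters minus digits and underscore) approximates "any letter". The names LETTER, XYZ_INSIDE_WORD and find_xyz are invented for the example.

```python
import re

# Approximation of InDesign's [\u\l]: any word character that is
# neither a digit nor an underscore, i.e. roughly "any letter".
LETTER = r"[^\W\d_]"

# A lookbehind stands in for \K: a letter must precede xyz and a
# letter must follow it, but only xyz itself is part of the match.
XYZ_INSIDE_WORD = re.compile(rf"(?<={LETTER})xyz(?={LETTER})")


def find_xyz(text):
    """Return the start index of every xyz with a letter on both sides."""
    return [m.start() for m in XYZ_INSIDE_WORD.finditer(text)]


for sample in ["joexyzbloggs", "tarzanxyzofthejungle", "xyzbloggs", "joexyz"]:
    print(sample, find_xyz(sample))
```

The last two samples give no match, since xyz at the very start or very end of a word is missing a letter on one side.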
That's excellent. Thank you!
Another valid approach for the second question, using similar variations on the theme and a capture group on XYZ:
(?:[[:alpha:]]+)(xyz|XYZ)(?:[[:alpha:]]+)
(?i)(?:[[:alpha:]]+)(xyz)(?:[[:alpha:]]+)
(?:\w+)(xyz|XYZ)(?:\w+)
(?i)(?:\w+)(xyz)(?:\w+)
(?i)(?:[a-z]+)(xyz)(?:[a-z]+)
P.S. One may wish to remove the ?: non-capture group modifier to make use of capture groups on the text that is not xyz
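To illustrate how the capture-group variant differs from the lookaround one, here is a small Python sketch using the [a-z] form of the expression (Python's re does not support the POSIX class [[:alpha:]], so that variant is left out). The names XYZ_CAPTURED and captured_xyz are hypothetical. The point to notice is that the whole match now covers the surrounding letters too, and xyz has to be picked out as group 1.

```python
import re

# Same shape as the expression above: letters, then xyz in a capture
# group, then letters. (?i) makes the whole thing case-insensitive.
XYZ_CAPTURED = re.compile(r"(?i)(?:[a-z]+)(xyz)(?:[a-z]+)")


def captured_xyz(text):
    """Return (captured text, (start, end)) for group 1, or None."""
    m = XYZ_CAPTURED.search(text)
    if m is None:
        return None
    return m.group(1), m.span(1)


for sample in ["joexyzbloggs", "tarzanXYZofthejungle", "xyzbloggs"]:
    print(sample, captured_xyz(sample))
```

Because at least one letter is required on each side, "xyzbloggs" does not match at all.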
Is there any chance that xyz may appear more than once in the string, in any combination of position patterns?
tarzanxyzofthexyzjungle
tarzanxyzofthejunglexyz
tarzanxyzofthexyzjunglexyz
xyztarzanxyzofthexyzjunglexyz
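If repeats were possible, one way to see what happens is to run a lookaround pattern over those four strings. This is only a sketch in Python, assuming a lookbehind as a substitute for \K (which Python's re lacks) and plain [a-z] for "letter"; XYZ_INNER and inner_positions are names made up here.

```python
import re

# xyz with a lowercase letter directly before and directly after it.
# The lookarounds consume nothing, so every qualifying xyz is found.
XYZ_INNER = re.compile(r"(?<=[a-z])xyz(?=[a-z])")

SAMPLES = [
    "tarzanxyzofthexyzjungle",
    "tarzanxyzofthejunglexyz",
    "tarzanxyzofthexyzjunglexyz",
    "xyztarzanxyzofthexyzjunglexyz",
]


def inner_positions(text):
    """Return the start index of each xyz that sits inside the word."""
    return [m.start() for m in XYZ_INNER.finditer(text)]


for sample in SAMPLES:
    print(sample, inner_positions(sample))
```

An xyz at the very start or very end of a string is skipped, because it lacks a letter on one side; every xyz in the middle is reported separately.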
Thank you very much, Stephen, for this.
No, for the purposes to which I'm putting these expressions, the pattern in question would never appear more than once in a particular word.
By the way, I need the expressions to highlight quirky/anomalous spellings/endings in a very large table of French verbs.
Can anyone point me to a good source where I could learn about GREP expressions? I happened upon an Adobe page but it was less than exhaustive.
Sure, in no particular order…
Online testers:
Regex Tester - Javascript, PCRE, PHP
Regex Tester and Debugger Online - Javascript, PCRE, PHP
Tutorials:
https://regexone.com/lesson/introduction_abcs
https://qntm.org/files/re/re.html
Regex Tutorial—From Regex 101 to Advanced Regex
And you can just Google, or look at YouTube or study professional courses on Lynda.com or LinkedIn Learning etc.
Pixxxel's and Stephen's alternatives -- a[is]? instead of a|as|ai and/or using ?: -- make perfect sense but don't do much for readability. That's not criticism, simply an observation. Usually, optimising a grep expression involves adding things that make it less readable and very often the way you end up writing a grep expression is a compromise between efficiency and readability.
InDesign's Grep feature uses a fairly standard regular-expression engine, but has some idiosyncrasies not covered by the items in Stephen's excellent list. There's a PDF on using Grep in InDesign, which was published by O'Reilly, but it's not available at the moment. It will be published in a new edition next month by CreativePro (https://indesignsecrets.com/).
Hi Peter, I’ll post some InDesign links later. As you say, the regex is pretty generic; however, there are different flavours in different software.
InDesign’s regular expression fields are very limiting and not fun to work in. I spend very little time actually building regular expressions directly in InDesign. This has been a problem, in that I did not know of modifiers such as (?i) for case-insensitive matching: I was used to adding the case-insensitive “flag” at the end of the expression in a regex tester interface, which does not exist in InDesign. So when I found out about (?i), it was a revelation. If I had spent more time using InDesign to create regular expressions, this would not have been “hidden”.
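For readers who have only met one of the two styles, this short Python sketch shows the inline (?i) modifier next to a separate case-insensitive flag. Python is used purely as a stand-in here; the variable and function names are invented, and the sample strings are variations on the ones earlier in the thread.

```python
import re

# Inline modifier: the case-insensitivity travels inside the pattern,
# which is the only option in a single search field.
inline = re.compile(r"(?i)xyz")

# Separate flag: the same setting supplied outside the pattern, the way
# a tester interface offers it as a checkbox or trailing "i".
flagged = re.compile(r"xyz", re.IGNORECASE)

samples = ["joeXYZbloggs", "tarzanXyzofthejungle", "no match here"]


def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in pattern.finditer(text)]


for sample in samples:
    # Both ways of asking for case-insensitivity give the same spans.
    print(sample, spans(inline, sample), spans(flagged, sample))
```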
I much prefer to use another tool such as the online testers previously linked. They have more work area, offer syntax highlighting, tool tips etc., and provide a much nicer environment to work in. For these reasons, I am less concerned with the “human readable” nature of a regular expression once it has been built in these tools and pasted into InDesign.
As you say, sometimes there are differences and what works in the tester may not always work in InDesign.
Some people like to do crosswords or sudoku; I like regex. What I love about regular expressions is that, all things being equal, there are so many “correct” answers to the same question. Yes, some are more verbose than others and some can be beautiful in their simplicity and conciseness. Some regular expressions are “loose” and may break in unforeseen edge cases; some are “tight” and bulletproof. You really have to know your own data and test with different variations.
Thanks to your great suggestions and resources/links, I feel a regex addiction coming on
> InDesign’s regular expression fields are very limiting and not fun to work in
Quite. That's why I did a script to make life with long Grep expressions easier:
http://www.kahrel.plus.com/indesign/grep_editor.html
You can add new lines and indents. And it highlights all matches in a text while you type/change a Grep expression. Comes in handy every now and then.
A couple of years ago I investigated InDesign's GREP in some detail, and a list of common dialects' working and not-working command codes is at the bottom of my InDesign GREP Help page. Basically InDesign uses the open source Boost regex library, so that is a good starting point if you want to get down to the itty bitty gritty details. Adobe added their own tilde codes, which may explain some oddities where they don't really work well (from memory).
My list has not been updated since CS5, so the very useful code "\K" is missing. I see I don't even include it as 'not working', so it may have been added in a newer version of Boost. (A hint in that direction is that it's also not mentioned in Adobe's own list on Find/Change text in InDesign: Metacharacters for searching; they didn't know about it either. So there may well be more hidden features waiting to be found.)
I really like doing crosswords and sudokus and GREP!
Hughanagle, you can also join this Facebook group on GREP > https://www.facebook.com/groups/TreasuresofGrep/
Same principle, just another way of writing the expression (question 1):
r(?=(a[is]?|ez|on[st])\b)
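A quick way to convince yourself that the compact form finds the same things as the spelled-out alternation is to run both over the same words. The sketch below does this in Python, on the assumption that lookahead and \b behave the same there as in InDesign for these patterns; LONG_FORM, SHORT_FORM, positions and the extra test words are all made up for the example.

```python
import re

# The spelled-out alternation from earlier in the thread.
LONG_FORM = re.compile(r"r(?=(ai|as|a|ons|ez|ont)\b)")

# The compact version: a[is]? covers a, ai, as; on[st] covers ons, ont.
SHORT_FORM = re.compile(r"r(?=(a[is]?|ez|on[st])\b)")

WORDS = [
    "travaillerai",
    "travailleras",
    "travaillera",
    "travaillerons",
    "travaillerez",
    "travailleront",
    # A few extra words that should not match on the final ending.
    "travaillerait",
    "parler",
    "rare",
]


def positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


for word in WORDS:
    # Both columns should be identical for every word.
    print(word, positions(LONG_FORM, word), positions(SHORT_FORM, word))
```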