I need a script/GREP to find a range of text marked with tags. The range may contain zero, one, or more paragraph returns ("\r").
It would be something like this ...
<cite>((.*\r)+?.*?)</cite>
The script/grep should always select the shortest match:
<cite>xxxxxxxx.
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxx
xxxxxx
xxxxxxxxxx</cite>
<cite>xxxxxxxx.
xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxx</cite>
Thanks in advance ...!!!
Paragraph breaks ("\r") cause problems in GREP because they stop the current expression evaluation. To make GREP ignore paragraph returns (*), use the Single-Line Mode flag like this:
(?s)<cite>(.+?)</cite>
(*) The trick is that "(?s)" changes how the wildcard "." treats the code "\r". By default, "any character" (the single period) does not match "\r", so the expression halts there when using ".+". With this flag on, "." matches "\r" as well, so the GREP expression will digest it and continue on to the next line as if nothing happened.
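To see the effect of the flag outside InDesign, here is a minimal Python sketch. It is only an analogy and rests on one assumption worth stating loudly: in Python's `re` module the period skips "\n" by default (not "\r" as in InDesign's GREP), so the sample text uses "\n" as the paragraph break. The sample string and variable names are made up for illustration.

```python
import re

# Assumption: Python's "." excludes "\n" by default, whereas InDesign's
# GREP excludes "\r". The sample therefore uses "\n" as the break.
text = "<cite>first line\nsecond line</cite>"

# Without the flag, ".+?" cannot cross the line break, so nothing matches.
without_flag = re.findall(r"<cite>(.+?)</cite>", text)

# With (?s), "." also matches the line break and the whole range is captured.
with_flag = re.findall(r"(?s)<cite>(.+?)</cite>", text)

print(without_flag)  # []
print(with_flag)     # ['first line\nsecond line']
```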
As for the name and shortcut, I think the idea is that if you ignore the regular behavior of hard returns, your entire text is treated as one single long line without any breaks anywhere inside. That is also why you always need a Shortest Match code when you use wildcards in this mode. Without it, the GREP will match everything and anything all the way to the end of this single very long line, which essentially means it runs on toward the very end of your document, and usually that is not what you want.
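The difference between a greedy wildcard and a Shortest Match in single-line mode can be sketched in Python as well. Again this is an illustration under the assumption that "\n" stands in for InDesign's "\r"; the two-block sample text is invented.

```python
import re

# Two tagged ranges, each spanning more than one line
# ("\n" stands in for InDesign's "\r" here).
text = "<cite>one\ntwo</cite>\n<cite>three\nfour</cite>"

# Greedy ".+" runs on to the LAST closing tag, swallowing both ranges.
greedy = re.findall(r"(?s)<cite>(.+)</cite>", text)

# Shortest match ".+?" stops at the FIRST closing tag it meets.
shortest = re.findall(r"(?s)<cite>(.+?)</cite>", text)

print(greedy)    # ['one\ntwo</cite>\n<cite>three\nfour']
print(shortest)  # ['one\ntwo', 'three\nfour']
```

With the greedy form there is a single match covering both ranges; with the shortest-match form each tagged range is found separately, which is what the original question asked for.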
Thank you so much for identifying this js regex feature!
I am accustomed to perl regex operations and have bitterly missed the ability to do captures involving multiple lines. I'm sure the documentation was plain to read somewhere, but I never could find it.
(?s) is my new best friend ...