Marcos Suárez

Script/grep to find a range of text marked with tags

May 25, 2012 2:33 AM

I need a script/GREP to find a range of text marked with tags. The range can contain zero, one, or more paragraph returns ("\r").

It would be something like this ...

 

<cite>((.*\r)+?.*?)</cite>
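Here is a minimal sketch in Python showing why the pattern above is fragile. It is not InDesign GREP. The assumption is that "\n" stands in for InDesign's paragraph return "\r", because Python's "." already matches "\r" but not "\n". The "+?" quantifier requires at least one return inside the tags. A block with no return therefore cannot match on its own, and the match runs on into the next block. Changing "+?" to "*?" allows zero returns.

```python
import re

# Assumption: "\n" stands in for InDesign's paragraph return "\r".
# Python's "." matches "\r" but not "\n", so "\n" reproduces the
# default behavior of "." stopping at a paragraph break.
text = "<cite>no return</cite>\n<cite>one\nreturn</cite>"

# The attempt from the question: "+?" demands at least one return.
at_least_one = r"<cite>((.*\n)+?.*?)</cite>"
# Variant: "*?" also accepts zero returns.
zero_or_more = r"<cite>((.*\n)*?.*?)</cite>"

plus_matches = [m.group(0) for m in re.finditer(at_least_one, text)]
star_matches = [m.group(0) for m in re.finditer(zero_or_more, text)]

print(len(plus_matches))  # 1: a single match spans both blocks
print(star_matches)       # each block matched separately
```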

 

The script/grep should always select the shortest match:

 

<cite>xxxxxxxx.

xxxxxxxxxxxxxxxxxxx

xxxxxxxxxxxx

 

xxxxxx

xxxxxxxxxx</cite>

 

<cite>xxxxxxxx.

xxxxxxxxxxxxxxxxxxx

 

xxxxxxxxxxxx

 

xxxxxxxxxxxx</cite>

 

 

Thanks in advance ...!!!

 
Replies
  • May 25, 2012 4:57 AM   in reply to Marcos Suárez

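    Paragraph returns ("\r") cause problems in GREP because, by default, the wildcard "." does not match them, so an expression such as ".+" stops at the end of each paragraph. To let the wildcard continue past paragraph returns (*), turn on Single-line mode with the "(?s)" modifier, like this: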

     

    (?s)<cite>(.+?)</cite>

     

    (*) The trick is that "(?s)" changes what the wildcard "." matches. By default, "." matches any character except a paragraph return, so ".+" halts at the first "\r". With Single-line mode on, "." matches "\r" as well, so the GREP expression consumes the return and continues to the next paragraph as if nothing had happened.
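A minimal Python sketch of the same effect follows. Python's `re` accepts the same "(?s)" inline flag, where it is equivalent to `re.DOTALL`. It is assumed here that "\n" plays the role of InDesign's "\r", since Python's "." excludes only "\n" by default.

```python
import re

# Assumption: "\n" plays the role of InDesign's paragraph return "\r".
text = "<cite>first\nsecond</cite>"

# Default mode: "." cannot cross the line break, so nothing matches.
without_s = re.findall(r"<cite>(.+?)</cite>", text)
# Single-line (DOTALL) mode: "." also matches the line break.
with_s = re.findall(r"(?s)<cite>(.+?)</cite>", text)

print(without_s)  # []
print(with_s)     # ['first\nsecond']
```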

     
  • May 25, 2012 5:01 AM   in reply to [Jongware]

    As for the name and shortcut: I think the idea is that if you ignore the regular behavior of hard returns, your entire text is treated as one single long line with no breaks anywhere inside. Because that line goes on and on, you always need a shortest-match quantifier (such as "+?" or "*?") when you use wildcards. Otherwise the GREP matches everything up to the end of this single very long line, which essentially means all the way to the very last line of your document, and usually that is not what you want.
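To illustrate greedy versus shortest match in single-line mode, here is a minimal Python sketch. As before, the assumption is that "\n" stands in for InDesign's "\r". The greedy ".+" runs to the last "</cite>" in the text. The lazy ".+?" stops at the first one, so each block is matched separately.

```python
import re

# Assumption: "\n" stands in for InDesign's paragraph return "\r".
text = "<cite>one\npara</cite>\n\n<cite>two\npara</cite>"

# Greedy: ".+" swallows everything, then backs off to the LAST </cite>.
greedy = re.findall(r"(?s)<cite>(.+)</cite>", text)
# Shortest match: ".+?" stops at the FIRST </cite> after each <cite>.
lazy = re.findall(r"(?s)<cite>(.+?)</cite>", text)

print(len(greedy))  # 1
print(lazy)         # ['one\npara', 'two\npara']
```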

     
  • May 25, 2012 9:56 AM   in reply to [Jongware]

    Thank you so much for identifying this GREP feature! 

     

    I am accustomed to Perl regex operations and have bitterly missed the ability to do captures spanning multiple lines. I'm sure it was plainly documented somewhere, but I never could find it.

     

    (?s) is my new best friend ...

     
