This would work if you have only one em dash in your paragraph, because “grep is greedy”.
Do you want to include the em dash, or just the text after? And could there be more than one dash in the paragraph, and if so do you want to include only the text after the last one, or all text after the first one?
~_.+$ will find all text (including the dash) from the first dash to the end of the paragraph.
~_.+?$ will find the shortest match, which means it will only find from the last of multiple dashes to the end of the paragraph.
(?<=~_).+$ puts the dash into a lookbehind, so it won't be included in the found text. Add the ? as above to limit to the shortest match.
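For anyone who wants to try the dash-included and dash-excluded versions outside InDesign, here is a rough sketch using Python's re module. It assumes Python's engine treats these patterns the way InDesign's does; ~_ is InDesign-only notation, so the em dash is written as a literal character, and the sample paragraph is made up.

```python
import re

EM = "\u2014"  # stands in for InDesign's ~_ (em dash)
para = f"Intro {EM} middle part {EM} final part"  # made-up sample paragraph

# Dash included: match from a dash to the end of the paragraph.
with_dash = re.search(f"{EM}.+$", para).group()

# Dash excluded: the lookbehind requires a dash just before the match
# but keeps it out of the found text.
without_dash = re.search(f"(?<={EM}).+$", para).group()

print(repr(with_dash))
print(repr(without_dash))
```

The only difference between the two results should be the leading dash.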
Thanks for your questions and answers. Yes I want to include only the text after the last em dash in the paragraph, so if I'm understanding you correctly I would use (?<=~_).+$ in my search string. Is that correct?
Also do you know of any good resources online for better understanding GREP code? Even a list of all kinds of GREP code imaginable would be great. Thanks!
Most of what I know about grep I learned using Peter Kahrel's fabulous e-book available at O'Reilly. It costs about 10 bucks and can be used as a PDF or a number of other formats for various readers.
Sorry for coming back to an 'answered' question...
sperry1975 wrote:...I would use (?<=~_).+$ in my search string. Is that correct?
Not exactly. This is the expression I offered, and one of those Peter offered. Its main drawback is that it selects everything between the first em dash and the end of the paragraph. If I understood you correctly, you need to select everything between the last one and the end of the paragraph.
Using ? to limit to the shortest match doesn't work for me in this situation with the end of paragraph (frankly, dunno why), so I didn't offer it to you. Peter claims it should do the trick; can anyone confirm this? Here is how it works for me:
After some puzzling I was able to produce only this:
This query means: select everything except an em dash, between an em dash and the end of the paragraph.
I never actually tested the non-greedy version, and you're right, the ? didn't work for me, either. I think that must be an anomaly about the end-of-paragraph.
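It may not be an anomaly. In most regex engines the match starts at the leftmost position where it can succeed, and ? only makes the quantifier stop as early as possible from that starting point. Since $ forces the match to reach the end of the paragraph anyway, the greedy and non-greedy versions end up the same. A quick check in Python's re module (assuming InDesign's engine follows the same leftmost rule, which the results reported here suggest; the sample text is made up):

```python
import re

EM = "\u2014"  # stands in for InDesign's ~_ (em dash)
para = f"Intro {EM} middle part {EM} final part"  # made-up sample paragraph

greedy = re.search(f"(?<={EM}).+$", para)
lazy = re.search(f"(?<={EM}).+?$", para)

# Both start right after the FIRST dash, and both must run to the
# end of the paragraph before $ can match.
same_match = greedy.span() == lazy.span()
print(same_match, repr(lazy.group()))
```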
I just tested winterm's new expression, and it doesn't quite work, either. Unless the paragraph containing the last dash is also the last paragraph in the story it will select all the following paragraphs as well....
Here's what I've finally come up with that does seem to work: (?<=~_)[^~_]+?$
In this case the ? limiter DOES work. Very odd.
huh, I didn't ever think to test it with multiple paragraphs... grep usually operates inside a single paragraph, doesn't it? Really very odd. Btw, the same behaviour with (?<=~_)[^~_]+\r and (?<=~_)[^~_]+?\r.
maybe some grep guru could drop in...
"Not em-dash" is essentially "everything except em-dash" and so includes the hard return \r.
I must say I'm surprised Peter's original suggestion fails as well.
ok then, this seems to work?
Yes, that also seems to work. Evidently the negation also applies to the paragraph break.
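The multi-paragraph behaviour can be reproduced in Python's re module as a rough sketch. The assumptions: "\n" stands in for InDesign's paragraph return, re.MULTILINE is used so that $ matches at the end of each paragraph, and the sample story is made up. The third pattern, which adds the break to the negated class, is just one variation for illustration and not necessarily the expression posted above.

```python
import re

EM = "\u2014"  # stands in for InDesign's ~_ (em dash)
# "\n" stands in for the paragraph return; made-up three-paragraph story
story = f"One {EM} two {EM} three\nNext paragraph\nLast paragraph"

# Greedy "not em dash" also swallows paragraph breaks, so it runs on
# to the end of the story.
greedy = re.search(f"(?<={EM})[^{EM}]+$", story, re.MULTILINE).group()

# Lazy version stops at the first paragraph end it reaches.
lazy = re.search(f"(?<={EM})[^{EM}]+?$", story, re.MULTILINE).group()

# Excluding the break from the class keeps the match inside one paragraph.
no_break = re.search(f"(?<={EM})[^{EM}\n]+$", story, re.MULTILINE).group()

print(repr(greedy))
print(repr(lazy))
print(repr(no_break))
```

Here the ? does make a difference, because the negated class forces the match to start after the last dash and there are then several paragraph ends where $ could match.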
Thanks for the help everyone. This is great to have on hand since I will likely use it again and again in the future.