Grep: find text between quotations when the number of words are more than 20?

Report · Mar 08, 2017

(?<=“).*?(?=”)

is a GREP expression to find text inside quotation marks.

But how can the search be limited to quotations with a given number of words inside the curly quotes?

For example, to detect only quotations that have more than n words, say 50?

I ask because it is a publishing practice to style such long passages as «block quotations», set off from the text without quotation marks.

thanks

Report · Mar 08, 2017

To tell InDesign to match a certain number, or a minimum number, of times, use curly brackets. {50,} is how to specify 50 or more times.

Report · Mar 08, 2017

Erica,

I am now using this GREP, as it seems better than the one mentioned:

“.*?\”

And it works fine. Adding the curly brackets is the problem, because it doesn't work:

“.*?\”{20}

Another thing to resolve is that the premise asks for quotations of 20 or more words. Those with 1 to 19 words must not be considered.
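A likely reason the curly brackets fail here: a quantifier applies only to the single item directly before it, so {20} asks for twenty closing quotes in a row rather than twenty words. The sketch below illustrates this with Python's re module, used only as a stand-in for InDesign's GREP engine (an assumption; the two engines differ in other details), and with small counts to keep the sample short.

```python
import re

# A quantifier binds to the single item before it: here only the closing quote.
closing_repeated = re.compile(r"“.*?”{3}")
# Grouping "a run of non-spaces plus one space" lets {2} count words instead.
words_grouped = re.compile(r"“(?:\S+ ){2}\S+”")

three_words = "“one two three”"
print(closing_repeated.search(three_words))         # needs three closing quotes in a row
print(closing_repeated.search(three_words + "””"))  # now three closing quotes are present
print(words_grouped.fullmatch(three_words))         # two "word + space" groups, then a last word
```

Putting the repeated unit in a group is what makes the count apply to words.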

Thanks, I was upset as this thread seemed to be regarded as irrelevant.

And it is a superb tool when the author has put all the quotations in the text inside quotes.

PS. One possible method could be to extract all the quotations, place them in an ascending/descending list to filter easily by number of words and, by find/change, cross-reference the information to isolate one group and style it. But this seems a very dubious method, or one needing a script. I think a GREP should work.
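For comparison, the extract-and-filter idea from the PS can be sketched outside InDesign in a few lines of Python. This is only an illustration on plain text: the helper name long_quotes and the sample sentence are made up for the example, and it does not touch an InDesign document.

```python
import re

# Hypothetical helper: return curly-quoted passages with at least min_words words.
def long_quotes(text, min_words):
    # [^“”]* keeps each match inside one pair of curly quotes.
    quotes = re.findall(r"“([^“”]*)”", text)
    # str.split() with no argument splits on any run of whitespace.
    return [q for q in quotes if len(q.split()) >= min_words]

sample = "He said “no” and she said “not now, maybe some other day” before leaving."
print(long_quotes(sample, 5))
```

Here a "word" is simply any run of characters between spaces, so punctuation attached to a word is counted with it.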

*******

But GREP has expressions to find words, like \w+…

Carrying the idea over from digits (\d{20}, which is fine) to words gives nonsense: \w+{20}

Report · Mar 08, 2017

Maybe a combo of GREP and a script. Have you put this question to the GREP group on Facebook? (Treasures of GREP)

Report · Mar 08, 2017

Answer:

Peter Kahrel has this jewel in his book, and it is absolutely perfect! The rest is to insert the quotation marks, isolate the matches with colour, and finish.

([\S]+ ){n}

Report · Mar 08, 2017

Yes great! Will that count punctuation as part of the count, though? So a mixture of 50 anything that isn't a space? Or am I reading that wrong?
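A quick way to check that reading is to try the expression on a few short strings. The sketch below uses Python's re module as a stand-in for InDesign's GREP (an assumption), with n set to 3. It suggests that the expression counts runs of non-space characters, so punctuation attached to a word is counted with that word, and a free-standing mark counts as a group of its own.

```python
import re

# Kahrel-style counter: each group is a run of non-space characters plus one space.
three_groups = re.compile(r"(\S+ ){3}")

print(bool(three_groups.fullmatch("Wait, what? No! ")))  # punctuation rides along with its word
print(bool(three_groups.fullmatch("Hi - there ")))       # the lone dash is counted as a group
print(bool(three_groups.fullmatch("Hi there ")))         # only two groups are available
```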

Report · Mar 08, 2017

Erica, the idea is to isolate very long quotes as «block quotations», separated from the main text.

This PK GREP will catch those with the {n} specified. The rest may remain in the text.

The rule specifies more than n words, usually 50. Spaces, etc. are irrelevant.

Your post helped me look for new paths again.

Thanks.

Report · Mar 08, 2017

Okay so it's slightly flexible. Glad you found the solution! It takes a village!

Report · Mar 09, 2017

Hi camilo,

what is the resulting GREP you are using now?

Something like the one below?

Where a punctuation mark like .!? or )] can end a quote before the closing mark as well.

“([\S]+[ .!?)\]]){21}”

Wouldn't it find exclusively quotes with exactly 21 words?

And no quotes that exceed that number?

Ok. Then let's test this:

“([\S]+[ .!?)\]]){21,}”

where the added , after the number is suggesting to find quotes with 21 words and more.

That would work.

But not very good, if there is more than one quote in a paragraph.

Could be a long one combined with a short one.

Tried that with 3 as value :

“([\S]+[ .!?)\]]){3,}”

It will find too much.

First instance:

Second found instance:

How can we restrict the greediness of this expression?
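The over-matching can be reproduced on a made-up sentence with Python's re module, used here as a stand-in for InDesign's GREP (an assumption). Because \S also matches the quote marks themselves, a match can begin at a short quote and run on to the end of a later, longer one.

```python
import re

# The expression above with the minimum set to 3, as in the test.
pattern = re.compile(r"“(\S+[ .!?)\]]){3,}”")

text = "She said “No.” Then he replied “one two three four.” End"
match = pattern.search(text)
# The match starts at the short quote and ends after the long one.
print(match.group(0))
```

Only the second quotation has three or more words, yet the match begins at the first opening quote.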

Regards,
Uwe

Report · Mar 09, 2017

Uwe, you need to explicitly forbid the character " in the catch-all \S+ (which, by the way, does not need square brackets).

The easiest way to negate the set \S so you can add " to it is to use a character class [^\s"] (i.e., "not a spacing character nor a quote").

You can then build up the Find string as

"([^\s"]+\s+){5,}\S+?"

(where the 5 can be changed to a higher number) ... but much to my dismay, somewhere in between it picks up hard returns!

Oh well. Change it to this more verbose form to fix that:

"([^\s"]+\s+){5,}[^\s\r\n"]+"

where the only thing noteworthy is that the closing quote can be added straight into the negated character class at the end, instead of having to rely on the built-in behavior of "+?". (Some variants I tried with this were extremely slow. Negated character classes work much faster.)
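As a rough check of the negated-class approach, here is the second expression run against a made-up sentence in Python's re module (a stand-in for InDesign's GREP, so results in InDesign may differ). The minimum is lowered from 5 to 3 to keep the sample short.

```python
import re

# The second expression above, with {5,} lowered to {3,} for a short demo.
pattern = re.compile(r'"([^\s"]+\s+){3,}[^\s\r\n"]+"')

text = 'She said "No." Then he replied "one two three four." End'
found = [m.group(0) for m in pattern.finditer(text)]
# Only the quotation with enough words is reported.
print(found)
```

Since no word inside the match may contain a quote mark, the short quotation can no longer be swallowed into a longer match.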

Report · Mar 09, 2017

I just arrived to see this wealth of superb replies.

I will test all of them immediately, but I do not know how to assign correct answers to all the replies.

Thanks a lot, really.

Report · Mar 09, 2017

This one (based on Jongware's approach) could be more compact!

"(([^\s"]+)\h){20,}(?2)"

(^/)
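A note on the compact form: (?2) re-uses the second group's sub-pattern, and \h matches horizontal space only, so a match cannot run across a paragraph return. Python's re module supports neither construct, so the sketch below spells the pattern out by hand, with [ \t] as an approximation of \h (an assumption) and the minimum lowered to 3.

```python
import re

# Expanded form of the compact expression for Python's re, which lacks \h and (?2).
# [ \t] only approximates \h (assumption: plain spaces and tabs are enough here).
same_line = re.compile(r'"(?:[^\s"]+[ \t]){3,}[^\s"]+"')
# Same shape, but \s+ between words also accepts line breaks.
any_space = re.compile(r'"(?:[^\s"]+\s+){3,}[^\s"]+"')

broken = '"one two\nthree four five"'   # a quotation containing a line break
joined = '"one two three four five"'

print(same_line.search(broken))   # the line break stops the horizontal-space version
print(any_space.search(broken))   # the \s+ version runs across the line break
print(same_line.search(joined))   # on one line, the quotation is found
```

One side effect of requiring a single horizontal space is that a double space between two words also ends the count.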

Report · Mar 09, 2017

Hi Jongware,

hm. Just testing your suggestion with InDesign CS6 8.1.0.
Still, the expression is too greedy.

Maybe because I am doing something wrong with the closing and opening quotes in the expression copied from your reply. These are pasted as straight quotes, so I substituted them like that:

“([^\s“]+\s+){5,}[^\s\r\n”]+”

InDesign CS6 8.1.0 before finding anything.

Target is Story ("Textabschnitt") of selected text frame:

First instance found:

Second instance found:

Third instance found, going over boundaries of paragraph.
Note: It missed the last quote in the first paragraph.
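One possible explanation, offered as a guess rather than a diagnosis of the screenshots: after the substitution, the first class excludes only the opening quote and the last class excludes only the closing quote, so a word may still contain the other mark. The sketch below uses Python's re module (a stand-in for InDesign's GREP) on a made-up sentence, with the minimum lowered to 3. The stricter two_sided variant is hypothetical and does not appear in the thread.

```python
import re

# The curly-quote version above ({5,} lowered to {3,}): each class excludes one mark only.
one_sided = re.compile(r"“([^\s“]+\s+){3,}[^\s\r\n”]+”")
# Hypothetical stricter variant: both classes exclude both quote marks.
two_sided = re.compile(r"“([^\s“”]+\s+){3,}[^\s“”]+”")

text = "“Yes” was all she said before “No”"
print(one_sided.search(text))   # two one-word quotes are fused into one long match
print(two_sided.search(text))   # neither short quote qualifies
```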

BTW: Obi Wan's GREP expression is working as expected:

“([()[\]]?\<[^“]+\>[,;:!?.…()[\]\h]*){3}”

// EDIT: Changed the value for minimum number of instances in Obi-wan's expression to 3.

// That would match my examples with the screenshots.

Regards,
Uwe

Report · Mar 09, 2017

Uwe,

Have you tried the last code I gave, based on Jong's approach?

(^/)

Report · Mar 09, 2017

Not yet…

Report · Mar 09, 2017

Tested the greps:

Laubender/9: the 3 options didn't work. They found 0, 9 and 12 matches.

Jongware/13 (second GREP) and Laubender/16 both tagged 285 instances.

(they are the same...?)

Obi/15 crashed. It works in small batches, apparently fine. Not tested on the whole document. Very greedy.

*****

I checked how many straight opening quotes the file actually has: 321 (and 326 closing... but that is easy to fix. Obi resolved it in the past, here!)

Changing {5,} to {1} in Jongware/13 and Laubender gives 311 matches,

and it surprised me that changing the same {5,} to {0,} gave 316.

Finally, both Laubender and Jongware were very fast, and no trace of greediness was perceived.

Conclusion: the supergrep for this task is

“([^\s“]+\s+){n,}[^\s\r\n”]+”

Report · Mar 09, 2017

Did you try Laubender 12?

If you copy/paste the expressions, you have to check the curly quotes in the expression carefully.

It is best to type the expression yourself, without using straight quotes.

I only tested with English text. Not e.g. German text where the quotes for opening and closing are very different.

The only GREP expression working straight for me is:
Laubender 12

“([()[\]]?\<[^“]+\>[,;:!?.…()[\]\h]*){21}”

Obi-wan 15 is also working, but first I had to change the quotes to that:

“(([^ “]+)\h){21,}(?2)”

Did not test with very long quotes. Just the examples you are seeing in my screenshots.

Regards,
Uwe

Report · Mar 09, 2017

Hi, yes, Laubender/12 is perfect, too. Changing {21} to {1} gave 306*

Also your last GREP was untadelig (impeccable), as mentioned before.

Obi's is fine but seems very greedy, unlike your last one, which is Instamatic [sic].

Thanks and thanks. Special thanks to that legend alias Jongware.

*That is perfect, as the original document, before proofreading, has 5 additional closing quotes! 311 is the total if opening and closing quotes match.

Report · Mar 09, 2017

LOL!

"(([^\s"]+)\h){n}(?2)"

… is perfect for me!

"(([^\s"]+)\h){0}(?2)" finds 1 non-space group between quotes!

"(([^\s"]+)\h){0,3}(?2)" finds 1, 2, 3 and 4 non-space groups between quotes!

"(([^\s"]+)\h){3,}(?2)" finds 4, 5, 6, … non-space groups between quotes!

"(([^\s"]+)\h){0,}(?2)" finds everything between quotes!

(^/)
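The counts above follow from the shape of the expression: the quantifier counts "word plus space" groups, and (?2) then adds one final word, so {n} corresponds to n + 1 words. The sketch below checks this with Python's re module, a stand-in for InDesign's GREP. Python lacks \h and (?2), so the pattern is written out in full with [ \t] approximating \h. The helper name quotes_with and the sample text are made up for the example.

```python
import re

# Hypothetical helper: build the expanded pattern for a given quantifier string.
# The repeated class stands in for (?2); [ \t] approximates \h.
def quotes_with(quantifier, text):
    pattern = r'"(?:[^\s"]+[ \t])' + "{" + quantifier + "}" + r'[^\s"]+"'
    return re.findall(pattern, text)

text = 'x "one" y "one two" z "a b c d" w "a b c d e"'
print(quotes_with("0", text))    # one-word quotes only
print(quotes_with("0,3", text))  # quotes of 1 to 4 words
print(quotes_with("3,", text))   # quotes of 4 or more words
print(quotes_with("0,", text))   # every quote
```

So a rule of "more than 20 words" would translate to {20,} in this form, since the last word is matched outside the counted group.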

Report · Mar 09, 2017

Obi, yes , you are right. Here the same!

"(([^\s"]+)\h){0,}(?2)"

It is very good. It found 304, which is about the number we have here.

And it works fast and well.

Thanks.

Report · Mar 09, 2017

https://forums.adobe.com/people/camilo+umana wrote
… Conclusion: the supergrep for this task is
“([^\s“]+\s+){n,}[^\s\r\n”]+”

Unfortunately not for me using InDesign CS6 8.1.0. It is too greedy.

What is your version of InDesign? It could be that this expression behaves differently in different versions of InDesign.

Regards,
Uwe

Report · Mar 09, 2017

Hi, Mac, ID 2017, 0 release, Yosemite.

Fortunately the copy-paste worked for the good ones.

Report · Mar 09, 2017

More than 20! As Grep Style:

"([()[\]]?\<[^"]+\>[,;:!?.…()[\]\h]*){21}"

(^/)

Report · Mar 09, 2017

Hi Obi-wan,

could you explain what kind of quote ( closing or opening ) goes where in your expression?

Maybe the forum software messed up the quotes?

Thanks,
Uwe

Report · Mar 09, 2017

Wait. Found it myself:

“([()[\]]?\<[^“]+\>[,;:!?.…()[\]\h]*){21}”

Thanks,

works great!

Uwe

// NOTE: Tested with InDesign CS6 8.1.0 on Mac OSX

Adobe Community
