GREP: find text between quotation marks when the number of words is more than 20?

Participant, Mar 08, 2017

(?<=“).*?(?=”)

is a GREP expression that finds the text inside curly quotation marks.

But how can the search be limited to quotations that contain a given number of words?

For example, how can it detect only quotations that have more than n words, say 50?

The question arises because it is common publishing practice to set such long quotations as «block quotations», broken off from the main text and without quotation marks.

Thanks.

Views

9.7K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines

1 Correct answer: Community Expert, Mar 09, 2017 (the reply beginning "Did you try Laubender 12", given in full further down the thread)

Enthusiast, Mar 08, 2017

To tell InDesign to match something a certain number of times, or a minimum number of times, use curly brackets. {50} means exactly 50 times, and {50,} means 50 or more times.
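
A quick way to see the difference between the two forms outside InDesign is the short sketch below. It uses Python's re module, which is an assumption made only for illustration: InDesign's GREP is a different regex flavour, although these two quantifiers behave the same way in both. The count is lowered from 50 to 3 to keep the sample short.

```python
import re

text = "aaaaa"

# {3} matches exactly three repetitions of the preceding item.
exact = re.match(r"a{3}", text).group()

# {3,} matches three or more repetitions, taking as many as it can.
at_least = re.match(r"a{3,}", text).group()

print(exact)     # aaa
print(at_least)  # aaaaa
```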

Participant, Mar 08, 2017

Erica,

I am now using this GREP, as it seems better than the one mentioned above:

“.*?\”

It works fine. Adding the curly brackets is the problem, because it doesn't work:

“.*?\”{20}

The other thing to resolve is that the premise asks for quotations of 20 or more words. Those with 1 to 19 words must be ignored.

Thanks. I was worried that this thread would be seen as irrelevant.

It would be a superb tool for documents where the author has left all the quotations in quotation marks inside the running text.

P.S. One possible method would be to extract all the quotations, put them in an ascending or descending list so they can be filtered easily by number of words, and then use Find/Change to cross-check that information, isolate one group, and style it. But this seems a very dubious method, or one that needs a script. I think a GREP should work.

*******

But GREP has expressions to find words, like \w+…

Carrying the idea over from digits, where \d{20} is fine, to words does not work: \w+{20}
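
One likely reason the curly brackets appear not to work here is that a quantifier applies only to the single item directly before it. The sketch below illustrates that point in Python's re module (an assumption for illustration; InDesign's GREP flavour differs in other details). The sample text, the names OPEN and CLOSE, and the count of 3 are all made up for the demonstration.

```python
import re

OPEN, CLOSE = "\u201c", "\u201d"  # curly opening / closing quotes
text = f"{OPEN}one two three four{CLOSE}"

# The quantifier binds only to the item right before it, here the closing
# quote, so this asks for three closing quotes in a row and finds nothing.
misplaced = re.search(f"{OPEN}.*?{CLOSE}{{3}}", text)

# Wrapping "a word plus the whitespace after it" in a group makes the
# quantifier count words instead.
grouped = re.search(r"(\w+\s+){3}", text)

print(misplaced)        # None
print(grouped.group())  # one two three (with a trailing space)
```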

Enthusiast, Mar 08, 2017

Maybe a combination of GREP and a script. Have you posted this question to the GREP group on Facebook? (Treasures of GREP)

Participant, Mar 08, 2017

Answer:

Peter Kahrel has this jewel in his book, and it is absolutely perfect! The rest is to insert the quotation marks, isolate the matches with colour, and finish.

([\S]+ ){n}
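
As a rough illustration of what this expression matches, here is a minimal sketch in Python's re module (an assumption; InDesign is not involved). The value n = 3 and the sample text are invented for the example, and the redundant square brackets around \S are dropped.

```python
import re

n = 3
# n runs of non-space characters, each followed by one space
pattern = re.compile(r"(\S+ ){%d}" % n)

text = "alpha beta gamma delta epsilon"
first = pattern.search(text).group()
print(repr(first))  # 'alpha beta gamma '
```

Note that the match includes the trailing space after the last counted word.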

Enthusiast, Mar 08, 2017

Yes great! Will that count punctuation as part of the count, though? So a mixture of 50 anything that isn't a space? Or am I reading that wrong?
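
To make the question concrete: \S+ counts runs of non-space characters, not dictionary words. The sketch below (Python's re module, used here only as an assumed stand-in for InDesign's GREP) shows how punctuation affects the count. The helper name count_tokens and the sample strings are hypothetical.

```python
import re

def count_tokens(text):
    # Each run of non-space characters is one unit for \S+,
    # whether it is a word, a number, or bare punctuation.
    return len(re.findall(r"\S+", text))

print(count_tokens("Yes, great!"))    # 2: punctuation attached to a word adds nothing
print(count_tokens("Wait - what ?"))  # 4: the free-standing "-" and "?" each count as one
```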

Participant, Mar 08, 2017

Erica, the idea is to isolate very long quotes into «block quotations», separated from the main text.

This Peter Kahrel GREP will catch those with the given {n}. The rest can remain in the text.

The rule refers to quotations with more than a given number of words, usually 50. Spaces etc. are irrelevant.

Your post helped me to look for new paths again.

Thanks.

Enthusiast, Mar 08, 2017

Okay so it's slightly flexible. Glad you found the solution! It takes a village!

Community Expert, Mar 09, 2017

Hi camilo,

which GREP expression are you using now?

Something like the one below, where a punctuation mark such as .!? or )] can also end a quote before the closing mark?

“([\S]+[ .!?)\]]){21}”

Wouldn't that find only quotes with exactly 21 words, and none that exceed that number?

OK. Then let's test this:

“([\S]+[ .!?)\]]){21,}”

where the , added after the number asks for quotes with 21 words or more.

That would work, but not very well if there is more than one quote in a paragraph, for example a long one combined with a short one.

I tried that with 3 as the value:

“([\S]+[ .!?)\]]){3,}

[Screenshot: 1-Before-GREP-Find-Quotes-with-3-words-and-more.png]

It finds too much.

First instance found:

[Screenshot: 2-FirstInstanceFound-GREP-Find-Quotes-with-3-words-and-more.png]

Second instance found:

[Screenshot: 3-FirstInstanceFound-GREP-Find-Quotes-with-3-words-and-more.png]

How can we restrict the greediness of this expression?

Regards,
Uwe
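
The over-matching described in this post can be reproduced outside InDesign. The sketch below uses Python's re module (an assumption; the screenshots above come from InDesign itself) and an invented sample sentence. The expression is the 3-word test from the post with a closing quote added, as in the two expressions before it.

```python
import re

text = "He said “No.” Then he said “Yes it is.”"

# \S+ accepts any non-space character, including the closing quote, so a
# match that starts at the first opening quote can run on into the next
# quotation.
pattern = re.compile(r"“(\S+[ .!?)\]]){3,}”")

match = pattern.search(text)
print(match.group())  # “No.” Then he said “Yes it is.”
```

The one-word quotation should not qualify at all, yet it becomes the start of a match that swallows the text between the two quotations.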

Community Expert, Mar 09, 2017

Uwe, you need to explicitly forbid the character " in the catch-all \S+ (which, by the way, does not need square brackets).

The easiest way to negate the set \S so you can add " to it is to use a character class [^\s"] (i.e., "not a spacing character nor a quote").

You can then build up the Find string as

"([^\s"]+\s+){5,}\S+?"

(where the 5 can be changed to a higher number) ... but much to my dismay, somewhere in between it picks up hard returns!

Oh well. Change it to this more verbose form to fix that:

"([^\s"]+\s+){5,}[^\s\r\n"]+"

where the only thing noteworthy is that the closing quote can be added straight into the negated character class at the end, instead of having to rely on the built-in behavior of "+?". (Some variants I tried with this were extremely slow. Negated character classes work much faster.)
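
A minimal sketch of the negated-character-class idea, again in Python's re module as an assumed stand-in for InDesign's GREP. It is an adaptation rather than the exact expression above: it uses curly quotes and excludes both the opening and the closing mark from the class, and the sample sentence is invented.

```python
import re

text = "He said “No.” Then he said “Yes it is.”"

# Each word is a run of characters that are neither whitespace nor a quote,
# so no single match can step over a quotation mark.
# {2,} plus the final word means "at least three words".
pattern = re.compile(r"“([^\s“”]+\s+){2,}[^\s“”]+”")

found = [m.group() for m in pattern.finditer(text)]
print(found)  # ['“Yes it is.”']
```

In Python, \s also matches line breaks, which mirrors the hard-return problem described above; replacing \s+ with [ \t]+ would keep every match on one line.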

Participant, Mar 09, 2017

I have just arrived to see this wealth of superb replies.

I will test all of them immediately, but I do not know how to mark all the replies as correct answers.

Thanks a lot, really.

LEGEND, Mar 09, 2017

This one (based on Jongware's approach) could be more compact!

"(([^\s"]+)\h){20,}(?2)"

(^/) 
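
In this expression, (?2) is a subroutine call: it reuses the sub-pattern of group 2, [^\s"]+, for the last word. Python's built-in re module has neither (?2) nor \h, so the sketch below writes the sub-pattern out twice and uses [ \t] as a rough stand-in for \h. Those substitutions, the helper name quote_pattern, and the sample text are assumptions made for illustration.

```python
import re

WORD = r'[^\s"]+'   # what group 2 captures in the forum expression
HSPACE = r"[ \t]"   # rough stand-in for \h (horizontal whitespace)

def quote_pattern(min_repeats):
    # Equivalent of "(([^\s"]+)\h){min_repeats,}(?2)" with the
    # subroutine call (?2) written out as a second copy of WORD.
    return re.compile(rf'"(?:{WORD}{HSPACE}){{{min_repeats},}}{WORD}"')

text = 'She said "no" and later "yes it really is" to him.'
found = [m.group() for m in quote_pattern(3).finditer(text)]
print(found)  # ['"yes it really is"']
```

With min_repeats set to 3 the pattern needs at least four words, because the final word is matched outside the repeated group.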

Community Expert, Mar 09, 2017

Hi Jongware,

Hm. I just tested your suggestion with InDesign CS6 8.1.0.
Still, the expression is too greedy.

Maybe I am doing something wrong with the closing and opening quotes in the expression copied from your reply. They were pasted as straight quotes, so I substituted them like this:

“([^\s“]+\s+){5,}[^\s\r\n”]+”

InDesign CS6 8.1.0 before finding anything.

The target is the Story ("Textabschnitt") of the selected text frame:

[Screenshot: 4-Before-GREP-Find-Quotes-with-3-words-and-more.png]

First instance found:

[Screenshot: 5-FirstInstanceFound-GREP-Find-Quotes-with-3-words-and-more.png]

Second instance found:

[Screenshot: 6-SecondInstanceFound-GREP-Find-Quotes-with-3-words-and-more.png]

Third instance found, crossing the paragraph boundary.
Note: it missed the last quote in the first paragraph.

[Screenshot: 7-ThirdInstanceFound-GREP-Find-Quotes-with-3-words-and-more.png]

BTW: Obi Wan's GREP expression is working as expected:

“([()[\]]?\<[^“]+\>[,;:!?.…()[\]\h]*){3}”

// EDIT: Changed the value for minimum number of instances in Obi-wan's expression to 3.

// That would match my examples with the screenshots.

Regards,
Uwe

LEGEND, Mar 09, 2017

Uwe,

Have you tried the last code I gave, based on Jong's approach?

(^/)

Community Expert, Mar 09, 2017

Not yet…

Participant, Mar 09, 2017

I tested the GREPs:

Laubender/9: the 3 options didn't work. They found 0, 9 and 12 matches.

Jongware/13 (second GREP) and Laubender/16 both tagged 285 occurrences.

(Are they the same?)

Obi/15 crashed. It works in small batches, apparently fine. Not tested on the whole document. Very greedy.

*****

I checked how many opening quotation marks the file actually has: 321 (and 326 closing, but that is easy to fix; Obi resolved it in the past, here!)

Changing {5,} to {1} in Jongware/13 and Laubender gives 311 matches,

and it surprised me that changing the same {5,} to {0,} gave 316.

Finally, both Laubender and Jongware were very fast, and no trace of greediness was noticeable.

Conclusion: the supergrep for this task is

“([^\s“]+\s+){n,}[^\s\r\n”]+”
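
Since the match totals above are compared against the number of quotation marks in the file, it can help to count opening and closing marks separately before trusting any total. A minimal sketch in Python, assuming the story text has been exported to a plain string (the sample string here is invented):

```python
text = "“One” and “two”” and “three”"

opening = text.count("“")  # U+201C, left double quotation mark
closing = text.count("”")  # U+201D, right double quotation mark

print(opening, closing)    # 3 4
print(closing - opening)   # 1 stray closing quote to track down
```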

Community Expert, Mar 09, 2017

Did you try Laubender 12?

If you copy and paste the expressions, you have to look carefully at the curly quotes in the expression.

It is best to type the expression yourself, without using straight quotes.

I only tested with English text, not with e.g. German text, where the opening and closing quotes are very different.

The only GREP expression that works straight away for me is:
Laubender 12

“([()[\]]?\<[^“]+\>[,;:!?.…()[\]\h]*){21}”

Obi-wan 15 is also working, but first I had to change the quotes like this:

“(([^ “]+)\h){21,}(?2)”

I did not test with very long quotes, just the examples you see in my screenshots.

Regards,
Uwe
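
Regarding the copy/paste caveat in this reply: one way to check which quote characters actually ended up in a pasted expression is to list their Unicode code points. The sketch below does this in Python; the helper name describe_quotes and the deliberately mixed-up sample expression are hypothetical.

```python
import unicodedata

def describe_quotes(expression):
    # List every quote-like character with its code point and Unicode name.
    quote_chars = '"\u201c\u201d\u201e\u00ab\u00bb'
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in expression
        if ch in quote_chars
    ]

# A pasted expression in which the first quote was silently straightened.
pasted = '"(([^ \u201c]+)\\h){21,}(?2)\u201d'
for item in describe_quotes(pasted):
    print(item)
```

A straight quote shows up as U+0022, while the curly opening and closing marks are U+201C and U+201D.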

Participant, Mar 09, 2017

Hi, yes, Laubender/12 is perfect too. Changing {21} to {1} gave 306*

Your last GREP was also untadelig (impeccable), as mentioned before.

Obi's is fine but seems very greedy, whereas your last one is Instamatic [sic].

Thanks and thanks. Especially to that legend alias Jongware.

*That is perfect, as the original document before proofreading has 5 additional closing quotes! 311 is the total if opening and closing quotes match.

LEGEND, Mar 09, 2017

LOL!

"(([^\s"]+)\h){n}(?2)"

… is perfect for me!

"(([^\s"]+)\h){0}(?2)" finds 1 non-space group between quotes!

"(([^\s"]+)\h){0,3}(?2)" finds 1, 2, 3 or 4 non-space groups between quotes!

"(([^\s"]+)\h){3,}(?2)" finds 4, 5, 6, … non-space groups between quotes!

"(([^\s"]+)\h){0,}(?2)" finds everything between quotes!

(^/)
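
The four variants listed in this post can be checked with a small sketch. It uses Python's re module, which has no (?2) and no \h, so the word sub-pattern is repeated by hand and [ \t] stands in for \h. Those substitutions, the helper name find_quotes, and the sample text are assumptions for illustration.

```python
import re

WORD = r'[^\s"]+'

def find_quotes(text, quantifier):
    # quantifier is the text inside the braces, e.g. "0", "0,3" or "3,".
    # (?2) from the forum expression is written out as a second WORD.
    pattern = re.compile(rf'"(?:{WORD}[ \t]){{{quantifier}}}{WORD}"')
    return [m.group() for m in pattern.finditer(text)]

text = 'A "one" b "one two" c "one two three four" d "one two three four five six" e'

print(find_quotes(text, "0"))    # only the 1-word quote
print(find_quotes(text, "0,3"))  # quotes with 1 to 4 words
print(find_quotes(text, "3,"))   # quotes with 4 or more words
print(find_quotes(text, "0,"))   # every quote
```

Because the last word sits outside the repeated group, the word count is always one more than the number in the braces.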

Participant, Mar 09, 2017

Obi, yes, you are right. Same here!

"(([^\s"]+)\h){0,}(?2)"

It is very good. It found 304, which is about the number we have here.

And it works fast and is well tuned.

Thanks.

Community Expert, Mar 09, 2017

https://forums.adobe.com/people/camilo+umana  wrote

Conclusion: the supergrep for this task is

“([^\s“]+\s+){n,}[^\s\r\n”]+”

Unfortunately not for me using InDesign CS6 8.1.0. There it is too greedy.

What is your version of InDesign? It could be that this expression behaves differently in different versions of InDesign.

Regards,
Uwe

Participant, Mar 09, 2017

Hi, Mac, ID 2017, 0 release, Yosemite.

Fortunately the copy-paste worked for the good ones.

LEGEND, Mar 09, 2017

More than 20! As Grep Style:

"([()[\]]?\<[^"]+\>[,;:!?.…()[\]\h]*){21}"

(^/)

Community Expert, Mar 09, 2017

Hi Obi-wan,

could you explain which kind of quote (closing or opening) goes where in your expression?

Maybe the forum software messed up the quotes?

Thanks,
Uwe

Community Expert, Mar 09, 2017

Wait. Found it myself:

“([()[\]]?\<[^“]+\>[,;:!?.…()[\]\h]*){21}”

Thanks,

works great!

Uwe

// NOTE: Tested with InDesign CS6 8.1.0 on Mac OSX
