11 Replies Latest reply on Sep 2, 2012 3:11 PM by Trevorׅ

    Grep to find first occurrence of a particular word in a story

    Trevorׅ Adobe Community Professional

      Hello Grepers

       

      I haven't had luck working out a Grep to find first / last occurrence of a particular word in a story.

       

      To find the first occurrence of the word hello in a paragraph I would use this grep.

       

      (Hello(?!=Hello*$))

       

      For the last this works.

       

      ((?=Hello(?=.*Hello)))|(Hello(?!=Hello*$))

       

      I have tried single line and multiline prefixes but they don't do the trick.

       

      Waiting in suspense,

       

      Trevor

        • 1. Re: Grep to find first occurrence of a particular word in a story
          Peter Spier Most Valuable Participant (Moderator)

          Look ahead and look behind only work for the text that is immediately adjacent to the term you are trying to find, and the ^ and $ locators only work if the search term is the very first or very last bit of text in the paragraph.

           

          I may be wrong, but I don't think there is a way to find only the first or only the last occurrence of a word in a paragraph in the generalized case, i.e. I don't  think it's possible in a sentence like "I said 'hello' to the man on the bike, and he smiled and said, 'hello to you, too.'"

          • 2. Re: Grep to find first occurrence of a particular word in a story
            Trevorׅ Adobe Community Professional

            Hi Peter

             

            I think you misunderstood what I wrote.

            The Greps I wrote for finding the first and last occurrence in the paragraph do work.

            You can try them.

            This is because they are either/or or negative, and therefore don't need to be adjacent or have a wildcard .*

             

            My question was how to get the first or last occurrence in the whole story.

            Normally, adding (?s) to the beginning of the grep makes it treat the whole story as one paragraph, with ^ and $ as beginning and end of story markers.

             

            In this case the (?s) doesn't help.


            • 3. Re: Grep to find first occurrence of a particular word in a story
              Peter Spier Most Valuable Participant (Moderator)

              Your expression for the first occurrence seems to work, though I don't know why. My understanding is that you cannot use variable strings in a look ahead/behind. Your second expression does not work here -- it puts the cursor in front of the first hello, but highlights nothing.

               

              Don't know if it will help, but the end of story marker is \z

               

              I eagerly await some input from Jongware....

              • 5. Re: Grep to find first occurrence of a particular word in a story
                Trevorׅ Adobe Community Professional

                Jongware

                 

                I never would have believed!

                (nice png)

                 

                Well, we'll have to wait for Marc or Peter Kahrel or someone else to wake up.

                 

                Peter, you are wrong: they both work the same.

                 

                I see that with both expressions it depends on where the cursor is placed.

                If the cursor is not between the Hellos they work fine; if it is between the Hellos they don't work. Yikes!

                 

                Still waiting in suspense (although in the meantime I shall have my nightly sleep).

                • 6. Re: Grep to find first occurrence of a particular word in a story
                  [Jongware] Most Valuable Participant

                  (OT: For the Terminally Annoyed only!: http://www.jongware.com/binaries/annoyingBender.zip

                   

                  .. For some reason, this forum rejects the totally valid PNGs -- as per pngcheck -- this creates. Annoying indeed.)

                  • 7. Re: Grep to find first occurrence of a particular word in a story
                    Trevorׅ Adobe Community Professional

                    Well, just in case this post isn't confusing enough.

                     

                    When I put the greps into my text / grep list swapper script forums.adobe.com/thread/1033278?tstart=12 (with the ; at the beginning, as required by my script), they all work fine.

                     

                    If they are put in a paragraph grep style then Grep 1 processes Hello only if there is 1 Hello in the paragraph and Grep 2 processes all occurrences of Hello in the paragraph.

                     

                    Haven't tried out straight JS regexes.

                    • 8. Re: Grep to find first occurrence of a particular word in a story
                      Marc Autret Level 4

                      ~ Trevor ~ wrote:

                       

                      Well, we'll have to wait for Marc or Peter Kahrel or someone else to wake up.

                       

                       

                      Hi Trevor,

                       

                      To my regret I have no answer to the main question (find 1st occurrence of a particular word in a story). I believe we'd need a lookbehind but, unfortunately, a fixed-length pattern is required in lookbehinds, so you can't successfully use expressions like .*? there.

                       

                      Also, it seems to me that the GREP patterns you've posted are wrong and only work by accident. For example:

                       

                      (Hello(?!=Hello*$))

                       

                      actually means: find 'Hello' if and only if the match is not followed by '=Hell' and 'o' zero or more times and an EOL. It doesn't make sense to me. It works anyway, or seems to, due to the strange way GREP deals with the $ metachar in lookarounds. Indeed, it happens that $ leads the GREP engine to 'consume' the entire paragraph despite the lookahead syntax. Thus, the next match is searched in the next paragraph(s). To be honest, I absolutely don't understand why and how this actually works.
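The reading above can be checked in an ordinary regex engine. The following is a minimal sketch in plain JavaScript, not InDesign GREP; it assumes only that both engines parse "(?!" the same way, and the sample strings are made up for illustration. It shows that "=" is a literal inside the lookahead, and that in a standard engine the pattern does not restrict matching to the first 'Hello' at all, which is consistent with the InDesign behaviour being a side effect of how it handles $.

```javascript
// Plain JavaScript regex, used only to illustrate how the pattern parses.
// Assumption: InDesign's GREP engine reads "(?!" the same way; its handling of $ may differ.
const accidental = /Hello(?!=Hello*$)/g;

// "(?!" opens the negative lookahead, so "=" is a literal character inside it.
// A match is rejected only when "Hello" is followed by "=Hell", any number of "o", and the end.
const rejected = "Hello=Hell".match(accidental);

// In ordinary text nothing is rejected, so every "Hello" matches, not just the first.
const paragraph = "Hello there, Hello again";
const hits = [...paragraph.matchAll(accidental)].map(m => m.index);

console.log(rejected, hits);
```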

                       

                       

                      With this in mind, here are the patterns I'd suggest you try:

                       

                      1) Find only the FIRST occurrence of 'Hello' in paragraph(s):

                       

                      Hello(?=.*$)
                      

                       

                      Note: as said above, the ending $ makes it work but I don't know why!

                       

                      2) Find only the FIRST occurrence of 'Hello' in story(ies):

                       

                      No answer so far. However, since you are in the scripting environment, you can take advantage of the following pattern:

                       

                      (?s)\A(.*?)Hello
                      

                       

                      and then extract the 5 last characters from the resulting matches.
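As a rough illustration of the "match from the start, then keep the last 5 characters" idea, here is a sketch in plain JavaScript rather than InDesign scripting. The sample story text and variable names are invented; the `s` flag stands in for (?s), and ^ without the `m` flag stands in for \A.

```javascript
// Plain JavaScript stand-in for (?s)\A(.*?)Hello; the story text is made up.
const word = "Hello";
const story = "Intro line\nSay Hello first\nThen Hello again";

// ^ (without the m flag) anchors at the very start of the string, like \A.
// The s flag lets the dot cross line breaks, like (?s).
const fromStart = /^(.*?)Hello/s.exec(story);

// The match runs from the start of the story up to and including the first Hello,
// so the word itself is the last word.length characters of the match.
const firstWord = fromStart[0].slice(-word.length);
const firstStart = fromStart[0].length - word.length;

console.log(firstWord, firstStart);
```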

                       

                      3) Find only the LAST occurrence of 'Hello' in paragraph(s):

                       

                      (Hello)(?!.*?\1)
                      

                       

                      Note: Here I use a negative lookahead the regular way, which asserts that 'Hello' is not any further followed by another occurrence. In detail:

                       

                      (Hello)    Capture the string into \1.

                      (?!    Negative lookahead

                      .*?    Make .* non-greedy

                      \1    Alias of the 1st found ('Hello')
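The same breakdown can be tried in plain JavaScript, whose regex syntax accepts this pattern unchanged. This is only a sketch with a made-up paragraph, not a statement about InDesign's engine.

```javascript
// Plain JavaScript check of (Hello)(?!.*?\1) on a single made-up paragraph.
const lastInParagraph = /(Hello)(?!.*?\1)/g;
const para = "Hello one, Hello two, Hello three";

// Each candidate Hello is kept only if no further Hello follows it on the same line,
// so only the final occurrence survives.
const lastMatches = [...para.matchAll(lastInParagraph)].map(m => m.index);

console.log(lastMatches, para.lastIndexOf("Hello"));
```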

                       

                      4) Find only the LAST occurrence of 'Hello' in story/ies:

                       

                      (?s)(Hello)(?!.*?\1)
                      

                       

                      Note: same as above, except that I use (?s) to make sure that the dot also matches paragraph returns.
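To see what the dot-matches-everything switch changes, here is a small plain JavaScript comparison. It is a sketch only: the `s` flag plays the role of (?s), a "\n" plays the role of a paragraph return, and the two-paragraph story is invented.

```javascript
// Two made-up paragraphs separated by a line break.
const story = "Hello one, Hello two\nHello three, Hello four, Hello five";

// Without the s flag the dot stops at the line break: one match per paragraph.
const perParagraph = [...story.matchAll(/(Hello)(?!.*?\1)/g)].map(m => m.index);

// With the s flag the lookahead scans to the end of the story: one match overall.
const perStory = [...story.matchAll(/(Hello)(?!.*?\1)/gs)].map(m => m.index);

console.log(perParagraph, perStory);
```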

                       

                      @+

                      Marc

                      • 9. Re: Grep to find first occurrence of a particular word in a story
                        Trevorׅ Adobe Community Professional

                        Hi Marc,

                         

                        Thanks for your reply, I'm glad you got my subtle hints.

                         

                        1) First Hello in the paragraph(s)

                        Marc Autret wrote:

                         

                         

                        Also, it seems to me that the GREP patterns you've posted are wrong and only work by accident. For example:

                         

                        (Hello(?!=Hello*$))

                         

                        actually means: find 'Hello' if and only if the match is not followed by '=Hell' and 'o' zero or more times and an EOL.

                         

                        Well Marc I made a Hell of a mistake!!!

                         

                        I meant to write (Hello(?!=Hello.*$)) with a dot there which at least is not quite as stupid if not more correct.

                         

                        In fact both (Hello(?!=$)) without a . and Hello(?!=^) work, well, sort of work, for the first Hello of the paragraph(s), depending on where one starts the search from (starting before the first Hello works; otherwise it will find the next "first" Hello of the paragraph).

                         

                        I said sort of because I naively presumed that GREP styles would work like GREPS, dumb hey.

                        I was really looking to use these GREPS in GREP styles, well you can't.

                         

                        In a GREP style

                        (Hello(?!=$)) or any of its variants will apply the style to all occurrences of Hello in the paragraph.

                        I can partially circumvent this problem by adding a second GREP STYLE which applies an anti style to all words after the first Hello (?<=Hello).*

                        i.e. if I want my first Hello to be bold, I set the first style to bold and the second to regular.
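The two-style trick can be simulated outside InDesign. The sketch below is plain JavaScript with a per-character style array; it is not how GREP styles are implemented, and it assumes the second style simply overrides the first wherever both apply. The paragraph text and variable names are made up.

```javascript
// Simulate two stacked styles on one made-up paragraph.
const paragraph = "Say Hello, then Hello again";
const styles = new Array(paragraph.length).fill("regular");

// Style 1: bold on every Hello (what the "first Hello" pattern ends up doing in a GREP style).
for (const m of paragraph.matchAll(/Hello/g)) {
    styles.fill("bold", m.index, m.index + m[0].length);
}

// Style 2 (the "anti style"): regular on everything after the first Hello.
const tail = /(?<=Hello).*/.exec(paragraph);
styles.fill("regular", tail.index, tail.index + tail[0].length);

// Only the characters of the first Hello are still bold.
const boldText = [...paragraph].filter((_, i) => styles[i] === "bold").join("");
console.log(boldText);
```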

                         

                        This, however, is not practical if one wants to auto-style more than one word in this way, in other words to do the same for the first Hi and the first Hello in the paragraph.

                         

                        As a regular GREP solution it will work, provided that one starts the GREP search before the first occurrence of Hello.

                         

                        2 & 4) First and Last Hello in the story

                         

                        Again these GREPs don't work using GREP styles, this I think is because GREP paragraph styles only look within the one paragraph at a time they are applied to.  So they can't look at the preceding or following paragraphs to see if they contain Hello or not.

                        If I am correct I see no workaround to this and am willing to pay 10 Pounds, Euros or Dollars to whoever comes up with a non-script, fully functional GREP styles solution for this (I think my money's safe).

                         

                        Regarding the regular non styled GREPs they nearly work as stated.

                        When (?s)(Hello)(?!.*?\1) is used from the find tab to find the last occurrence of Hello in the story, it will first find the last Hello of the story, then it will go back and find the one before that!! Then it will go on to the next story.

                         

                        Using the GREP in script works as stated.

                        The easiest way of course to find the first and last Hello in a story, document etc. by script would be

                         

                        app.findGrepPreferences.findWhat = "Hello"; // pattern to search for
                        myFinds = myWhatEver.findGrep(); // array of matches found in myWhatEver (a story, document etc.)
                        

                         

                        First occurrence: myFinds[0]; last occurrence: myFinds[myFinds.length - 1].

                        HOWEVER NOT PARTICULARLY EFFICIENT!
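Picking both ends of the result can be wrapped in a small helper. The sketch below is plain JavaScript written in an ExtendScript-friendly style; the name firstAndLast is hypothetical, and the sample array merely stands in for what findGrep() would return. Note that a JavaScript array has no element at index -1, so the last entry is read at length - 1.

```javascript
// Hypothetical helper: returns the first and last entries of an array of finds.
// In InDesign the array would come from findGrep(); here any array will do.
function firstAndLast(finds) {
    if (finds.length === 0) {
        return null; // nothing was found
    }
    return { first: finds[0], last: finds[finds.length - 1] };
}

var sample = ["Hello #1", "Hello #2", "Hello #3"]; // stand-in for found text objects
var ends = firstAndLast(sample);
```

The helper still relies on collecting every match first, so it shares the inefficiency noted above.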

                         

                        3) Last occurrence  of Hello in paragraph.

                        This one works perfectly both with regular GREPS and GREP styles.

                         

                        In summary

                         

                        GREP STYLES: only the (Hello)(?!.*?\1) last Hello in the paragraph GREP works.

                         

                        GREP FIND TAB: the first-Hello-in-paragraph GREP finds the first Hello after the cursor; the first in the story works in the limited way as written; last in story has the problem of finding the second-last Hello after finding the last Hello; last in paragraph works flawlessly.

                         

                        GREP SCRIPTING: all work without problem except for the first Hello in the story that has the problem of needing to extract the last 5 letters which for my automatic text / GREP changer is a bit of a problem but for general scripting would not be problematic.

                         

                        Once again Marc, thanks for your input. I doubt there's much, if anything, to add on the topic; maybe Laurent Tournie from indigrep.com might have some ideas. I don't know his contact details, so if you think it's a good idea, please can you send him a tweet / mail.

                         

                        Regards

                         

                        Trevor

                        • 10. Re: Grep to find first occurrence of a particular word in a story
                          Marc Autret Level 4

                          Hi Trevor,

                           

                          Sorry, I did not realize you were looking for GREP style patterns!! My previous post only suggests find/replace-oriented GREP patterns.

                           

                          By nature GREP styles act at the paragraph level and can't lookahead over that scope. Hence, I'm afraid you'll never find a valid solution to any 'per story' problem. To me, the subject of your topic—Grep to find first occurrence of a particular word in a story—does make sense in GREP but it does not in GREP styles.

                           

                          NB: About your (Hello(?!=Hello.*$)) pattern, it seems you didn't notice that (?!= is not the syntax of the negative lookahead: the equal sign is not part of that syntax. The pattern (Hello(?!=$)) will work as long as your paragraph does not contain "Hello=\r". Now, what I meant about finding the 1st occurrence of Hello in a paragraph is that the purpose of the lookahead is only to 'consume' the whole paragraph. For some reason, the $ placeholder makes it work. What matters is to use a lookahead expression that always works, such as (?=.*$), which here is a positive lookahead. That's why I finally suggested the pattern Hello(?=.*$), but there is no doubt that many other patterns will do the job too.

                           

                          @+

                          Marc

                          • 11. Re: Grep to find first occurrence of a particular word in a story
                            Trevorׅ Adobe Community Professional

                            Hi Marc,

                             

                            I was looking for both regular and GREP style patterns, but after trying your GREPs I came to realize the obvious, as you stated:

                            By nature GREP styles act at the paragraph level and can't lookahead over that scope

                            Hence I made my super duper offer

                            If I am correct I see no workaround to this and am willing to pay 10 Pounds, Euros or Dollars to whoever comes up with a non-script, fully functional GREP styles solution for this (I think my money's safe).

                            Regarding my other mistake (Hello(?!=Hello.*$)), you're right, I did carelessly stick in the = sign.

                             

                            Hello(?!^) and (?!$)Hello find every (and only the) first occurrence of Hello (after the cursor) in the paragraph regardless of where it is in the paragraph and of what precedes or follows it.

                            Hello(?!$) works in all cases except where Hello is the last word in the paragraph.

                            (?<!^)Hello works in all cases except where Hello is the first word in the paragraph.

                             

                            These formulas remind me very much of Jongware's answer to my question on boolean GREPS

                            http://forums.adobe.com/message/4382158#4382158

                             

                            Anyway after all these Hellos I think it's time to say bye bye.

                             

                            Trevor