13 Replies Latest reply on Nov 5, 2012 5:54 AM by samar02

    GREP: finding two identical consecutive strings

    samar02 Level 1

      Hi,

      I should find find all occurrences of two identical words (or string of characters) that occur consecutively, like those at the beginning of this sentence. Is this possible with ID?

       

      I would have thought that

       

      (\w+) $1

       

      would do this, but it does not.

        • 1. Re: GREP: finding two identical consecutive strings
          Eugene Tyson Adobe Community Professional & MVP

          You'd have to do it on a case by case basis.

           

          It would have to be

           

          (find\s){2}

           

          or

           

          (found\s){2}

           

          This isn't the ideal way to do it.

           

          Your best bet is to go to Preferences > Spelling and turn on only the "Repeated Words" option.

           

          Then do a spell check - and replace it that way.

           

          You can also turn on Dynamic Spelling to highlight duplicated words.

           

           

          I'm sure it could be scripted to find duplicated words - perhaps try the scripting forum.

          • 2. Re: GREP: finding two identical consecutive strings
            [Jongware] Most Valuable Participant

            The syntax of a Find string is different from the Change string. Use this syntax instead:

             

            (\w+) \1

             

            -- but it will find any run of one or more word characters that repeats after a single space. E.g., it will find your "find find" but also the "t t" in "at the". If you want to find entire words only, a good first attempt would be:

             

            (\b\w+) \1

             

            but it will fail for some fairly common phrases. It'll find "for foreign lands", "in international relations" and "be better". To exclusively find duplicate words, use this:

             

            (\b\w+) \1\b

             

            or this; slightly better because it will find triplicates as well:

             

            (\b\w+)( \1)+\b
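To see how the variants above differ, here is a minimal sketch in Python rather than InDesign. It assumes Python's `re` module treats `\w`, `\b` and `\1` closely enough to InDesign's GREP for plain ASCII text; the sample sentence and the helper name `matches` are made up for illustration.

```python
import re

# Patterns quoted from the thread, in the order they were proposed.
PATTERNS = {
    "original attempt": r"(\w+) $1",          # $ is an end anchor here, so this never matches
    "any repeated run": r"(\w+) \1",
    "anchored at the start": r"(\b\w+) \1",
    "whole words": r"(\b\w+) \1\b",
    "whole words, 2+ repeats": r"(\b\w+)( \1)+\b",
}

def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

text = "find find at the be better go go go"

for label, pattern in PATTERNS.items():
    print(f"{label:24} {matches(pattern, text)}")
# original attempt         []
# any repeated run         ['find find', 't t', 'be be', 'go go']
# anchored at the start    ['find find', 'be be', 'go go']
# whole words              ['find find', 'go go']
# whole words, 2+ repeats  ['find find', 'go go go']
```

Only the last two patterns leave "be better" alone, and only the last one reports the triple "go go go" as a single hit.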

             

            Edit: Hi Eugene.

            • 3. Re: GREP: finding two identical consecutive strings
              [Jongware] Most Valuable Participant

              Hm. Due to the definitions of both "\w" ("what is a Word Character") and "\b" ("what is a Word Break"), my proposed GREP will also find this:

               

              it's on one one-sided page

               

              Whether or not this is a valid 'duplicate word' depends on how you define "word". For instance, if "one-sided" is a single word, you can use this to have it not match the above:

              (\b\w+)( \1)+(?![-\w])

               

              You are effectively adding the character '-' to the "Word Character" set on the right. But, in that case you also should add it to the left! Otherwise, it will still (falsely) match

              it’s an add-on on your system

               

              Changing the Word Character set on the left as well will make it ignore this occurrence. But note you cannot use \b anymore! It would still pick up the '-' character as a valid 'word break', and the expression [-\w]+ that follows it would never see the hyphen. That leads us to this:

               

              (?<![-\w])([-\w]+)( \1)+(?![-\w])

               

              ... I think I'm going to leave adding the slash as well to you ...
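The effect of treating the hyphen as a word character can be checked with a small sketch in Python. This is an approximation, assuming Python's `re` behaves like InDesign's GREP for these ASCII samples; the constant names and the `matches` helper are hypothetical.

```python
import re

WORD_BREAK = r"(\b\w+)( \1)+\b"                     # plain \b on both sides
HYPHEN_RIGHT = r"(\b\w+)( \1)+(?![-\w])"            # hyphen excluded on the right only
HYPHEN_BOTH = r"(?<![-\w])([-\w]+)( \1)+(?![-\w])"  # hyphen counts as a word character on both sides

def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

one_sided = "it's on one one-sided page"
add_on = "it's an add-on on your system"
genuine = "I should find find this"

print(matches(WORD_BREAK, one_sided))    # ['one one']
print(matches(HYPHEN_RIGHT, one_sided))  # []
print(matches(HYPHEN_RIGHT, add_on))     # ['on on']
print(matches(HYPHEN_BOTH, add_on))      # []
print(matches(HYPHEN_BOTH, genuine))     # ['find find']
```

The lookbehind `(?<![-\w])` takes over the job of `\b` on the left, which is why the last pattern no longer starts a match at the "on" inside "add-on".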

              • 4. Re: GREP: finding two identical consecutive strings
                Eugene Tyson Adobe Community Professional & MVP

                I did not know you could do that!

                • 5. Re: GREP: finding two identical consecutive strings
                  samar02 Level 1

                  Thank you, Jongware. This is exactly what I was looking (and hoping) for. For my purposes, the string

                   

                  (\b\w+) \1\b

                   

                  is just perfect. I can modify it to even find identical words separated by punctuation:

                   

                  (\b\w+)[,;:.] \1\b

                   

                  Wonderful.
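One detail worth checking: as written, the punctuation variant requires exactly one of the listed marks, so it no longer finds plain "find find". The sketch below shows this in Python (assumed to behave like InDesign's GREP for this ASCII sample); `OPTIONAL_PUNCT` is a hypothetical variant that is not from the thread.

```python
import re

WITH_PUNCT = r"(\b\w+)[,;:.] \1\b"       # as posted: one punctuation mark is required
OPTIONAL_PUNCT = r"(\b\w+)[,;:.]? \1\b"  # hypothetical variant: the mark is optional

def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

text = "He said no, no more. I should find find them."

print(matches(WITH_PUNCT, text))      # ['no, no']
print(matches(OPTIONAL_PUNCT, text))  # ['no, no', 'find find']
```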

                  • 6. Re: GREP: finding two identical consecutive strings
                    [Jongware] Most Valuable Participant

                    Eugene Tyson wrote:

                     

                    I did not know you could do that!

                     

                    (g) You can have a bit of fun with it. This will find 5-letter palindromes in your text:

                     

                    \b(\w)(\w)\w\2\1\b

                     

                    (e.g., "level", "civic", "refer"). Unfortunately, there is no any-length GREP to find them.
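A quick way to try the five-letter palindrome pattern outside InDesign is this Python sketch. The sample sentence is invented, and Python's `re` is assumed to match InDesign's GREP for ASCII words.

```python
import re

# Letters 1 and 2 are captured, letter 3 is free, then 2 and 1 must repeat in reverse.
FIVE_LETTER_PALINDROME = re.compile(r"\b(\w)(\w)\w\2\1\b")

def five_letter_palindromes(text):
    """Return every five-character palindromic word in text."""
    return [m.group(0) for m in FIVE_LETTER_PALINDROME.finditer(text)]

print(five_letter_palindromes("a civic duty to refer each level of radar noise"))
# ['civic', 'refer', 'level', 'radar']
```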

                     

                    This will find three consecutive same characters:

                     

                    (\w)\1\1

                     

                    -- heh heh, I just found a "Classsroom" in the book I'm working on!
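The same check is easy to reproduce in Python, assuming its `re` module behaves like InDesign's GREP here. Note that `\w` also covers digits, so a number such as 1000 is reported as well; the sample text is made up.

```python
import re

# One captured word character followed by two more copies of itself.
TRIPLE_CHAR = re.compile(r"(\w)\1\1")

def triple_chars(text):
    """Return every run of three identical word characters."""
    return [m.group(0) for m in TRIPLE_CHAR.finditer(text)]

print(triple_chars("Welcome to the Classsroom, see page 1000"))
# ['sss', '000']
```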

                    • 7. Re: GREP: finding two identical consecutive strings
                      [Jongware] Most Valuable Participant
                      I wrote:

                       

                       

                       

                      Unfortunately there is no any-length GREP to find them

                       

                      and there isn't (as far as I know), but this comes close:

                       

                      \b(\w)(\w)?(\w)?(\w)?(\w)?(\w)?\w?(?(6)\6)(?(5)\5)(?(4)\4)(?(3)\3)(?(2)\2)(?(1)\1)\b

                       

                      It will find any palindrome word from 2 to 13 characters, and by extending the count to 9 groups it would be able to find as many as 19 characters!

                       

                      Testing on the 109,000 entries in my English words list ... "deified" -- 7 letters. "malayalam", 9 letters! (That's a language, by the way.) "reviver" and "rotator" are also nice ones.
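Python's `re` module supports the same conditional syntax, `(?(n)...)`, so the pattern can be tried there as a rough check. This is a sketch under the assumption that Python and InDesign agree on this construct; the word list is just a few of the examples mentioned above plus some non-palindromes.

```python
import re

# Up to six optional captured letters, an optional middle letter, then each
# captured letter is required again in reverse order, but only if its group matched.
PALINDROME = re.compile(
    r"\b(\w)(\w)?(\w)?(\w)?(\w)?(\w)?\w?"
    r"(?(6)\6)(?(5)\5)(?(4)\4)(?(3)\3)(?(2)\2)(?(1)\1)\b"
)

def palindromes(text):
    """Return every palindromic word of 2 to 13 characters in text."""
    return [m.group(0) for m in PALINDROME.finditer(text)]

print(palindromes("deified malayalam reviver rotator rotate noon a"))
# ['deified', 'malayalam', 'reviver', 'rotator', 'noon']
```

Group 1 is mandatory, so a single-letter word such as "a" is not reported.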

                      • 8. Re: GREP: finding two identical consecutive strings
                        [Jongware] Most Valuable Participant

                        <g> Well as it is Friday afternoon:

                         

                        This monstrosity will flush out palindromes regardless of punctuation:

                         

                        (?i)(?=[[:alpha:]])(\w)?[.,:;'? ]*?(\w)?[.,:;'? ]*(\w)?[.,:;'? ]*(\w)?[.,:;'? ]*(\w)?[.,:;'? ]*(\w)?[.,:;'? ]*(\w)?[.,:;'? ]*(\w)?[.,:;'? ]*(\w)[.,:;'? ]*\w?[.,:;'? ]*(?(9)\9[.,:;'? ]*)(?(8)\8[.,:;'? ]*)(?(7)\7[.,:;'? ]*)(?(6)\6[.,:;'? ]*)(?(5)\5[.,:;'? ]*)(?(4)\4[.,:;'? ]*)(?(3)\3[.,:;'? ]*)(?(2)\2[.,:;'? ]*)(?(1)\1)

                         

                        The limit of 19 characters -- 9 at the left, one in the middle, same 9 at the right -- can be seen in this snippet from the Wikipedia article on palindromes:

                         

                        [Image: palindrome.PNG]

                        • 9. Re: GREP: finding two identical consecutive strings
                          Larry G. Schneider Adobe Community Professional & MVP

                          Jongware, you truly need a vacation somewhere far away from computers.

                          • 10. Re: GREP: finding two identical consecutive strings
                            winterm Level 4

                            Supercalifragilisticexpialidocious...

                            • 11. Re: GREP: finding two identical consecutive strings
                              samar02 Level 1

                              The fact that one can use \1 this way is so inspiring! Now I have a follow-up question.

                               

                              I am working on dictionary files, and a typical entry is "x ... y ... z ... z ... x ...; z ... §". Each of (x|y|z) stands for a particular word, separated by any text (...), and § is a new paragraph.

                              Now I need to find a string of two consecutive occurrences of the same word in the same paragraph that are not separated by a semicolon (and not separated by any other "word"). In the example above, I should only find "z ... z". How do I do this?

                               

                              I tried

                              (x|y|z)[^;§]+\1

                               

                              but this does not work ...

                               

                              Any ideas? Or is the \1 trick not working in this case?

                              • 12. Re: GREP: finding two identical consecutive strings
                                [Jongware] Most Valuable Participant

                                It does work, in the sense that it finds the first 'x' and then anything in between until the next 'x'. Since your '...' can be any text (just not a semicolon and not the section sign), it eats up everything up to the next 'x'. That matches precisely what you describe:

                                 

                                samar02 wrote:

                                ...  a string of two consecutive occurrences of the same word in the same paragraph that are not separated by a semicolon (and not separated by any other "word") ..

                                 

                                Adding a '?' to make it match the shortest possible match does not add anything useful, since the entire string from the first 'x' to the next one already is "as short as possible".
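The behaviour described here can be reproduced with a small Python sketch. The entry text is invented, with `x`, `y`, `z` standing in for the real words, and Python's `re` is assumed to behave like InDesign's GREP for it. The third pattern is a hypothetical refinement that is not from the thread: it refuses to step over another placeholder, and it only works as written because the filler text contains none of the letters x, y or z.

```python
import re

entry = "x aa y bb z cc z dd x ee; z ff"

GREEDY = r"(x|y|z)[^;§]+\1"                           # as posted
LAZY = r"(x|y|z)[^;§]+?\1"                            # with the extra '?'
NO_KEYWORD_BETWEEN = r"(x|y|z)(?:(?![xyz])[^;§])+\1"  # hypothetical refinement

def matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

print(matches(GREEDY, entry))              # ['x aa y bb z cc z dd x']
print(matches(LAZY, entry))                # ['x aa y bb z cc z dd x']
print(matches(NO_KEYWORD_BETWEEN, entry))  # ['z cc z']
```

With real dictionary words the lookahead would need the actual alternatives and word boundaries instead of the single-letter class `[xyz]`.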

                                 

                                Can you show a real world example of what you are attempting to find?