2 Replies Latest reply on Apr 8, 2015 7:35 AM by wideEyedPupil

    GREP:

    wideEyedPupil Level 1

      Difficulties using positive/negative look ahead and look behind tokens in GREP Find and Replace dialogue.

       

      I have a body of text which is a list of line break delimited email addresses. There are some duplicates in the list and my goal is to create a list of the duplicate emails (once each) and remove all the non-duplicate emails. The emails were sorted alphabetically prior to pasting into InDesign so all duplicates are separated only by a line break. I know I've achieved the desired whittling down to duplicates only in the past but have lost my formulas.

       

      I can select double entries of emails in my text, remove the second occurrence, and mark them in some way with a very simple Find token:

      Find:     (.+)\r\1

      Replace:     DOUBLE-HERE\t\1
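
For anyone who wants to try this outside InDesign, here is a minimal Python sketch of the same Find/Replace. It is only an approximation: it assumes Python's `re` module treats this pattern the way InDesign's GREP does, and it uses `\n` where InDesign uses `\r` for the paragraph break (in Python, `.` would otherwise run across `\r`). The sample addresses are the ones quoted further down in this post.

```python
import re

# Sorted sample list with one duplicate; "\n" stands in for InDesign's "\r".
text = "\n".join([
    "blahblah@westnet.com.au",
    "foobar@gmail.com",
    "thisone@gmail.com",
    "thisone@gmail.com",
    "blahtyblah@live.com.au",
])

# Find: (.+)\r\1    Replace: DOUBLE-HERE\t\1
marked = re.sub(r"(.+)\n\1", r"DOUBLE-HERE\t\1", text)

for line in marked.split("\n"):
    print(line)
# blahblah@westnet.com.au
# foobar@gmail.com
# DOUBLE-HERE<tab>thisone@gmail.com
# blahtyblah@live.com.au
```

The two identical lines collapse into one marked line, and the other addresses are left alone.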

       

      However, when I attempt to use a negative look ahead to match an email address and eliminate non-duplicate emails from the list, nothing I try works. For instance, using

      Find:     (.+)\r(?!\1)


      picks up both duplicate and non-duplicate emails. I suspect this may have something to do with greedy operators, but I can't work out a GREP token that negatively looks ahead for the email.
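
Listing exactly what the pattern matches can help narrow this down. The Python sketch below is an approximation under the same kind of assumptions as any port to another engine: Python's `re` is assumed to behave like InDesign's GREP here, and `\n` replaces InDesign's `\r`. The sample list is the one quoted further down in this post.

```python
import re

# Sorted sample list with one duplicate; "\n" stands in for InDesign's "\r".
text = "\n".join([
    "blahblah@westnet.com.au",
    "foobar@gmail.com",
    "thisone@gmail.com",
    "thisone@gmail.com",
    "blahtyblah@live.com.au",
])

# Find: (.+)\r(?!\1)
matches = [m.group(1) for m in re.finditer(r"(.+)\n(?!\1)", text)]
print(matches)
# ['blahblah@westnet.com.au', 'foobar@gmail.com', 'hisone@gmail.com', 'thisone@gmail.com']
# The last line is not matched only because no line break follows it.
```

In this sketch the first copy of the duplicate is still picked up, minus its first character: the match at the start of that line is rejected by the look ahead, so the search retries one character later, where the captured text no longer equals the start of the next line.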

       

      In fact the only look ahead/behind I can get to work even slightly is positive look ahead. For example, this attempt at a negative look ahead to match a line-break-delimited duplicate email gives a bit of a clue:

      Find:     (.+@)(?!.+\r\1)

      This highlights the first part (up to and including "@") of all non-duplicate emails, and on the first of the "thisone@gmail.com" texts it selects as highlighted below (in red font-face):

       

      blahblah@westnet.com.au

      foobar@gmail.com

      thisone@gmail.com

      thisone@gmail.com

      blahtyblah@live.com.au


      I feel like I'm close: I can do a negative look ahead for "gmail.com", for example, with Find: (.+@)(?!gmail).+ which selects all lines except those with a gmail.com domain.
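
The gmail example can be reproduced in a few lines of Python. This is a rough sketch, assuming Python's `re` handles the look ahead the same way InDesign's GREP does, with `\n` standing in for InDesign's `\r`; the addresses are the sample list above.

```python
import re

# Sample list from the post; "\n" stands in for InDesign's "\r".
text = "\n".join([
    "blahblah@westnet.com.au",
    "foobar@gmail.com",
    "thisone@gmail.com",
    "thisone@gmail.com",
    "blahtyblah@live.com.au",
])

# Find: (.+@)(?!gmail).+
selected = [m.group(0) for m in re.finditer(r"(.+@)(?!gmail).+", text)]
print(selected)
# ['blahblah@westnet.com.au', 'blahtyblah@live.com.au']
```

This one works because each address has a single "@": wherever the search starts within a gmail line, the "@" is still followed by "gmail", so the look ahead rejects every attempt on that line.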

        • 1. Re: GREP:
          [Jongware] Most Valuable Participant

          But you are very close!

          This small adjustment to your first GREP will select unique addresses, or only the last of a run of duplicates:

           

          ^(.+)\r(?!\1)

           

          where the ^ forces the match to start at the first character of a line. Without it, when the next line is a dupe the match fails at the first character, so the search retries from the second character, and that shortened text is no longer "a dupe" of the next line. If I'm correct, this same fix will also work for your second attempt.
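
The anchored pattern can be checked with a short Python sketch. It rests on a few assumptions: Python's `re` is taken to behave like InDesign's GREP for this pattern, `\n` replaces InDesign's `\r`, and `re.MULTILINE` is needed in Python so that `^` matches at the start of every line. The addresses are the sample list from the question.

```python
import re

# Sample list from the question; "\n" stands in for InDesign's "\r".
text = "\n".join([
    "blahblah@westnet.com.au",
    "foobar@gmail.com",
    "thisone@gmail.com",
    "thisone@gmail.com",
    "blahtyblah@live.com.au",
])

# Find: ^(.+)\r(?!\1)
pattern = re.compile(r"^(.+)\n(?!\1)", re.MULTILINE)

# Record the 1-based line number of each match together with the captured address.
hits = [(text.count("\n", 0, m.start()) + 1, m.group(1))
        for m in pattern.finditer(text)]
print(hits)
# [(1, 'blahblah@westnet.com.au'), (2, 'foobar@gmail.com'), (4, 'thisone@gmail.com')]

# Replacing every match with nothing; a trailing break is added so the
# last line can match too.
remaining = pattern.sub("", text + "\n")
print(repr(remaining))
# 'thisone@gmail.com\n'
```

In this sketch the duplicate is matched on line 4, not line 3, and deleting the matches leaves one copy of the duplicated address. A run of three or more identical lines would keep all but its last copy, so a second pass with the (.+)\r\1 replace might still be needed.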

          • 2. Re: GREP:
            wideEyedPupil Level 1

            Hey, belated thank you!