6 Replies Latest reply: Mar 9, 2013 9:19 AM by samar02

    GREP: using positive lookbehind/positive lookahead


      I can't find the correct GREP for a seemingly very easy query: within an "abc" string, find "b" (without "a" or "c"). As simple as that.


      Here are the details: I have a text with many occurrences of this sequence:


      ~hl xxx

      ~he xxx

      ~hd xxx

      ~hf xxx

      ~hb xxx


      That is, five consecutive paragraphs, each beginning with a code consisting of a tilde (~) followed by h[ledfb] and a space, and then any characters follow (here represented by xxx). The same codes appear in other sequences as well.


      Now in such a sequence (and only in such), the paragraph beginning with ~hf should become the second, so that the sequence is changed to:


      ~hl xxx

      ~hf xxx

      ~he xxx

      ~hd xxx

      ~hb xxx


      I am planning to use a combination of Positive Lookbehind / Positive Lookahead search. This should find each paragraph beginning with "~hf ", only if it occurs after the three paragraphs and before the one paragraph mentioned above. I could then copy the match to the clipboard and move it to the correct place. So I was trying to use this grep:


      (?<=hl [^\~]+\r\~he [^\~]+\r\~hd [^\~]+\r)\~hf [^\~]+\r(?=\~hb[^\~]+\r)


      The [^\~]+ bits make sure no other code is being matched.


      For some reason this does not match anything. Why? (If I omit the lookbehind and lookahead bits, it works.)


      Any help greatly appreciated!


      Message was edited by: samar02

        • 1. Re: GREP: using positive lookbehind/positive lookahead
          Peter Spier ACP/MVPs

          Let me start by saying what I know about GREP I learned here from Jongware and Peter Kahrel, and from Peter's book, which I highly recommend.


          Either one of them may come along with a better solution as soon as I'm done, but in the meantime I think the lookarounds don't work because the lengths could be variable. I do have an alternate plan that DOES seem to work, however, and eliminates the need to use the clipboard as a bonus.


          Use the search expression

          (\~hl [^\~]+)(\~he [^\~]+)(\~hd [^\~]+)(\~hf [^\~]+)(\~hb [^\~]+)

          and then use a sequence of $1 through $5 to rearrange the order. For example, $1$2$5$3$4 will move the ~hb paragraph after the ~he and before the ~hd.
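For anyone who wants to try the capture-group idea outside InDesign, here is a minimal Python sketch. It is only an analogy: Python's `re` module is a different engine from InDesign's GREP, the sample text is made up, `\r` stands in for the paragraph return, and Python writes `\1` where InDesign writes `$1`. The replacement shown is the order asked for in the original question (~hf moved to second place), which in InDesign terms would be $1$4$2$3$5.

```python
import re

# Made-up sample: five paragraphs, each ending in "\r" (paragraph return).
text = "~hl one\r~he two\r~hd three\r~hf four\r~hb five\r"

# The five-group expression from the post. "~" needs no escaping in Python.
# Each [^~]+ also swallows the trailing "\r", because "\r" is not a tilde.
pattern = re.compile(
    r"(~hl [^~]+)(~he [^~]+)(~hd [^~]+)(~hf [^~]+)(~hb [^~]+)"
)

# \1\4\2\3\5 is Python's spelling of $1$4$2$3$5: ~hf becomes the second paragraph.
reordered = pattern.sub(r"\1\4\2\3\5", text)

for paragraph in reordered.split("\r"):
    if paragraph:
        print(paragraph)
```

Note that this only rearranges plain text. It says nothing about what happens to character styles in InDesign when the groups are put back in a different order.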


          I thought you would need (?s) at the beginning of the query (the "single line" marker that causes the entire story to be treated as a single paragraph), but that doesn't seem to be the case. Also, spaces and returns are picked up by the [^\~] (not a tilde) class, so you don't need to explicitly include them unless you want to be sure that they are in precise positions, like the space after the ~hl. I left those in, but there is no point in leaving in the \r, as it cannot be used with the negative class to stop the match unless the return itself is included in the class. This leaves the possibility that you could have intervening paragraphs that don't start with a tilde, and they would be considered, as far as a match goes, as belonging with whatever paragraph starting with a tilde comes before them.


          If you want the query to not match any paragraph that doesn't start with a tilde (in other words it should fail if there is an intervening paragraph in the list without the prefix), I think you could modify it as follows:


          (\~hl [^\~\r]+\r)(\~he [^\~\r]+\r)(\~hd [^\~\r]+\r)(\~hf [^\~\r]+\r)(\~hb [^\~\r]+\r)

          which also will not match if your ~hb paragraph is the last in the story and does not have a trailing return. In the first query it is found, which can lead to combining it with another paragraph (I know this because it happened in my test).
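The difference between the two expressions can be checked with a small Python sketch. Again this is an analogy under assumptions: the sample strings are invented, `\r` stands in for the paragraph return, and Python's engine is not InDesign's. `loose` is the first expression and `strict` is the one with `\r` added to each class.

```python
import re

loose = re.compile(
    r"(~hl [^~]+)(~he [^~]+)(~hd [^~]+)(~hf [^~]+)(~hb [^~]+)"
)
strict = re.compile(
    r"(~hl [^~\r]+\r)(~he [^~\r]+\r)(~hd [^~\r]+\r)"
    r"(~hf [^~\r]+\r)(~hb [^~\r]+\r)"
)

samples = {
    # The plain five-paragraph sequence.
    "clean": "~hl a\r~he b\r~hd c\r~hf d\r~hb e\r",
    # An uncoded paragraph sits between ~he and ~hd.
    "interrupted": "~hl a\r~he b\rstray paragraph\r~hd c\r~hf d\r~hb e\r",
    # The last paragraph has no trailing return (end of story).
    "unterminated": "~hl a\r~he b\r~hd c\r~hf d\r~hb e",
}

results = {
    name: (bool(loose.search(s)), bool(strict.search(s)))
    for name, s in samples.items()
}

for name, (loose_hit, strict_hit) in results.items():
    print(f"{name}: loose={loose_hit} strict={strict_hit}")
```

In this sketch the loose expression matches all three samples, attaching the stray paragraph to the ~he group, while the strict one matches only the clean sample.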

          • 2. Re: GREP: using positive lookbehind/positive lookahead
            Peter Spier ACP/MVPs

            I've been thinking about the [^\~\r] that I used, and although it SEEMS to work, I don't understand why it would. Logically a tilde is not a return, and a return is not a tilde and I would think this would go into a meltdown loop and hang or crash, or match everything, essentially becoming the equivalent of .+. Would one of you experts out there care to comment on the inner workings of a class? It seems like the negation includes an implied boolean "and" rather than the implied "or" of a regular class.


            Or is it just an accident that it worked?
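On the question of how a negated class behaves: in the regex engines I am sure about, `[^xy]` matches one character that is not in the set {x, y}. "Not (x or y)" is the same thing as "not x and not y", which is why it looks like an implied "and". A quick Python check, assuming InDesign's classes follow the same convention (the thread's results suggest they do, but that is an assumption here):

```python
import re

# One character that is neither a tilde nor a return.
negated = re.compile(r"[^~\r]")

# Test each candidate character on its own.
single = {ch: bool(negated.fullmatch(ch)) for ch in ["a", " ", "~", "\r"]}
for ch, ok in single.items():
    print(repr(ch), ok)

# With "+", the run stops at the first tilde OR the first return,
# whichever comes first, so it cannot run on like ".+" across paragraphs.
run = re.match(r"[^~\r]+", "xxx yyy\r~he zzz")
print(repr(run.group()))
```

So the class does not loop or match everything: a tilde is rejected because it is in the excluded set, and so is a return.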

            • 3. Re: GREP: using positive lookbehind/positive lookahead
              Peter Spier ACP/MVPs

              I just checked, and substituting .+? for [^\~\r]+ also works (or seems to), the ? being used to limit the + to the shortest match. I understand from Peter's book, though, that the negative class is faster.

              • 4. Re: GREP: using positive lookbehind/positive lookahead
                samar02 Community Member

                Thank you, Peter.


                Your search expression works fine. But then a new problem arises: many of the characters ("xxx") after the codes are styled with a character style, and using the $1... method (to replace the matches) causes these character styles to be applied to the wrong characters, resulting in mayhem. So it seems to me to be much safer using a method of finding the single paragraph to be replaced, moving it to the clipboard, and re-inserting it at its new place. This way the character styles remain untouched.


                I have now read somewhere that a regex can be used inside lookahead, but not inside lookbehind. And indeed, if I leave out the lookbehind bit, the search works! But I still need the equivalent of a regex within a lookbehind ... Is there a workaround?
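The same kind of restriction exists in Python's `re` module, which makes it easy to see: a lookbehind containing `+` is rejected because its length is not fixed, while a lookahead with `+` is accepted. The sketch below is only an illustration in Python, not a statement about InDesign's engine, and the sample text is invented. It also shows one common way around the limit where a capture group is available: match the preceding paragraphs normally, put only the ~hf paragraph in a group, and read that group's position.

```python
import re

# 1) Variable-length lookbehind: Python refuses to compile it.
try:
    re.compile(r"(?<=hl [^~]+\r)~hf [^~]+\r")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print("lookbehind:", lookbehind_error)

# 2) Variable-length lookahead: compiles without complaint.
lookahead = re.compile(r"~hf [^~\r]+\r(?=~hb [^~\r]+\r)")

# 3) Alternative: consume the three leading paragraphs as ordinary text
#    and capture only the ~hf paragraph in group 1.
grouped = re.compile(
    r"~hl [^~\r]+\r~he [^~\r]+\r~hd [^~\r]+\r"
    r"(~hf [^~\r]+\r)"
    r"(?=~hb [^~\r]+\r)"
)

text = "~hl a\r~he b\r~hd c\r~hf d\r~hb e\r"
m = grouped.search(text)
print("group 1:", repr(m.group(1)), "at", m.span(1))
```

Whether the third idea helps inside InDesign depends on being able to act on a group rather than on the whole match, for example from a script. That part is not covered by anything in this thread.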

                • 5. Re: GREP: using positive lookbehind/positive lookahead
                  Peter Spier ACP/MVPs

                  Indeed, styles are a problem in this sort of movement, and beyond my current abilities to work around. Can you use Nested or GREP styles (or is that what you are doing?) to apply the character styles?

                  • 6. Re: GREP: using positive lookbehind/positive lookahead
                    samar02 Community Member

                    I have tried it – but it's much too complex (including R-L languages, different typefaces, sizes etc.).


                    I have now found a workaround by using the regex inside the lookahead only. The downside is that I get about ten percent too many hits, but this is still better than risking chaotic styling.


                    Thanks again for your thoughts.
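Why a lookahead-only search returns extra hits can be sketched in Python. The exact expression used for the workaround is not given in the thread, so `lookahead_only` below is a guess at its shape, and the sample text is invented. The second half shows one hypothetical way to weed out the extra hits: test the text before each hit with a separate expression.

```python
import re

# Guessed shape of the workaround: ~hf paragraph followed by a ~hb paragraph.
lookahead_only = re.compile(r"~hf [^~\r]+\r(?=~hb [^~\r]+\r)")

# What must come directly before a wanted hit; \Z anchors at the end
# of the slice that is passed in.
must_precede = re.compile(r"~hl [^~\r]+\r~he [^~\r]+\r~hd [^~\r]+\r\Z")

text = (
    "~hl a\r~he b\r~hd c\r~hf wanted\r~hb e\r"  # the five-paragraph sequence
    "~hl g\r~hf unwanted\r~hb f\r"              # a different sequence
)

# Every ~hf paragraph that is followed by ~hb, wanted or not.
hits = [m.group() for m in lookahead_only.finditer(text)]
print(hits)

# Keep only hits whose preceding text ends with ~hl, ~he, ~hd paragraphs.
confirmed = [
    m.group()
    for m in lookahead_only.finditer(text)
    if must_precede.search(text[: m.start()])
]
print(confirmed)
```

The lookahead alone cannot see what precedes the ~hf paragraph, so any ~hf followed by ~hb is reported. The second pass plays the role that the lookbehind was meant to play.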