3 Replies Latest reply on Apr 10, 2010 11:03 AM by [Jongware]

    regular expression question

    jackhenrie Level 1

      I have a very simple but somewhat unusual task.

      I need to break a large set of texts into lines that contain exactly 6 words each.

      That is, I need to insert a paragraph return after every 6th word in a large unbroken text.

       

      My GREP find and replace attempts have been unsuccessful.

       

      In the find field, I am trying to specify a word boundary (or white space) with any character(s) following exactly six times.

       

      In the replace field, I would add a paragraph return after the 6th space.

       

      Can anyone help me here?

       

      Thanks!

        • 1. Re: regular expression question
          [Jongware] Most Valuable Participant

          This expression treats 'anything that's not a space' as a single word, and will replace any white space immediately after 6 of them with a hard return.

          Search:

          ((\S+\s){5}\S+)\s

          Replace with:

          $1\r

          Because a Return is itself a 'kind of white space', the expression steps over returns as if they were spaces and keeps counting up to 6 words. If that's not what ought to happen, use the next expression, which will only insert returns inside runs of 6 (or more) space-separated words:
          ((\S+\s(?!\r)){5}\S+)\s

           

          (with the same Replace With).

           

          [Post post-thought] By the way, it assumes there is always just a single space between those words. I think anything with more than one space (or a space followed by a return, etc.) simply gets skipped, being 'invalid'.
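
          For readers who want to try the first expression outside InDesign, here is a minimal sketch in Python. It assumes Python's re module treats \S and \s closely enough to InDesign's GREP for this purpose, which may not hold in every edge case; the function name break_every_six_words and the sample text are made up for illustration.

```python
import re

# Same pattern as the first Search expression: six runs of non-space
# characters separated by single whitespace characters (group 1),
# followed by one more whitespace character outside the group.
PATTERN = re.compile(r"((\S+\s){5}\S+)\s")


def break_every_six_words(text: str) -> str:
    """Replace the whitespace after every 6th word with a hard return."""
    # m.group(1) plays the role of $1; "\r" stands in for the hard return.
    return PATTERN.sub(lambda m: m.group(1) + "\r", text)


# Hypothetical sample text: thirteen words on one line.
sample = "one two three four five six seven eight nine ten eleven twelve thirteen"
result = break_every_six_words(sample)
print(repr(result))
```

          A leftover of fewer than six words (here the final "thirteen") is not matched, so it stays as it was.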

          • 2. Re: regular expression question
            jackhenrie Level 1

            thanks! awesome, works perfectly.

             

            one question:

             

            why is the find code

             

            ((\S+\s(?!\r)){5}\S+)\s

            instead of

            (\S+\s(?!\r)){6}

            I.E. why is the final white space tacked on at the end of the code?

             

            thanks again

            • 3. Re: regular expression question
              [Jongware] Most Valuable Participant

              Good question! (It shows your GREP confidence is growing.)

               

              I did this so the final white space character would get replaced by a hard return. The stuff inside the parentheses will get copied with "$1", but that last space is outside the parens, so it will be deleted, and then a hard return will be inserted.

               

              If you use your GREP, you cannot remove that final space with any GREP expression in the Replace field. And just leaving that white space before a hard return ... I think that's messy. I always remove all 'loose' whitespace before hard returns when I start on a new document.
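
              The effect of keeping that last white space outside the parentheses can be checked with a small Python sketch. It assumes Python's re module as a stand-in for InDesign's GREP and leaves out the (?!\r) lookahead for brevity; the sample text is invented.

```python
import re

text = "one two three four five six seven"

# Trailing \s OUTSIDE group 1: it is part of the match but is not
# copied back, so the space before the return disappears.
outside = re.sub(r"((\S+\s){5}\S+)\s", lambda m: m.group(1) + "\r", text)

# Trailing \s INSIDE group 1: it is copied back along with the words,
# so a loose space is left in front of the return.
inside = re.sub(r"((\S+\s){6})", lambda m: m.group(1) + "\r", text)

print(repr(outside))
print(repr(inside))
```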

               

              By the way, your GREP is almost exactly what I tried first. The major difference is ... yours will add a hard return after the 6th word (and its associated space), but remove the five words before it as well!

              That's because of your parentheses:

               

              (¹\S+\s(?!\r))¹{6}

               

              The "found" stuff will be placed "in" Found Text #1 six times, each time overwriting the previous value. The {6} ought to be inside a set of parentheses, all around what you have now:

               

              (¹(²\S+\s(?!\r))²{6})¹
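
              The overwriting behaviour is easy to see in a short Python sketch. It assumes Python's re module handles repeated groups the same way InDesign does for this case (a repeated group keeps only its last iteration); the sample text and variable names are illustrative only.

```python
import re

text = "one two three four five six seven"

# The repeated group is group 1, so after six iterations it holds
# only the last one: the sixth word and its space.
repeated = re.match(r"(\S+\s(?!\r)){6}", text)
print(repr(repeated.group(1)))

# Replacing with group 1 plus a return therefore loses the first five words.
lossy = re.sub(r"(\S+\s(?!\r)){6}", lambda m: m.group(1) + "\r", text)
print(repr(lossy))

# With an extra pair of parentheses around the whole repetition,
# group 1 is the complete six-word run and group 2 is the last iteration.
wrapped = re.match(r"((\S+\s(?!\r)){6})", text)
print(repr(wrapped.group(1)))
print(repr(wrapped.group(2)))
```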
