6 Replies Latest reply on Jun 3, 2013 1:21 PM by Peter Spier

    grep? find & replace after 8 characters

    steffenunger Level 1

      Hi all,

       

      sorry, i'm having trouble doing something i'm sure is quite simple.

       

      basically i have a linked file with a static caption that is reading the linked file's name, the swatches used and the creation date. it looks something like this:

       

      filename.ai:3

      white, blue, red

      5/29/13

       

      the file name is ALWAYS 8 characters and i'd like to remove the .ai:3 portion (whatever it happens to be), and keep the carriage return.

       

      with the swatches used, i only need the first color listed and am using grep , .*~b to remove everything after the comma up to the end of the line, but it seems the same expression does not work using a . (period).

       

      in the end, i was hoping to end up with this:

      filename

      white

      5/29/13

       

       

      thanks for any help. hope this makes sense!

       

      s

        • 1. Re: grep? find & replace after 8 characters
          SJRiegel Adobe Community Professional & MVP

          ^(\w{8})(\..+?)(\r)

           

          change to

          $1$3

           

          This will take every eight-letter word that starts a paragraph and is followed by a period, and remove the period and everything else up to the paragraph return.
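For anyone who wants to try this find/change pair outside InDesign, here is a rough Python sketch. It is an approximation only: Python's `re` module is not InDesign's GREP engine, so `\n` stands in for the paragraph return (`\r`), backreferences are written `\1\3` instead of `$1$3`, and the `caption` string is made-up sample data based on the question.

```python
import re

# Made-up sample caption; "\n" stands in for InDesign's paragraph return (\r).
caption = "filename.ai:3\nwhite, blue, red\n5/29/13\n"

# Same three groups as the find expression, with \n substituted for \r.
# re.MULTILINE lets ^ match at the start of every line, not only the string.
pattern = re.compile(r"^(\w{8})(\..+?)(\n)", re.MULTILINE)

# Keep group 1 (the eight characters) and group 3 (the return), drop group 2.
cleaned = pattern.sub(r"\1\3", caption)
print(cleaned)
```

Only the first paragraph changes, because the other two do not start with eight word characters followed by a period.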

          • 2. Re: grep? find & replace after 8 characters
            Peter Spier Most Valuable Participant (Moderator)

            I believe \..*$ will work. It will find everything from the first period in the paragraph onward. If there is more than one period, and you only want to remove the last one and following text (if any), use \.[^.]*$
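The difference between the two expressions is easiest to see on a name with more than one period. The sketch below uses Python's `re` module as a stand-in for InDesign's GREP (an assumption; the engines are not identical), and it works on one paragraph at a time so the end-of-paragraph anchor is simply the end of the string. The `two_dots` name is hypothetical.

```python
import re

first_period = re.compile(r"\..*$")     # from the first period to the end
last_period = re.compile(r"\.[^.]*$")   # from the last period to the end

one_dot = "filename.ai:3"
two_dots = "file.name.ai:3"  # hypothetical name containing an extra period

# An empty replacement plays the role of a blank "change to" field.
from_first_one = first_period.sub("", one_dot)
from_first_two = first_period.sub("", two_dots)
from_last_two = last_period.sub("", two_dots)

print(from_first_one)  # filename
print(from_first_two)  # file
print(from_last_two)   # file.name
```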

            • 3. Re: grep? find & replace after 8 characters
              steffenunger Level 1

              Far out! That works. Thanks for the quick reply.

              • 4. Re: grep? find & replace after 8 characters
                Peter Spier Most Valuable Participant (Moderator)

                I know you said that the lines would ALWAYS have 8 characters followed by a period and more text before the return, which means SJRiegel's answer is correct, but I think mine is a bit better as it is more flexible, allowing the string before the period to be any length. I also found that (\..+?) doesn't actually work to find the shortest match if there is more than one period, but that's probably irrelevant in this case since that expression is explicitly saying you want to delete a period and everything following it up to a return after exactly 8 preceding characters. (In other words, I think ^(\w{8})(\..+)(\r) would do exactly the same thing.)
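One possible reading of the "shortest match" observation, sketched in Python rather than InDesign (so treat it as an assumption about how the GREP engine behaves): the second group has to start at the period right after the eight characters and has to end at the return, so there is only one candidate match and the `?` has nothing to shorten. The sample paragraph with two periods is made up, and `\n` again stands in for `\r`.

```python
import re

# Hypothetical paragraph with two periods; "\n" stands in for the return.
paragraph = "filename.ai:3.bak\n"

lazy = re.compile(r"^(\w{8})(\..+?)(\n)")    # shortest-match quantifier
greedy = re.compile(r"^(\w{8})(\..+)(\n)")   # ordinary greedy quantifier

lazy_groups = lazy.match(paragraph).groups()
greedy_groups = greedy.match(paragraph).groups()

# Both versions capture the same text in every group.
print(lazy_groups)
print(greedy_groups)
print(lazy_groups == greedy_groups)
```

In this sketch the two expressions agree, which is consistent with the suggestion that the greedy version does the same job here.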

                • 5. Re: grep? find & replace after 8 characters
                  steffenunger Level 1

                  Hi Peter, thanks a lot for your input. It definitely gives me useful information in beginning to understand how grep works and starting to get a handle on what to me for now are (mildly) intimidating strings of characters that perform magic

                  • 6. Re: grep? find & replace after 8 characters
                    Peter Spier Most Valuable Participant (Moderator)

                    Here's a sort of plain-English translation of the various strings to help you.

                     

                    ^(\w{8})(\..+?)(\r) : ^ means the start of a paragraph here (it means other things in other places, so try not to get too confused). The parenthetical groups are marking subexpressions to allow you to re-use chunks of what is found. \w{8} means find any word character (which is pretty much anything alphanumeric) 8 times, or any string of 8 word characters, and the ^ meant it was the first 8 characters in the paragraph. \..+? is a literal period (adding the backslash in front of it, or "escaping" it, makes it literal), followed by any string of one or more characters (.+), and the ? is supposed to make it choose the shortest match, but it isn't 100% foolproof (didn't work for me in my testing here). The \r is a paragraph return, so the whole thing is find the first 8 characters, then everything else up to a return, and the return. The change to expression $1$3 means use the found text in the first and third subexpressions that were marked off with the parentheses, or the first 8 characters and the return, but throw out everything in between.
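Two of the building blocks described above, the `\w{8}` count and the escaped period, can be tried in isolation. This is a small Python sketch, not InDesign itself, and the test strings are invented for illustration.

```python
import re

# \w{8} at the start: exactly eight word characters must come first.
eight = re.compile(r"^\w{8}")
has_eight = eight.match("filename.ai:3") is not None   # eight letters, matches
too_short = eight.match("file.ai:3") is not None       # only four before the period

# Escaped period (literal) versus unescaped period (any character).
literal_dot = re.compile(r"^\w{8}\.")
any_char = re.compile(r"^\w{8}.")
literal_on_dash = literal_dot.match("filename-ai") is not None  # "-" is not "."
any_on_dash = any_char.match("filename-ai") is not None         # "." accepts "-"

print(has_eight, too_short, literal_on_dash, any_on_dash)
```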

                     

                    \..*$ is very similar, but I'm only looking at the end of the paragraph. In this case the $ is a location marker that means the end of the paragraph, so \. is a period, followed by .* which is any character 0 or more times, up to the end of the paragraph. It doesn't care about anything before the first period, and it matches right up to the return but doesn't include it, so if you leave the change field blank (replace with nothing) it deletes everything from the first period to the end of the paragraph. In this case I think the * is preferable to the + because it removes the period even if there is nothing following it at the end of the paragraph.
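The point about * versus + shows up when a paragraph ends in a bare period. A quick Python sketch (again an approximation of InDesign's behavior, with a made-up sample string):

```python
import re

# Hypothetical paragraph that ends with a period and nothing after it.
trailing = "filename."

# * allows zero characters after the period, so the period itself is removed.
with_star = re.sub(r"\..*$", "", trailing)

# + demands at least one character after the period, so nothing matches.
with_plus = re.sub(r"\..+$", "", trailing)

print(with_star)  # filename
print(with_plus)  # filename.
```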

                     

                    \.[^.]*$ uses a "negative class" to fix the problem that the ? didn't limit the found text in testing to only the last period and what followed. Square brackets are used to define a class -- a group of characters any one of which is a match, sort of a selective wildcard -- and the ^ inside the class is a negative marker, so this class is anything that isn't a period (inside the class the . is literal and does not need to be escaped). The * again is 0 or more times, and the $ is the end of paragraph location, so it looks for a period followed by anything that isn't a period at the end of the paragraph. Again, leave the change field blank.

                     

                    If you really would like to learn more about GREP, you can't go wrong buying Peter Kahrel's $10 ebook, GREP in InDesign CS3/4 from O'Reilly.