10 Replies Latest reply on Feb 23, 2012 9:12 AM by winterm

    GREP finding characters in any sequence

    samar02 Level 1

      In an ID file, I need a GREP to find characters out of four character classes appearing in any order, with each character class character occurring zero or one times. Additionally, each of the characters may or may not be followed by a period or a space.

       

      The character classes are:

       

      [A-Z]

      [a-z]

      [α-ω]

      [0-9]

       

      Examples of text strings that my GREP should find:

       

      Aβf

      B γ.9

      αC4. b

      g

      δE,a 4.

      ε 3c

      eF

      β

      4η.B

       

      When such a text string is found, the in-between spaces and periods need to disappear, and one period needs to be added at the end (if there isn't one there already), resulting in the following texts:

       

      Aβf.

      Bγ9.

      αC4b.

      g.

      δEa4.

      ε3c.

      eF.

      β.

      4ηB.

       

      I know this sounds terribly complicated, but I am sure there is an elegant solution to do this with GREP. I am thankful for any help on this.
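Outside of GREP, the transformation described above can be sketched in a few lines. This is a minimal Python sketch, not an InDesign solution: the function name `normalize` is hypothetical, it assumes a valid string holds at most one character from each class, and it allows any run of periods and spaces after each character (the sample `αC4. b` has both). The comma in the `δE,a 4.` sample is not covered.

```python
import re

# The four character classes from the question.
CLASSES = ("[A-Z]", "[a-z]", "[α-ω]", "[0-9]")

def normalize(candidate):
    """Return the cleaned string, or None if the candidate breaks the rule."""
    # Only class characters, each optionally followed by periods/spaces.
    if not re.fullmatch(r"(?:[A-Za-zα-ω0-9][. ]*)+", candidate):
        return None
    # Drop the in-between (and trailing) periods and spaces.
    core = re.sub(r"[. ]", "", candidate)
    # Each class may contribute at most one character.
    if any(len(re.findall(cls, core)) > 1 for cls in CLASSES):
        return None
    # Exactly one period at the end.
    return core + "."

for sample in ["Aβf", "B γ.9", "αC4. b", "g", "ε 3c", "eF", "β", "4η.B"]:
    print(sample, "->", normalize(sample))
```

This only cleans up a string that has already been isolated; finding such strings inside running text is the harder part.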

        • 1. Re: GREP finding characters in any sequence
          Fred Goldman Level 3

          I am not a grep expert, but I don't think this would be such a simple grep.

           

          You could find the classes with spaces and periods like this:

           

          [A-Za-zα-ω\d .]+

           

          But I don't know of any way to reference the spaces and periods by themselves in order to get rid of them in the change. Unless these are really short phrases, you would need a lot of searches. A script could do this fairly easily, though. Are these strings formatted at all? Do you need to keep the formatting?

          • 2. Re: GREP finding characters in any sequence
            samar02 Level 1

            Thanks for your input.

             

            The problem with the GREP you indicate is that it also will find strings consisting of just one character class (like "012", "abc" etc.).

             

            The strings I am looking for are not formatted.
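The over-matching is easy to reproduce. The following Python check is only an illustration: it uses Python's `re` as a stand-in for InDesign's GREP (the two dialects may differ in details), and the broad class is written here with explicit `A-Z` and `a-z` ranges.

```python
import re

# A broad "any of these characters" class of the kind suggested above.
BROAD = re.compile(r"[A-Za-zα-ω\d .]+")

# An intended string and two single-class strings that should be excluded.
for text in ["B γ.9", "012", "abc"]:
    print(repr(text), BROAD.fullmatch(text) is not None)
```

All three strings match, because a single character class says nothing about how often each kind of character may occur.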

            • 3. Re: GREP finding characters in any sequence
              [Jongware] Most Valuable Participant

              Matching all of the character classes in any order is not possible. There is no discernible system in your sample list -- I see single letters, lowercase, uppercase, single Greek letters, digits, and anything else in any random order, from one character up to four, with or without spaces and/or periods. That comes down pretty much to "any text at all, including (but not limited to) Latin, Greek, and numbers".

               

              If these strings appear like you show -- all in a list of their own --, select the entire list and remove spaces and periods on the selected text only. Then add periods at the end of each line.

              • 4. Re: GREP finding characters in any sequence
                samar02 Level 1

                Thanks, Jongware.

                 

                Unfortunately, the strings do not appear separately as shown, but they are interspersed in body text.

                 

                However, I fail to see why the list given above should boil down to "any text at all". After all, things like "ID", "nervous breakdown", or "1000 pages" clearly would not be part of the desired matches.

                • 5. Re: GREP finding characters in any sequence
                  [Jongware] Most Valuable Participant

                  Yes, to you they clearly are not good. But GREP expects simple rules. Take, for instance, in your example the lowercase single letter 'g'. You stated that any of the character classes may occur, zero or once. Well, "g" clearly is part of that. But so is "a", in the sentence "what a drag". And so is "I", as in "Well I don't know". "1000 pages" is also an example, because it contains "zero or more [0-9]", then a space, then "zero or more [a-z]" (in addition to "zero or more [A-Z]" and even "zero or more [α-ω]").

                   

                  Unless you can tighten up your rules, GREP cannot help you.
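To make the point concrete, here is a rough Python sketch that applies an "at most one character per class" reading of the rule to each word of an ordinary sentence. The helper `fits_rule` is hypothetical, and words are split naively on whitespace.

```python
import re

CLASSES = ("[A-Z]", "[a-z]", "[α-ω]", "[0-9]")

def fits_rule(word):
    """True if word uses only the four classes, each at most once."""
    if not re.fullmatch(r"[A-Za-zα-ω0-9]+", word):
        return False
    return all(len(re.findall(cls, word)) <= 1 for cls in CLASSES)

for sentence in ["what a drag", "Well I don't know"]:
    hits = [w for w in sentence.split() if fits_rule(w)]
    print(sentence, "->", hits)
```

The one-letter words "a" and "I" pass the test, so the rule by itself cannot tell them apart from the intended strings.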

                  • 6. Re: GREP finding characters in any sequence
                    samar02 Level 1

                    The rule is zero or once, not zero or more times.

                     

                    Let me try to narrow it down: each string begins with a word boundary.

                     

                    Is it becoming more greppable this way?
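For what it's worth, the narrowed rule can be written as one (long) pattern by listing every ordering of distinct classes. The sketch below is for Python's `re`; whether InDesign's GREP accepts a pattern this long is untested. It treats "word boundary" as "not adjacent to another word character", allows any run of periods and spaces after each character, and the names `CANDIDATE` and `fix` are hypothetical.

```python
import re
from itertools import permutations

CLASSES = ("[A-Z]", "[a-z]", "[α-ω]", "[0-9]")
SEP = "[. ]*"  # optional periods/spaces after each character

# Every ordering of 4, 3, 2 or 1 distinct classes, longest first (64 in all).
orderings = [p for k in (4, 3, 2, 1) for p in permutations(CLASSES, k)]
body = "|".join("".join(cls + SEP for cls in p) for p in orderings)
CANDIDATE = re.compile(r"(?<!\w)(?:" + body + r")(?!\w)")

def fix(match):
    found = match.group(0)
    core = re.sub(r"[. ]", "", found)          # strip periods and spaces
    tail = " " if found.endswith(" ") else ""  # keep a trailing space
    return core + "." + tail

print(CANDIDATE.sub(fix, "see B γ.9 below"))  # see Bγ9. below
print(CANDIDATE.sub(fix, "what a drag"))      # what a. drag
```

The second line shows the remaining weakness: the pattern finds the intended strings, but it also picks up ordinary one-letter words, so the results would still need checking by hand.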

                    • 7. Re: GREP finding characters in any sequence
                      Eugene Tyson Adobe Community Professional & MVP

                      It's impossible with GREP.

                      • 8. Re: GREP finding characters in any sequence
                        Peter Spier Most Valuable Participant (Moderator)

                        samar02 wrote:

                         

                        The rule is zero or once, not zero or more times.

                         

                        It's the zero part that causes the problem. Every word that doesn't match one of the strings you are trying to find has zero occurrences of the characters you are trying to match.
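The effect of the "zero" part can be seen with a naive pattern in which every class is optional. This is a small Python illustration only, using `re` as a stand-in for InDesign's GREP and one fixed class order for brevity.

```python
import re

# Every class optional, in one fixed order for brevity.
NAIVE = re.compile(r"[A-Z]?[a-z]?[α-ω]?[0-9]?")

# The pattern is satisfied by the empty string...
print(NAIVE.fullmatch("") is not None)  # True
# ...and it happily matches a fragment of a word that should be left alone.
print(NAIVE.search("ID").group(0))      # I
```

Because nothing in the pattern is required, it can succeed at any position in any text.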

                        • 9. Re: GREP finding characters in any sequence
                          LouWrench Level 1

                          It occurs to me that you cannot globally do the changes you want because you cannot isolate the particular texts. If you did it manually, how would you recognise the text you want to change? What are these words for, and how do they differ from the rest of the text? Maybe you should be looking at what you don't want changed, and work backwards one style at a time. Or give the "body" text a style of its own. Is there any chance of showing us the copy you are working with?

                           

                          Lou

                          • 10. Re: GREP finding characters in any sequence
                            winterm Level 4

                            just another wild guess, after LouWrench...

                            can you tell ID where your strings of interest begin (at the beginning of the line, right?), and, especially, where they end (where your "regular" body text begins)?

                            If yes, then maybe it's possible to insert some "very special" character there just to isolate your objects for GREP, and then use something like positive lookaround?

                            sorry if this makes no sense, I have never tried anything similar, just guessing...
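The marker idea could be sketched like this in Python. The markers `«` and `»` are purely hypothetical stand-ins for whatever "very special" characters get inserted around each string, the function name `clean_marked` is made up for the example, and Python's `re` is used in place of InDesign's GREP. The patterns rely only on a lookahead and one capture group.

```python
import re

def clean_marked(text):
    # Pass 1: delete a period or space only if a closing marker follows
    # before any other marker, i.e. only inside a marked string.
    text = re.sub(r"[. ](?=[^«»]*»)", "", text)
    # Pass 2: drop the markers (and a period right after the closing one)
    # and add a single period at the end.
    return re.sub(r"«([^«»]*)»\.?", r"\1.", text)

print(clean_marked("see «B γ.9» and «αC4. b» here."))
# see Bγ9. and αC4b. here.
```

With the strings isolated this way, the two passes no longer need to know anything about the character classes; the open question is still how the markers get there in the first place.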

                             

                            huh, you should title your question: GREP finding any characters in any sequence