6 Replies Latest reply on Jun 14, 2011 7:17 PM by i.am.pixelated

    GREPs to apply "No Language" to symbols within words?

    i.am.pixelated

      We use a lot of symbols in our documentation and I have created a custom user dictionary in InDesign (CS5, Mac OSX, 10.6.7) with about 5000 custom word and number combinations. This worked perfectly in CS2, but not in our CS5 upgrade. Spell Check regularly hangs InDesign --- usually with our symbol-laden terms (Greek, mathematical, etc.) and also does not like the user to select “Ignore All” too many times.

      I’m a GREP newbie, and after much trial and error, I came up with the following GREPs, which work, but I think it can be done more efficiently. Currently these catch all words with specific symbols and treat them as "NO LANGUAGE" -- that way Spell Check skips over them. It would be closer to ideal if Spell Check could simply skip the symbol inside the term but check the rest of the term. For example, in the text "CLEAR_HITSORY" it would be cool if Spell Check would ignore only the "_", spell check "CLEAR" and "HITSORY" as two complete words, and then offer "HISTORY" as a suggestion for "HITSORY." However, all these GREPs seem to use a lot of memory, and now the program is very sluggish. By the way, the “RT, RTC, RTM” followed by numbers covers all our part numbers:

      (.+µ+|.+Ω+|.+∆+|.+θ+|.+_+|.+RTC\d+|.+RTM\d+|.+RT\d+)
       
      (.+τ+|.+ƒ+|.+π+|.+ρ+|.+∑+|.+∫+|.+∞+|.+ß+|.+÷+|.++|.+δ+|.+η+|.+≈+|.+~+|.+α+|.+ε+|.+κ+)
       
      ^(RTC\d+)|(RT\d+)|(RTM\d+)

      I read that \D\U\L would find everything that is NOT a digit, NOT a capital letter and NOT a lower case letter. This made me think that it would choose all symbols (Greek, numerical, –, +, &, $, °, • ... ) to which I would apply my character style NO LANGUAGE. I cannot seem to get any combination of this to work. I’m not an engineer, I’m an artist, and I have really been using up all my brain cells.

      Maybe this idea would be problematic because it would also include periods, commas, semicolons, and all the other “normal” punctuation.  Hmmmm ... Any suggestions on what code might work for this scenario, or even just a way to make the code I've written more concise and elegant?

      Many Thanks!!!

        • 1. Re: GREPs to apply "No Language" to symbols within words?
          [Jongware] Most Valuable Participant

          Would this work?

           

          \W

           

          It's the shortcut code for "not a word character", and, fascinatingly, a "word character" is defined as one of \d, \u, or \l -- essentially the set you tried earlier (most regex flavors also count the underscore as a word character). Without seeing your code I can't be sure, but the problem was probably some bad notation; as you say, \W seems to define exactly what you asked for.
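A quick way to see what \W does and does not catch is to try it outside InDesign. The sketch below uses Python's re module, which is an assumption for illustration only: InDesign's GREP engine is a different implementation and may classify some characters differently. The sample string is made up from the terms in this thread.

```python
import re

# Hypothetical sample built from terms mentioned in this thread.
sample = "CLEAR_HITSORY MoreπPie 20°"

# \W = "not a word character". In Python's re, word characters are
# Unicode letters, digits, and the underscore, so "_" and "π" are skipped.
non_word = re.findall(r"\W", sample)

# Adding the underscore to the class picks it up as well.
non_word_or_underscore = re.findall(r"[\W_]", sample)

print(non_word)
print(non_word_or_underscore)
```

In this flavor the underscore and the Greek letter both count as word characters, so \W alone leaves them untouched; it is worth checking how InDesign treats the same sample text.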

          • 2. Re: GREPs to apply "No Language" to symbols within words?
            John Hawkinson Level 5
            I read that \D\U\L would find everything that is NOT a digit, NOT a capital letter and NOT a lower case letter. This made me think that it would choose all symbols (Greek, numerical, –, +, &, $, °, • ... ) to which I would apply my character style NO LANGUAGE. I cannot seem to get any combination of this to work. I’m not an engineer, I’m an artist, and I have really been using up all my brain cells.

            \D\U\L would find every instance of a single non-digit followed by a non-capital followed by a non-lowercase letter. If you would like to match a character that matches a set of criteria, you must enclose your criteria in square brackets, to form a character class. For instance, while ab matches a followed by b, [ab] matches a or b.
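The difference between a sequence and a character class can be sketched in a few lines. This uses Python's re module as a stand-in (an assumption; InDesign's GREP is a separate engine, but the bracket behavior shown here is common to regex flavors).

```python
import re

# "ab" is a sequence: the letter a immediately followed by the letter b.
sequence_hits = re.findall(r"ab", "cab")

# "[ab]" is a character class: any single character that is a OR b.
class_hits = re.findall(r"[ab]", "cab")

# The same applies to shorthand codes: \D\D needs two non-digits in a row,
# while [\D] matches any single non-digit.
pair_hits = re.findall(r"\D\D", "a1bc")
single_hits = re.findall(r"[\D]", "a1bc")

print(sequence_hits, class_hits)
print(pair_hits, single_hits)
```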

             

            This doesn't quite help: [\D\U\L] matches any non-digit OR any non-capital OR any non-lowercase.

             

            But you can invert a class with a leading ^ within the class. [^\D\U\L] matches anything that is NOT (any non-digit OR any non-capital OR any non-lowercase). Which is the opposite of what you want. So you probably want [^\d\u\l] which matches anything that is NOT a digit, a capital, or lowercase.
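The logic of the three classes can be checked with a rough sketch. Python's re module has no \u or \l class shorthands, so the helper functions below (hypothetical names, not part of any GREP engine) stand in for the classes using Python's own string tests; the sample string is invented.

```python
def in_union_of_negations(ch):
    # Stand-in for [\D\U\L]: non-digit OR non-uppercase OR non-lowercase.
    return (not ch.isdigit()) or (not ch.isupper()) or (not ch.islower())

def in_negated_union(ch):
    # Stand-in for [^\d\u\l]: NOT (digit OR uppercase OR lowercase).
    return not (ch.isdigit() or ch.isupper() or ch.islower())

sample = "A_b1 π+"

# Every character fails at least one of the three tests, so all match.
matched_by_union = [c for c in sample if in_union_of_negations(c)]

# Inverting that class ([^\D\U\L]) therefore matches nothing.
matched_by_inverted_union = [c for c in sample if not in_union_of_negations(c)]

# Only characters that are none of digit, uppercase, lowercase remain.
symbols = [c for c in sample if in_negated_union(c)]

print(matched_by_union)
print(matched_by_inverted_union)
print(symbols)
```

Note that π is treated as a lowercase letter by these tests, so it is not among the symbols, and that the space is.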

             

            Whew!

            • 3. Re: GREPs to apply "No Language" to symbols within words?
              Mary Posner Level 3

              Huh, well, I got part of this to work, using some of John's suggestions. The no-uppercase/no-lowercase option didn't work for me on my sample text:

               

              CLEAR_HITSORY

              MoreπPie

               

              Apparently it considers the "pi" character, for one, to be either an upper or lowercase letter. But I tried the below to limit it to a specific range of upper and lowercase letters:

               

              [^a-z, ^A-Z]

               

              ... and that worked -- somewhat. It did apply the "no language" style to an underscore, the paragraph return and the pi symbol. You'd probably want to have this exclude white spaces and punctuation as well as the alpha characters.

               

              However, after "no language" was applied to the underscore, it considered the text on both sides of it to be correctly spelled. It only tagged "HITSORY" as misspelled if I added a space directly before it. It seems that's just the way InDesign's spelling works. If a word has one or more characters set to another language (or no language), it will not flag the word as a spelling error.

              If you find a workaround that works, I'd love to hear it!
              Mary

              • 4. Re: GREPs to apply "No Language" to symbols within words?
                John Hawkinson Level 5
                [^a-z, ^A-Z]

                 

                This probably does more than you intended.

                It reads as "All characters that are NOT: (1) between a-z (2) comma (3) space (4) caret (5) between A-Z." So for instance, your expression will not match a "^" when you probably intend it to. And it will match tabs but will not match spaces.

                 

                You probably wanted [^a-zA-Z].
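To compare the two classes side by side, here is a small sketch in Python's re module (an assumption for illustration; InDesign's engine is separate, though negated classes are read the same way in common regex flavors). The sample string is invented to include each character in question.

```python
import re

# Contains an underscore, a space, a caret, a comma, a tab, and pi.
sample = "a_b c^d,e\tfπ"

# As written: excludes a-z, comma, space, caret, and A-Z.
as_written = re.findall(r"[^a-z, ^A-Z]", sample)

# As intended: excludes only the letters a-z and A-Z.
intended = re.findall(r"[^a-zA-Z]", sample)

print(as_written)
print(intended)
```

The first pattern skips the space, the caret, and the comma; the second catches them along with the underscore, the tab, and π.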

                • 5. Re: GREPs to apply "No Language" to symbols within words?
                  Mary Posner Level 3

                  Thanks, John. I'm sure you're correct!

                  • 6. Re: GREPs to apply "No Language" to symbols within words?
                    i.am.pixelated Level 1

                    John, Mary, Jongware,  --- Wow. Thanks so much. The [^a-zA-Z] is working so far. I've tried it on about 5 documents. SO MUCH more comprehensive and elegant than what I was doing. Very grateful.