3 Replies Latest reply on Jan 5, 2016 11:29 PM by Laurens V

    Subscript basic chemical formulas

    Laurens V

      Hi

       

      I work on a biweekly agricultural magazine in which basic chemical formulas (CO2, CH4, N2O, ...) regularly appear, so naturally I want to automate the subscript part of these formulas with a GREP style in my paragraph style.

      I have found a very robust GREP expression (I only use the 'Down' part) to find the numbers and change them to subscript (Credit to Vasco Elbrecht).

      (?<=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)])\d{1,3}(?=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)|( )|(\()])

      Now the problem is that in my case the formulas are used in sentences, so they can be followed immediately by a period, a comma, a closing bracket, or some other punctuation mark. Elbrecht's expression does not handle this, so I tried to adapt it to my needs (with my minimal GREP knowledge), which resulted in this:

      (?<=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)])\d{1,3}(?=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)|( )|\)])

      But now when I just type a number (fewer than 4 digits) between brackets, it is put in subscript too, which of course is not what I want...

      Is there any way this can be solved? Or should I just keep changing the numbers manually?

       

      Cheers,

      Laurens

       

      Bonus: I tried to write a little GREP script to superscript the 2 or 3 in mm, cm, m and km squared or cubed (only if preceded by a number and a space), but I could not figure out how to put them all in one script, so now I have these two GREP styles:

      (?<=\d m)(2|3)

      (?<=\d (c|m|k)m)(2|3)

      Is there any way to combine these two in one script?

        • 1. Re: Subscript basic chemical formulas
          [Jongware] Most Valuable Participant

          Are you sure you copied the expression correctly? You get everything between parentheses subscripted because both the left ("lookbehind") and the right ("lookahead") expressions list the chemical elements separated by | ('OR'), but inside square brackets ([..]). That notation is used only and exclusively for a Single Character Set. So the entire expression

           

          [(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)]

           

          actually checks for one single occurrence of one of the characters '(' 'N' 'a' ')' '|' 'C' 'l' 'H' .. and so on. Note that this includes the parentheses themselves, both opening and closing. You can also see this when you type "a2a": the '2' will get subscripted.
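A quick way to see the character-set behaviour outside InDesign is to run the asker's modified expression through another regex engine. The sketch below uses Python's `re` module, which is an assumption on my part (it is not InDesign's GREP engine, though it reads this particular expression the same way); the helper name `subscripted` is made up for illustration.

```python
import re

# The asker's modified GREP expression, copied verbatim. Each [...] is one
# single-character set, so "(", ")", "|", "a", "l", ... are all accepted.
pattern = re.compile(
    r"(?<=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)])\d{1,3}"
    r"(?=[(Na)|(Cl)|(H)|(C)|(O)|(S)|(N)|( )|\)])"
)

def subscripted(text):
    """Return the digit runs the expression would pick up in `text`."""
    return [m.group(0) for m in pattern.finditer(text)]

print(subscripted("CO2 "))  # the intended case
print(subscripted("a2a"))   # matches too: 'a' is in both sets
print(subscripted("(12)"))  # matches too: '(' and ')' are in the sets
```

All three inputs produce a match, which is why plain numbers between brackets end up subscripted.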

           

          You should remove the square brackets, but then you end up with a lookbehind with elements of variable length (one part is "Na", another is "H"), which is not supported by InDesign. To fix that, you need to split up the lookbehind into two parts: one that looks for 2 characters OR one that looks for 1 character.

           

          Due to the nature of chemical notation, I don't think you need the lookahead at all! The following regex will match the test compounds you mention:

           

          ((?<=Na|Cl)|(?<=H|C|O|S|N))\d{1,3}(?!\d)

           

          and it will refuse to fire when one of these letters is followed by more than 3 digits; that's what the negative lookahead is for.
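For anyone who wants to try the expression on sample text before adding it as a GREP style, here is a minimal sketch in Python. Using Python's `re` module as a stand-in for InDesign is an assumption, and `subscripted` is a hypothetical helper; the pattern itself is unchanged.

```python
import re

# Two fixed-width lookbehinds (2-letter symbols, then 1-letter symbols),
# followed by 1 to 3 digits that are not followed by another digit.
formula_digits = re.compile(r"((?<=Na|Cl)|(?<=H|C|O|S|N))\d{1,3}(?!\d)")

def subscripted(text):
    """Return the digit runs the pattern would pick up in `text`."""
    # group(0) is the whole match; the outer group only holds lookbehinds.
    return [m.group(0) for m in formula_digits.finditer(text)]

print(subscripted("CO2, CH4 and N2O."))  # digits inside the formulas
print(subscripted("Na2SO4"))             # two-letter symbol handled
print(subscripted("(12)"))               # plain number in brackets: nothing
print(subscripted("C1234"))              # more than 3 digits: nothing
```

One thing to keep in mind: the pattern only looks at the letter just before the digits, so something like "H1" in running text would be picked up as well.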

           

          Bonus challenge

           

          It's tempting to devise a regex "(?<=\d [cmk]?m)[23]\b" (which checks for 'm' with an optional 'c', 'm', or 'k' prefix) but, again, it will not work because of the variable length. In this case it's the question mark ('?') that makes the length variable: after all, it means "either zero times or once". But you can duplicate the lookbehinds again and end up with a regex which is longer and more cumbersome, but at least works:

           

          ((?<=\d [cmk]m)|(?<=\d m))[23]\b
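The same restriction can be reproduced in Python, whose `re` module also insists on fixed-width lookbehinds. This is only an illustration under that assumption (it is not InDesign's engine), and `superscripted` is a hypothetical helper name.

```python
import re

# The tempting version: the optional prefix makes the lookbehind
# variable-width, so Python rejects it at compile time.
try:
    re.compile(r"(?<=\d [cmk]?m)[23]\b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# The split version: one lookbehind per fixed width.
unit_exponent = re.compile(r"((?<=\d [cmk]m)|(?<=\d m))[23]\b")

def superscripted(text):
    """Return the exponents the pattern would pick up in `text`."""
    return [m.group(0) for m in unit_exponent.finditer(text)]

print(variable_width_ok)                         # False
print(superscripted("5 m2, 3 cm2 and 10 km3"))   # one exponent per unit
print(superscripted("m2"))                       # no number before it: nothing
```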

          • 2. Re: Subscript basic chemical formulas
            Obi-wan Kenobi Adobe Community Professional

            Don't forget the Ununtrium! 

            • 3. Re: Subscript basic chemical formulas
              Laurens V Level 1

              Sorry for the late reply, but that is exactly right!

              Many thanks for the explanations as well, since they clear up why the code I was using didn't work. I did not know about the variable lengths in particular... And the first code you wrote for the bonus was the one I tried too, because it seemed so logical.

              With this newfound knowledge I'll be able to tackle other problems that might come up.

               

              Thanks again Jongware!