2 Replies Latest reply on Feb 17, 2011 5:06 PM by dsackett

    [JS - CS4] Help w/ JS choking on quotes in strings & regex

    dsackett Level 1

      Does anyone have an easy way to extract text from a string with quotes in JavaScript? I seem to have run into a few limitations in JavaScript's regex implementation and in how a script handles a string with quotes in it.

       

      Here's what I'm working with: a regular string like:
      {appliedParagraphStyle:"Headline 6", changeConditionsMode:1919250519}
      from which I'd like to extract just the text 'Headline 6' without any quotes

       

      I came across a regex that would get just the text without the quotes, but apparently JavaScript does not support lookbehinds. For instance, Jongware's wonderful WhatTheGrep script can decode this regex:
      (?<=")[^"]*?(?=")
      but ID finds nothing with it when run in a GREP search. The same search without the lookback finds stuff:
      "[^"]*?(?=")
      but what it finds still has the quote at the start of the string.

       

      I thought I could just use substrings to extract the text without the quote, but apparently this has issues. See these examples from the JavaScript console:

       

      testStr;
      Result: "Headline 6"
      testStr.length;
      Result: 1

       

      testStr;
      Result: Headline 6
      testHL.length;
      Result: 10

       

      testHL;
      Result: "Headline 6
      testHL.length;
      Result: 1

       

      So if the string has a quote mark in it, it doesn't return the true length, so I can't code a start and end of the string to extract the part without the quote.
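
One possible explanation for the lengths of 1, offered as an assumption because the post does not show how testStr and testHL were assigned: if they hold the return value of match(), they are arrays, and .length counts array elements rather than characters. A quote inside a JavaScript string is an ordinary character and counts toward the string's length. A minimal sketch (the names `source`, `withBothQuotes`, and `withOpeningQuote` are made up for illustration):

```javascript
var source = '{appliedParagraphStyle:"Headline 6", changeConditionsMode:1919250519}';

// match() without the g flag returns an array; element 0 is the whole match.
var withBothQuotes = source.match(/"[^"]*"/);
var withOpeningQuote = source.match(/"[^"]*?(?=")/);

// .length on the array counts elements, not characters.
var arrayLength = withBothQuotes.length;            // 1
// .length on the matched string counts every character, quotes included.
var quotedLength = withBothQuotes[0].length;        // 12
var halfQuotedLength = withOpeningQuote[0].length;  // 11

// With the real character count available, slice() can drop both quotes.
var bare = withBothQuotes[0].slice(1, -1);          // Headline 6
```

If that is what happened, indexing the result with [0] before taking the length or a substring should give the expected numbers.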

       

      Do any of you more experienced folks have some magic insight? TIA!

        • 1. Re: [JS - CS4] Help w/ JS choking on quotes in strings & regex
          [Jongware]-9BC6tI Level 4

          It's an InDesign Oddity.

           

          Double quotes inside GREP strings behave a bit strangely. Usually, the " character will find any sort of double quote -- straight, curly open, and curly closed. But inside a GREP lookbehind string this suddenly fails (in a lookahead it seems to work fine).

           

          If you replace it with its "forced" variant ~{ it will work on open curly quotes:

           

          (?<=~{)[^"]*?(?=")

           

          and if you need it to work on the 'any kind of double quotes' you can even use this:

           

           

          (?<=["])[^"]*?(?=")

           

          But ... all of the above only applies to searching in InDesign! And that's what WhatTheGrep reports upon.

           

          GREP-inside-JavaScript has nothing to do with InDesign's implementation -- it's part of JavaScript itself, and it is a completely different thing. (Well, it's still GREP of course. But there is no such thing as "It's still GREP" -- there are lots of different implementations.)

          So it's perfectly possible that this expression works in InDesign but does not work inside JavaScript, when applied to JavaScript strings. It seems the lookbehind isn't working at all (as it seems you already found out, uh, on 2nd reading...).

           

          So you have to think of something else. How about this one? Using parentheses in a find expression can be useful!

           

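          // No double quotes in this string, so match() returns null.
          var str = "Hello world!";
          var f = str.match (/"([^"]+)(?=")/);
          if (f)
           alert ("Result: "+f.join("\r"));
          else
           alert ("Nothing matches...");
          
          // Escaped quotes inside the string: f[0] is the whole match (with its
          // opening quote) and f[1] is the parenthesized part (without quotes).
          str = "Hello \"world\"!";
          f = str.match (/"([^"]+)(?=")/);
          if (f)
           alert ("Result: "+f.join("\r"));
          else
           alert ("Nothing matches...");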
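
Applied to the string from the original question, the parentheses idea might look like the sketch below. The helper name `firstQuotedText` is hypothetical, and the pattern here consumes the closing quote instead of using a lookahead; either way, index 1 of the match array holds only what the parentheses captured, so no lookbehind is needed.

```javascript
// Hypothetical helper: returns the text between the first pair of double
// quotes in str, or null when there is no quoted text.
function firstQuotedText(str) {
    // The parentheses capture the characters between the quotes; the quotes
    // themselves are matched but stay outside the group.
    var f = str.match(/"([^"]+)"/);
    return f ? f[1] : null;
}

var style = firstQuotedText('{appliedParagraphStyle:"Headline 6", changeConditionsMode:1919250519}');
// style now holds: Headline 6

var none = firstQuotedText("Hello world!");
// none is null because the string contains no quotes
```

Inside InDesign the results could be shown with alert (style), as in the examples above.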
          
          
          • 2. Re: [JS - CS4] Help w/ JS choking on quotes in strings & regex
            dsackett Level 1

            Thanks, Jongware, your answer had keys to a workaround--something I came across when examining your code.

             

            Being new to scripting, I had indeed crashed on the assumption that regular expressions worked the same in GREP searches and JS. ID can do lookbehinds but JS can't. Check out www.regular-expressions.info/refflavors.html for a comparison of regex flavors; JS is listed as ECMA in the table.

             

            In case you or anyone wants to know, my problem concerned converting styles in a Word doc that may or may not have the styles the script expects. ID balks at searching for a style that's not part of the document, so I had to trap for that. Rather than extracting the style text from the string and comparing it to a list of document styles, I made a regexp of the style list and used that to match against the string, avoiding having to bring quotes into it at all.

             

            var docRef = app.activeDocument;
            var myStyles = [];
            // Get a list of styles, but not the default styles in brackets (indices 0 and 1),
            // since they'll screw up a RegExp pattern.
            for (var s = 2; s < docRef.paragraphStyles.length; s++) {
                myStyles.push(docRef.paragraphStyles[s].name);
            }

            // Combine the names into one alternation, e.g. Heading 1|Body Text
            var myRE = new RegExp(myStyles.join("|"));

             

            var str = "{findWhat:\"n\", appliedParagraphStyle:\"Heading 1\", appliedFont:\"Wingdings\", fontStyle:\"Bold\"}";
            var f = str.match (myRE);
            if (f)
                 alert ("A search for: "+f+" will work");
            else
                 alert ("Do not attempt a search...");
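
One caveat with a pattern built from style names: a name that contains regex metacharacters (brackets, parentheses, a plus sign, and so on) changes the meaning of the pattern. The sketch below shows one way to guard against that, assuming the names have already been collected in an array. `escapeForRegExp` and `buildStyleRegExp` are hypothetical helpers, not part of the InDesign scripting API, and the sample names stand in for real document styles.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a RegExp.
function escapeForRegExp(text) {
    return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Hypothetical helper: build one alternation from a list of style names.
// Returns null for an empty list, because new RegExp("") matches every string.
function buildStyleRegExp(names) {
    var escaped = [];
    for (var i = 0; i < names.length; i++) {
        escaped.push(escapeForRegExp(names[i]));
    }
    return escaped.length ? new RegExp(escaped.join("|")) : null;
}

// Sample names stand in for docRef.paragraphStyles[s].name.
var re = buildStyleRegExp(["Heading 1", "Body (indent)", "[Basic Paragraph]"]);
var str = '{findWhat:"n", appliedParagraphStyle:"Heading 1", appliedFont:"Wingdings"}';
var found = str.match(re);                      // found[0] is: Heading 1
var bracketed = "[Basic Paragraph]".match(re);  // matched literally, brackets included
```

With escaping in place, the bracketed default styles no longer need to be skipped for the pattern's sake, though skipping them may still be wanted for other reasons.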

            Thanks again for your kind response, it was just the added perspective I needed!