5 Replies Latest reply on Jun 27, 2008 10:53 PM by sotospeak (carlo bazzo)

    regexp quirks

    sotospeak (carlo bazzo) Level 1
      I have:

      app.findGrepPreferences.findWhat = "(\\[.*\\])";

      and some textFrames with contents like this: [ABCD]

      1- My regexp string works, but why do I have to put two backslashes before the square brackets? To me, one should be enough to escape the special characters "[" and "]".

      2- If I put word boundaries like this:
      app.findGrepPreferences.findWhat = "\b(\\[.*\\])\b";

      I do not find any occurrence, but in Python it works.

      4- In one frame I have something like this:

      "This is a test (..some other paragraphs..) string with [ABCD] in it"

      This string matches with my "(\\[.*\\])" but when I try:

      var myFoundItems = app.documents.item(0).findGrep();
      var myword=myFoundItems[0]

      I got myword="This".

      Any ideas? Thank you
      carlo
        • 1. Re: regexp quirks
          Peter Kahrel Adobe Community Professional & MVP
          Carlo,

          First of all, you need to adjust your grep expression a bit: "(\\[.*\\])" finds everything from the first bracket in a document to the last one, which is not what you want (presumably). Instead, use "(\\[.*?\\])" to tame the expression's appetite. You won't see the difference in a document that contains just one instance of a word in brackets, but when you enter a few more you will notice the difference.
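
          To make the greedy/lazy difference concrete, here is a minimal sketch in plain JavaScript. It uses JavaScript's own RegExp rather than InDesign's GREP engine (a different implementation), and the sample string is made up for illustration, but the ".*" versus ".*?" behaviour it shows is the same idea:

```javascript
// Plain JavaScript RegExp, used only to illustrate greedy vs. lazy matching.
// This is NOT InDesign's GREP engine; the sample text is hypothetical.
var sample = "one [ABCD] two [EFGH] three";

// The doubled backslashes are needed because the pattern is a string literal:
// "\\[" reaches the regex engine as \[ (a literal bracket).
var greedy = new RegExp("(\\[.*\\])", "g");
var lazy = new RegExp("(\\[.*?\\])", "g");

var greedyMatches = sample.match(greedy);
var lazyMatches = sample.match(lazy);

console.log(greedyMatches); // a single match, from the first "[" to the last "]"
console.log(lazyMatches);   // one match per bracketed word
```

          With only one bracketed word in the text, both patterns return the same single match, which is why the difference is easy to miss.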

          1. The backslash is a special character itself, therefore it needs to be escaped when used in a string. This is standard. One backslash may be enough for you, for ID/JS it isn't.

          2. You need to escape the backslashes and use \\b, not \b ;) but even then it won't work. The difference in behaviour between InDesign and Python may be due to a difference in interpretation of the notions "word" and "word character". But ID's behaviour makes sense to me: aren't [ and ] word delimiters themselves?

          4. This is strange. Your code works fine for me and gives the correct result. Apart from that, your "myword" should give you an object of type Word or Text or something like that. myword.contents would give you a string.
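
          Points 1 and 2 can be checked outside InDesign with a short sketch in plain JavaScript. It assumes only the standard string-literal rules and JavaScript's own RegExp; whether InDesign's GREP engine defines "word character" the same way is an assumption, not something this code verifies:

```javascript
// Point 1: what the string literal actually contains.
var oneSlash = "\[";    // the string parser consumes the backslash, leaving "["
var twoSlashes = "\\["; // a real backslash followed by "["
console.log(oneSlash.length, twoSlashes.length);

// Point 2: inside a string literal, "\b" is the backspace character (code 8),
// so a pattern written as "\b..." contains no word-boundary assertion at all.
var backspaceCode = "\b".charCodeAt(0);

// With the backslash doubled, \b reaches the regex engine as a word boundary.
var bounded = new RegExp("\\b(\\[.*?\\])\\b");

// A boundary needs a word character on exactly one side. "[" and "]" are not
// word characters, so there is no boundary next to them unless a letter,
// digit, or underscore touches the bracket.
var aloneMatches = bounded.test("[ABCD]");            // bracketed word on its own
var spacedMatches = bounded.test("with [ABCD] in it"); // spaces around the brackets
var embeddedMatches = bounded.test("x[ABCD]y");        // letters touching the brackets

console.log(backspaceCode, aloneMatches, spacedMatches, embeddedMatches);
```

          In this engine, only the last test string matches, which is consistent with the reading that the brackets themselves act as word delimiters.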

          Peter
          • 2. Re: regexp quirks
            Level 1
            Hi Carlo,

            Actually, if you're creating the XML element at the same time, you don't need to use markup()--you can just add the text object as a parameter to xmlElements.add(). Here's an example:

            //MarkupTextWithGrep.jsx
            //An InDesign CS3 JavaScript
            //
            main();
            function main(){
              mySetup();
              mySnippet();
            }
            //Creates a new document with one text frame holding the sample text.
            function mySetup(){
              var myDocument = app.documents.add();
              var myTextFrame = myDocument.pages.item(0).textFrames.add({geometricBounds:myGetBounds(myDocument, myDocument.pages.item(0))});
              myTextFrame.parentStory.contents = "This is a [textTagA] in an XML story.\rThis is the second [textTagA] in the story.\rThis is the first [textTagB] in a text frame.\rThis is the second [textTagB] in the story.\rThis is the third [textTagB] the story.\r";
            }
            //Finds each [bracketed] word and wraps the text between the brackets in an XML element.
            function mySnippet(){
              var myFoundItem, myText, myString, myError, myXMLElement;
              var myDocument = app.documents.item(0);
              var myXMLTag = myDocument.xmlTags.add("Story");
              var myStoryElement = myDocument.xmlElements.item(0).xmlElements.add(myXMLTag);
              myStoryElement.markup(myDocument.stories.item(0));
              app.findGrepPreferences = NothingEnum.nothing;
              app.findGrepPreferences.findWhat = "(\\[)(.+?)(\\])";
              //Get the references (if any) in reverse order to avoid invalid references.
              var myFoundItems = myDocument.findGrep(true);
              if(myFoundItems.length != 0){
                for(var myCounter = 0; myCounter < myFoundItems.length; myCounter++){
                  myFoundItem = myFoundItems[myCounter];
                  //Skip the first and last characters (the brackets themselves).
                  myText = myFoundItem.texts.itemByRange(myFoundItem.characters.item(1), myFoundItem.characters.item(-2));
                  myString = myText.contents.toString();
                  //Create the xml tag if it does not already exist.
                  try{
                    myXMLTag = myDocument.xmlTags.item(myString);
                    //Reading a property of an invalid reference throws an error.
                    myXMLTag.name;
                  }
                  catch (myError){
                    myXMLTag = myDocument.xmlTags.add(myString);
                  }
                  //Create an XML element inside the story element.
                  myXMLElement = myStoryElement.xmlElements.add(myXMLTag, myText);
                }
              }
              app.findGrepPreferences = NothingEnum.nothing;
            }

            //Returns the bounds of the page's live area (inside the margins) as [y1, x1, y2, x2].
            function myGetBounds(myDocument, myPage){
              var myX1, myX2;
              var myPageWidth = myDocument.documentPreferences.pageWidth;
              var myPageHeight = myDocument.documentPreferences.pageHeight;
              //On left-hand pages the inside and outside margins are swapped.
              if(myPage.side == PageSideOptions.leftHand){
                myX2 = myPage.marginPreferences.left;
                myX1 = myPage.marginPreferences.right;
              }
              else{
                myX1 = myPage.marginPreferences.left;
                myX2 = myPage.marginPreferences.right;
              }
              var myY1 = myPage.marginPreferences.top;
              myX2 = myPageWidth - myX2;
              var myY2 = myPageHeight - myPage.marginPreferences.bottom;
              return [myY1, myX1, myY2, myX2];
            }

            Thanks,

            Ole
            • 3. Re: regexp quirks
              Peter Kahrel Adobe Community Professional & MVP
              Ole,

              Shouldn't this be in another thread? See http://www.adobeforums.com/webx/.3bbf275d.59b5a77f/1

              Peter
              • 4. Re: regexp quirks
                Level 1
                Hi Peter,

                You're probably right.

                Thanks,

                Ole
                • 5. Re: regexp quirks
                  sotospeak (carlo bazzo) Level 1
                  Thank you Peter.

                  For 1) you are right: in Python one backslash is enough to escape, but in JS you must use two. Mine was a stupid question, actually, because even in Python you must use two backslashes if you do not use raw strings.

                  2- I also tried \\b and, as you said, it did not work. Although this may, as you explained, be consistent with InDesign's concept of words and word characters, I still find it puzzling that my single word [ABCD], alone in a text frame, does not match word boundaries. I think the JS scripting guide should at least warn about this where it presents \b as a word boundary.

                  4- I will investigate this further.

                  Thanks again for your time
                  carlo