14 Replies Latest reply on Jun 27, 2011 12:16 PM by vze26m98

    XMLRules and "the selection"?

    vze26m98 Level 1

      Hi all-

       

      I'm impressed by the speed of XMLRules over an iteration through the XML tree, but I wonder if it's possible to operate on a user-selection of XML elements with XMLRules? I'm guessing it's not, because a user selection wouldn't necessarily form a tree for an XMLRule to move through.

       

      But is there any way to determine whether a given XML element is part of the current user selection? This information shows up as an underlined element in the Structure window, but I can't seem to find a parallel property in the InDesign object model.

       

      Related to this, but a more general question, is whether the start element for the XMLRules processor can be something other than the root element? All of the published examples start at the root, but is it possible to pass an arbitrary element as the root for use by the processor?

       

      Many thanks!

       

      Charles Turner

        • 1. Re: XMLRules and "the selection"?
          John Hawkinson Level 5

          Hi, Charles.

           

          Dealing with XML is a pain, and dealing with XML in InDesign even more so, and dealing with XML Rules in InDesign's scripting architecture is like the trifecta. I would strongly encourage you to find another way to do what you want; it will likely be easier to write and an order of magnitude easier to maintain.

           

          Can you tell us about the problem you're trying to solve?

           

          In any case, though, if you're dead set on using XML Rules, please take a look at my example from Re: How to shift content with in cell in xml rules table (reply #20). I think it makes it much much easier to use XML Rules in Javascript without all the annoying pain and ugly syntax.

           

          I'm impressed by the speed of XMLRules over an iteration through the XML tree, but I wonder if it's possible to operate on a user-selection of XML elements with XMLRules? I'm guessing it's not, because a user selection wouldn't necessarily form a tree for an XMLRule to move through.

          Note that typically one traverses a tree rather than iterating over it.

           

          Anyhow, It seems like the wrong approach -- why not iterate over the list of user-selected subtrees and traverse it with __processRuleSet ? Or you could move them into a tree of their own if you like (that might have other consequences). Lastly you could traverse the root and check each node to see if it is in your list.

           

          But is there any way to determine whether a given XML element is part of the current user selection? This information shows up as an underlined element in the Structure window, but I can't seem to find a parallel property in the InDesign object model.

          I do not believe you have access to what is selected in the Structure pane from the scripting DOM. Sorry.

          You can, of course, find the associatedXMLElement of a selected object on the page, though.

           

          Why do you want to do this?

          Related to this, but a more general question, is whether the start element for the XMLRules processor can be something other than the root element? All of the published examples start at the root, but is it possible to pass an arbitrary element as the root for use by the processor?

          Of course! But why are you asking rather than testing? It should be the work of a moment to test, and then you'd know for certain!

          • 2. Re: XMLRules and "the selection"?
            Andreas Jansson Level 2

            Hi John,

             

            You often react on words, being used improperly. In order for us (me) to learn, please define your view on the subject better and explain the errors, such as the difference between traversing and iterating.

             

            Can not iteration be recursive, and thus a normal way to traverse a tree (i.e. the exact same thing as traversing)?

             

            Thanks,

            Andreas

            • 3. Re: XMLRules and "the selection"?
              vze26m98 Level 1

              Well, John's correct. Iteration has the general sense of "doing things one after the other," but computer science makes a distinction between iteration and recursion, or perhaps imperative and declarative programming styles. Tree traversal is the appropriate word here, which typically means a recursive approach, but "iterative traversal" is also possible.
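For anyone who wants to see the distinction rather than read about it, here is a minimal sketch. It assumes a plain JavaScript object tree, not InDesign's XML; the `name` and `children` fields and both function names are made up for illustration.

```javascript
// Hypothetical plain-object tree; `name` and `children` are illustrative
// field names, not InDesign properties.
var tree = {
  name: 'Root',
  children: [
    { name: 'ROW', children: [ { name: 'COL', children: [] } ] },
    { name: 'ROW', children: [] }
  ]
};

// Recursive depth-first traversal: the call stack remembers where we are.
function visitRecursive(node, visit) {
  var i;
  visit(node);
  for (i = 0; i < node.children.length; i++) {
    visitRecursive(node.children[i], visit);
  }
}

// Iterative depth-first traversal: an explicit stack replaces the call stack.
function visitIterative(root, visit) {
  var stack = [root], node, i;
  while (stack.length > 0) {
    node = stack.pop();
    visit(node);
    // Push children in reverse so they are popped in document order.
    for (i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
}

// Collect the node names in the order a given traversal visits them.
function namesVia(traverse) {
  var out = [];
  traverse(tree, function (n) { out.push(n.name); });
  return out;
}
```

Both functions visit the same nodes in the same order; the only difference is whether the bookkeeping lives in the call stack or in an array you manage yourself.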

               

              The Wikipedia can provide quick answers to the above.

               

              Best, Charles

              • 4. Re: XMLRules and "the selection"?
                vze26m98 Level 1

                Thanks for your response, John-

                John Hawkinson wrote:

                 

                Can you tell us about the problem you're trying to solve?

                 

                Well, XML is the means of communication between InDesign and a FileMaker Pro database. The InDesign document is a presentation of potentially thousands of database records, each with about twelve fields. So I don't have the freedom (for now, at least) to ponder the task without the use of XML.

                 

                The InDesign document consists then, of potentially thousands of TextFrames, each with twelve elements of tagged data. So the tree is shallow and wide. Let's say I want to find out which elements within a particular TextFrame contain identical data. I can iterate through those XML elements, or use InDesign's XMLRules to traverse the tree. My tests suggest a big speed difference: with 20 items, XMLRules are twice as fast; with ~3000 items, the iterative solution takes 72 seconds, while the XMLRule approach takes 1.7 seconds.
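To make the "identical data" task concrete, here is a rough sketch of the comparison step on its own. It assumes the element contents have already been pulled out into a plain array of strings; the function name `findIdentical` is hypothetical and nothing here touches the InDesign DOM.

```javascript
// Group array indices by identical string content.
// `contents` stands in for the text of each tagged element (hypothetical data).
function findIdentical(contents) {
  var groups = {}, result = [], key, i;
  for (i = 0; i < contents.length; i++) {
    // The prefix keeps keys from clashing with built-in property names.
    key = 'k:' + contents[i];
    if (!groups.hasOwnProperty(key)) {
      groups[key] = [];
    }
    groups[key].push(i);
  }
  // Keep only the groups that actually contain duplicates.
  for (key in groups) {
    if (groups.hasOwnProperty(key) && groups[key].length > 1) {
      result.push(groups[key]);
    }
  }
  return result;
}
```

Each entry in the result is a list of indices whose contents match, so the comparison is a single pass over the array rather than a pairwise check.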

                Anyhow, It seems like the wrong approach -- why not iterate over the list of user-selected subtrees and traverse it with __processRuleSet ?... Lastly you could traverse the root and check each node to see if it is in your list.

                 

                I do not believe you have access to what is selected in the Structure pane from the scripting DOM.

                 

                I'm a little unclear what you're suggesting here. You say there's no access to items selected in the Structure pane, but you've suggested I could iterate over the user selection. In my case, the user has selected a bunch of TextFrames, and I know I can get the associatedXMLElement(s), but as my test above suggests, iteration alone can be quite slow. I had thought that if it were possible to know whether an element was selected during a traversal by XMLRules, I might have a solution, but that doesn't seem possible.

                 

                Best wishes, Charles

                • 5. Re: XMLRules and "the selection"?
                  John Hawkinson Level 5
                  You often react on words, being used improperly. In order for us (me) to learn, please define your view on the subject better and explain the errors, such as the difference between traversing and iterating.

                  You're right, of course. Words are important to me, and I probably am more concerned about semantics than the average person, and that is an excellent start on the road to Trouble.

                  But that's not what this thread is about, and really not what my reply was about -- that was just a small aside ("Note that...") that was mostly beside the point of the post.

                   

                  Though ironically, I went on to say one should iterate over the list of subtrees and then traverse it, but of course I meant to say iterate over the list of subtrees and then traverse them. Ha!

                   

                  I think Charles' answer is correct, but I'm much more interested in hearing the problem Charles is trying to solve with XML rulesets than discussing the proper language for tree descent. Charles?

                  • 6. Re: XMLRules and "the selection"?
                    John Hawkinson Level 5

                    Sorry, my previous reply crossed with yours. Thanks for providing context!

                     

                    The InDesign document consists then, of potentially thousands of TextFrames, each with twelve elements of tagged data. So the tree is shallow and wide. Let's say I want to find out which elements within a particular TextFrame contain identical data. I can iterate through those XML elements, or use InDesign's XMLRules to traverse the tree. My tests suggest a big speed difference: with 20 items, XMLRules are twice as fast; with ~3000 items, the iterative solution takes 72 seconds, while the XMLRule approach takes 1.7 seconds.

                    Wow, that is surprisingly bad!

                     

                    I assume there isn't something obviously wrong with your traversal. I guess the bottleneck is typically when the scripting engine makes calls to the InDesign object model to get data, and if it has to do so for every node, it can be very slow.

                     

                    The usual answer here is to preprocess your XML before you import it into InDesign. I am not an XSLT expert, but I suspect you could use <xsl:for-each-group> in XSLT 2.0 to suppress your duplicates. Unfortunately it may make your head explode, and XSLT2 implementations don't exactly grow on trees.

                     

                    I think the next answer is to preprocess the XML in InDesign's Javascript implementation of XML (E4X) before importing it into the InDesign DOM.

                    InDesign has two XML implementations: one lives solely inside the JavaScript interpreter (E4X), and the other is tied to the DOM. As you've seen, the DOM can be really slow. E4X should not suffer from that problem. So you would then write out a new temporary XML data file and then import that into InDesign.

                     

                    My brain is failing to find a good E4X example right now, but let me know if you need me to dig one up.

                     

                    I'm a little unclear what you're suggesting here. You say there's no access to items selected in the Structure pane, but you've suggested I could iterate over the user selection. In my case, the user has selected a bunch of TextFrames, and I know I can get the associatedXMLElement(s), but as my test above suggests, iteration alone can be quite slow. I had thought that if it were possible to know whether an element was selected during a traversal by XMLRules, I might have a solution, but that doesn't seem possible.

                    I was a bit unclear because I wasn't sure of the details of your requirement.

                     

                    One common solution to slowness in the DOM is to retrieve an entire list at once, rather than, say, iterating over the list and retrieving elements one at a time. E.g.:

                     

                    var i, t = app.activeDocument.textFrames.everyItem().getElements(); // array, fetched in one go
                    for (i = 0; i < t.length; i++) { $.writeln(t[i].contents); }
                    
                    

                    and not

                     

                    var i, t = app.activeDocument.textFrames; // collection, resolved on each access
                    for (i = 0; i < t.length; i++) { $.writeln(t[i].contents); }
                    

                     


                    I'm not sure if there is a good way to do this with XML. In fact, ironically, it might mean you have to iterate over the top level items in the tree, i.e. create an array with root.xmlElements.everyItem().getElements() and then process that. You should probably try that before we talk about reworking your design too much...

                     


                    Anyhow, is it very slow to simply iterate over app.selection and put each associatedXMLElement into an array before you start your rule processor? Then you'll know what is selected...
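A rough sketch of that idea, for what it's worth. It assumes each selected page item exposes `associatedXMLElement` and that the element's numeric `id` is a usable identity; both function names are hypothetical. It is written against plain objects so it can be tried outside InDesign.

```javascript
// Build a lookup of the ids of XML elements associated with selected items.
// Assumption: items expose `associatedXMLElement`, and the element's `id`
// identifies it. Items without an associated element are skipped.
function selectedElementIds(selection) {
  var ids = {}, el, i;
  for (i = 0; i < selection.length; i++) {
    el = selection[i].associatedXMLElement;
    if (el) {
      ids[el.id] = true;
    }
  }
  return ids;
}

// Hypothetical helper for use inside a rule's apply(): true when the
// element's id was recorded from the selection.
function isSelected(element, ids) {
  return ids[element.id] === true;
}
```

In a script you would presumably call `selectedElementIds(app.selection)` once before starting the rule processor, so the check inside `apply` is an object lookup instead of another query. If the rule matches descendants of the element tied to the frame, the check would need to run against the relevant ancestor instead.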

                     

                    Hmm.

                    • 7. Re: XMLRules and "the selection"?
                      vze26m98 Level 1

                      Thanks again, John for your response-

                       

                      (BTW: I'm very grateful for the JSLint preamble in the code you linked to above. I'm pretty much a Javascript newbie (having come from C and Python) and really like what Crockford has said about the language.)

                       

                      Here's a bit more context that I should have put in the previous post: What I'm trying to find are similar elements, and display that as context for the user. This procedure has the disadvantage of not being the goal of the task, but simply an aid. So taking a lot of time with it isn't very tolerable. Also, although a user might typically select only 10-20 TextFrames, there's nothing keeping them from selecting a whole lot more. The ~3000-item example I used spanned only 10 InDesign pages, so there were about 300 TextFrames per page. Still pretty slow...

                       

                      Here's my benchmark code. I think it's "correct," although as you say, I could grab bigger chunks:

                       

                       

                      /*jslint undef: false, white: false */
                      
                      #include 'glue code.jsx';
                      
                      var documentFilePath = app.activeDocument.filePath.fsName;
                      var theDocument = app.documents[app.activeDocument.index];
                      var items = theDocument.xmlElements[0].xmlElements[4].xmlElements.length;
                      var dummy;
                      
                      function GetMessage () {
                        this.name = 'GetMessage';
                        this.xpath = '/FMPXMLRESULT/RESULTSET/ROW/COL[5]/DATA';
                        this.apply = function (element, ruleproc) {
                          // $.writeln(element.contents);
                          dummy = element.contents;
                          return true;
                        };
                      }
                      
                      $.hiresTimer; // reading $.hiresTimer resets it to zero, so this starts the clock
                      for (var i = 0; i < items; i++) {
                        dummy = theDocument.xmlElements[0].xmlElements[4].xmlElements[i].xmlElements[3].xmlElements[0].contents;
                      }
                      $.writeln('iteration: ' + $.hiresTimer + ' microsecs');
                      // This takes about 72 seconds for 2925 items.
                      
                      var theRule = [new GetMessage()];
                      __processRuleSet(theDocument.xmlElements[0], theRule);
                      $.writeln('XML Rule: ' + $.hiresTimer + ' microsecs');
                      // This takes about 1.7 seconds for 2925 items.
                      

                       

                       

                      I haven't benchmarked the cost of raw iteration, however. I guess likely the time is spent in the element lookup, and not in the mechanics of the for-loop. And yes, creating some sort of master array, or iterating the selection, may be the way to go. My question was an attempt to figure out alternatives, most of which don't seem that enticing. ;-)

                       

                      I was unaware of the E4X implementation. I'll try to track down info about this, but might be back if my search returns empty.

                       

                      Best wishes, Charles

                       

                      Oh, PS: I do believe that InDesign's XSLT is version 1.0, and slightly different from Filemaker's. :-( I've written a set of XSLT transformations to make my life easier in InDesign, but my friend who's doing the FMP work has yet to integrate my effort.

                      • 8. Re: XMLRules and "the selection"?
                        vze26m98 Level 1

                        OK, for the sake of completeness, I benchmarked the performance of E4X, and also a raw ExtendScript for-loop using this code, the companion to that posted above:

                         

                        var dummy;
                        var typicalContent = 'Some string that is about the length of a message or note.';
                        var fn, theData, theXML, theResult;
                        var items;
                        
                        dummy = $.hiresTimer;
                        for (var i = 0; i < 2925; i++) {
                          dummy = typicalContent;
                        }
                        $.writeln('Elapsed: ' + $.hiresTimer + ' microseconds\n');
                        
                        fn = File('/Users/cturner/Desktop/export-20110624.xml');
                        fn.open('r');
                        theData = fn.read();
                        fn.close();
                        theXML = XML(theData);
                        setDefaultXMLNamespace('http://www.filemaker.com/fmpxmlresult');
                        
                        dummy = $.hiresTimer;
                        items = theXML.RESULTSET.ROW.length();
                        for (i = 0; i < items; i++) {
                          dummy = theXML.RESULTSET.ROW[i].COL[4].DATA;
                        }
                        $.writeln('Elapsed: ' + $.hiresTimer + ' microseconds\n');
                        

                         

                        So for those keeping score, here are my results for a 2925-element iteration:

                         

                        Raw loop:           624 microseconds
                        E4X:            1705949 microseconds
                        XMLRules:       1729374 microseconds
                        DOM iteration: 72216528 microseconds
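To put those figures in proportion, here is a small sketch of the arithmetic. The timings are copied from the results above; the helper names `perItemMs` and `speedup` are made up for illustration.

```javascript
// Timings in microseconds, copied from the results above (2925 elements).
var timings = {
  rawLoop: 624,
  e4x: 1705949,
  xmlRules: 1729374,
  domIteration: 72216528
};
var items = 2925;

// Average cost per element, in milliseconds.
function perItemMs(microseconds) {
  return microseconds / items / 1000;
}

// How many times faster the second timing is than the first.
function speedup(slowMicroseconds, fastMicroseconds) {
  return slowMicroseconds / fastMicroseconds;
}
```

`speedup(timings.domIteration, timings.xmlRules)` comes out at roughly 42, and `perItemMs(timings.domIteration)` at a little under 25 milliseconds per element.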
                        

                         

                        Hope this is of value to someone.

                         

                        Charles

                        • 9. Re: XMLRules and "the selection"?
                          John Hawkinson Level 5

                          Hi, Charles.

                           

                          I'm not 100% sure whether there is still an open question, other than "Does anyone have any other ideas that haven't been suggested?"

                          It definitely sounds like you have some workable answers. Anyhow, just some cleanup:

                           

                          I'm very grateful for the JSLint preamble

                          You're welcome. Note that JSLint is rather a moving target. I'm afraid code that satisfied it last month won't satisfy it now, and I'm having more and more trouble believing that satisfying it is worth the effort. Still on the fence, though. I'm kind of afraid to update my local copy...

                           

                          Here's a bit more context that I should have put in the previous post: What I'm trying to find are similar elements, and display that as context for the user. This procedure has the disadvantage of not being the goal of the task, but simply an aid. So taking a lot of time with it isn't very tolerable. Also, although a user might typically select only 10-20 TextFrames, there's nothing keeping them from selecting a whole lot more. The ~3000-item example I used spanned only 10 InDesign pages, so there were about 300 TextFrames per page. Still pretty slow...

                          Note that it is very difficult [impossible?] to select items on multiple spreads. So that does tend to keep it down a bit...

                           

                          I don't know that you want to keep the model, but I would be curious what happens if you tried:

                          $.hiresTimer;
                          var itemArray = theDocument.xmlElements[0].xmlElements[4].xmlElements.everyItem().getElements();
                          for (var i = 0; i < itemArray.length; i++) {
                            dummy = itemArray[i].xmlElements[3].xmlElements[0].contents;
                          }
                          $.writeln('iter+getElements: ' + $.hiresTimer + ' microsecs');

                           

                          Especially because your tree is wide and not deep, this might actually be an easy answer.

                           

                          Oh, PS: I do believe that InDesign's XSLT is version 1.0, and slightly different from Filemaker's. :-( I've written a set of XSLT transformations to make my life easier in InDesign, but my friend who's doing the FMP work has yet to integrate my effort.

                           

                          Yes. When I said XSLT2 implementations didn't grow on trees, I should have been explicit: InDesign doesn't have one. Of course, you can use an external one, like saxon.

                           

                          Thanks for posting your benchmarks and code! More people should! :-)

                          • 10. Re: XMLRules and "the selection"?
                            Dirk Becker Level 4

                            Some more notes:

                             

                            While E4X is faster in general, we found that ExtendScript's E4X has a size limit; beyond it, InDesign's own XML happily continues to work.

                             

                            The XPath in one of the previous posts already appears fast; a frequent error is to trigger a full-depth search via "//".

                             

                            XMLElement from CS4 on has an evaluateXPathExpression which may be slightly faster than the CS3 XMLRules, but in the long run it has caused memory problems with our InDesign Servers.

                             

                            E4X / XML also has an xpath() method worth trying.

                             

                            The same way that you can flatten the textFrames.everyItem().getElements() into an array, you can also work in XMLElement.

                            So you'd at least store the intermediate result of

                             

                              myArray = theDocument.xmlElements[0].xmlElements[4].xmlElements.everyItem().getElements()
                            
                            ... and descend from there.
                            I just tried this on a slightly different XML; the following code also works for me and yields an array of strings.
                            app.activeDocument.xmlElements.item(0).xmlElements.item(0).xmlElements.everyItem().xmlElements.item(0).xmlElements.item(0).contents
                            Differing from John I prefer the item() functions to indicate plural objects rather than true instances of Array.
                            Dirk

                            • 11. Re: XMLRules and "the selection"?
                              vze26m98 Level 1

                              I think I've got enough to chew on courtesy this thread...

                               

                              Per John's request above:

                               

                               

                              var documentFilePath = app.activeDocument.filePath.fsName;
                              var theDocument = app.documents[app.activeDocument.index];
                              var items = theDocument.xmlElements[0].xmlElements[4].xmlElements.length;
                              var dummy;
                              
                              $.hiresTimer;
                              var itemArray = theDocument.xmlElements[0].xmlElements[4].xmlElements.everyItem().getElements();
                              for (var i = 0; i < itemArray.length; i++) {
                                dummy = itemArray[i].xmlElements[3].xmlElements[0].contents;
                              }
                              $.writeln('iter+getElements: ' + $.hiresTimer + ' microsecs');
                              

                               

                               

                              This took 13840131 microseconds (about 13.8 seconds) on my 2925-element XML document, so it places on the speedy end, between the DOM and internal XML implementations.

                               

                              It might be the best solution in the short-to-medium term with a few hundred elements per page...

                               

                              Thanks, John!

                              • 12. Re: XMLRules and "the selection"?
                                vze26m98 Level 1

                                Hi Dirk-

                                 

                                Thanks for your notes. I tried the xpath() method in E4X while I was struggling with namespace issues. ;-) But the real gem in your post for me is the evaluateXPathExpression() method. If it proves fruitful, I'll post my results.

                                 

                                Many thanks! Charles

                                • 13. Re: XMLRules and "the selection"?
                                  Dirk Becker Level 4

                                  Don't miss the other single code line in the small print - I still wish the forum would revert to a usable plain text editor.

                                   

                                  Dirk

                                  • 14. Re: XMLRules and "the selection"?
                                    vze26m98 Level 1

                                    Hi Dirk-

                                     

                                    I hadn't overlooked that one, and yes indeed!! the Forum is hard and frustrating to work with. :-(

                                     

                                    I'll end with a statement of my solution. I think that creating an array of XML elements from the user selection, and then using E4X to extract my matching data is the way to go.

                                     

                                    I built a small benchmark which puts 420 of my XML elements (each with 12 children) on a page, and which then times how long it takes to build an array of associated XML elements from the selection.

                                     

                                    My timings show 72605 microseconds, which is pretty well below a half second, leaving me time to do my comparisons with E4X, which then get thrown into a user interface. I really doubt that a user will want to put more than 420 of my TextFrames onto a page, even if the paper is D format.

                                     

                                    Code is below. Thanks to John, Dirk, and Andreas for their help today!

                                     

                                    Best wishes, Charles

                                     

                                    var theDocument = app.documents[app.activeDocument.index];
                                    
                                    // Create a page with this:
                                    /*
                                    theDocument.viewPreferences.horizontalMeasurementUnits = MeasurementUnits.picas;
                                    theDocument.viewPreferences.verticalMeasurementUnits = MeasurementUnits.picas;
                                    var dims = [0, 0, 1, 7];
                                    for (var j = 0; j < 60; j++) {
                                      dims[0] = j;
                                      dims[2] = j + 1;
                                      for (var i = 0; i < 7; i++) {
                                        dims[1] = i * 7;
                                        dims[3] = (i * 7) + 7;
                                        // $.writeln((j*7)+i);
                                        theDocument.xmlElements[0].xmlElements[4].xmlElements[(j*7)+i].placeIntoFrame(theDocument.pages[0], dims);
                                      }
                                    }
                                    */
                                    
                                    // "Select all" on page before running this part:
                                    var theSelection = theDocument.selection;
                                    var items = theSelection.length;
                                    var dummy = new Array(items);
                                    $.hiresTimer;
                                    for (var i = 0; i < items; i++) {
                                      dummy[i] = theSelection[i].associatedXMLElement;
                                    }
                                    $.writeln('Elapsed: ' + $.hiresTimer + ' microseconds\n');