9 Replies Latest reply on Sep 2, 2011 1:38 PM by JoseAjáAjá

    [AS CS5.5] Bad Perfomance: sequentially using DOM to get references

    JoseAjáAjá Level 1

      Hi guys,

       

      The issue

      I'm having terrible performance problems when using a complex script I wrote.

       

      The script

      I need to make some special comparisons between found text and the texts in that page.

       

      My script is written in Flash Builder 4, using the CS_SDK (without Extension Builder)

       

      In short, my script has a class that performs the following tasks:

       

      1. Finds texts that meet some criteria (using the Document.findText() method) and stores the resulting Array of texts.

      2. Traverses the Array of found texts and, for each one of them:

      a. Finds a reference to the page where that text is. Something like (within a for loop, with iFoundText as the indexing var):

       

       

      var currentParentPage     :Page     = 
           foundText[iFoundText].parentStory.textContainers[0].parentPage;
      

       

       

      b. Finds all the page items in that page. The code goes something like this:

       

       

      var myPageItems      :Object     = 
           currentParentPage.allPageItems;
                                                        
      

       

       

      c. If the PageItem is a TextFrame, then make the 'special comparison', word by word.

       

       

      if (currentPageItem is TextFrame)
      {
           ...
           var allTheWords     :Words =
                currentPageItem.words;

           for (var iWord:int = 0 ; iWord < allTheWords.length ; iWord++)
           {
                //Code to perform comparison and related operations...
           }
      }
      

       

       

      It seems to work OK when I let the script run for only the first three occurrences (although it is very slow: about 12 seconds for those three).

      So, what's the problem?

      What happens when I run it with more load:

      When I run the script for all the occurrences (I know that the number is around 100), the application hangs and stops responding. Even after 10 minutes it does not respond.

       

      What the documentation says:

      I found a document called "FEATURE DEVELOPMENT WITH SCRIPTING - Adobe CS5" (link), where it gives this recommendation:

       

       

      Performance techniques

      Minimize access to InDesign DOM

      Querying the InDesign DOM may be the main performance bottleneck for your script. A considerable amount of time typically is spent resolving object references, because InDesign does not hand out pointers to objects but rather uses references that need to be resolved every time they are used. Here are some techniques to alleviate this problem:

       

      • Reduce the number of calls to the scripting DOM.
      • Store and reuse resolved references in variables wherever possible.
      • Use everyItem() to fetch and cache data of a collection object all at once, instead of querying the properties with separate calls.

       

       

       

       

      But I don't know how to not query InDesign's DOM every time...

       

      So, I understand that calling InDesign's DOM is bad for the script's performance, but I really don't know how I could avoid querying it to get references to parent objects or any other item.

       

      I have two questions

       

      1. If I have a variable stored in my AS script, for example:

       

      var anyTextItem:com.adobe.indesign.Text

       

      When I call a property or a method of that variable, am I querying InDesign's DOM? (In other words, is the variable's content passed as a reference?)

       

      2. How do I query the DOM fewer times?

       

      For example:

       

      Problem: I constantly need to know what's the parent page of a found text, and then get the array of page items, and then, the array of words of every page item that is a textframe.

       

      How could I query the DOM once and then work with the stored variables instead of repeatedly querying the DOM?

       

      Because, even if I have the variable stored, say "anyTextItem:Text", and I also have all the document's TextFrames stored in other variable (say "var allTextFrames:TextFrames"), I would still need to call "anyTextItem.parentTextFrame" to get a reference to the containing textFrame, and that would mean "querying InDesign's DOM".

       

      _____________________

       

      I would appreciate it if anybody who understands this could help me solve it.

      Thank you, guys!

        • 1. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
          Andres_Mendoza

          Hello Jose,

           

          Question 1:

          In ActionScript, which like JavaScript follows the ECMAScript standard, primitive types are passed by value and objects are passed by reference (check out this link, which is about JavaScript but should apply to ActionScript 3 as well). That said, I believe there are ways to simulate passing primitive values by reference (link2).

           

          Question 2:

          I believe there should be a way to get the whole document into a single variable, but I don't think it would be easy to read it, i.e. to go through the whole tree of objects, finding parents, children and associations. Have you looked for default parameters (memory heap size) in the Flash configuration files? There is an object called System which holds this kind of info.

           

          I hope this helps!!!

          • 2. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
            Harbs. Level 6

            Any time you access any InDesign scripting object you are accessing the DOM. You want to deal with native scripting data as much as possible.

             

            In practice, the two most expensive objects to deal with are Collections and Text objects. (Actually Tables are very expensive as well, but let's ignore those for now.) The first is very expensive because every time you access a collection, it is reconstructed to ensure it's still valid. The workaround for this is to convert collections into arrays.

             

            Text objects are expensive to deal with because there's no direct mapping to the C++ level. Text objects must be constructed any time you use them.

             

            Now to your problem:

             

            a is fine.

             

            b is fine because allPageItems returns an array. In case you don't know, allPageItems also returns nested items (in groups or what-have-you). If that's what you want, fine. Otherwise you can use page.textFrames.everyItem().getElements() to get all text frames as an array.

             

            c. is very problematic on a number of fronts:

             

            1. for (var iWord:int = 0 ; iWord < allTheWords.length ; iWord++) Every time you call allTheWords.length (i.e. each iteration of your loop), the collection is rebuilt. You should create a reference to the length before you start your loop and use that.
            2. I don't know what you are doing inside the loop, but I'm sure there are ways to optimize it. If you only need to deal with contents, you can manipulate the strings directly (but you need to be careful not to mess up index references).
            3. If you need the Word objects, and you are dealing with various properties, you should use the properties object which returns the properties as a single native (i.e. ActionScript) object.
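The first two points can be sketched outside InDesign. This is a minimal illustration in TypeScript rather than ActionScript, so that it runs standalone. `FakeWords` is a hypothetical stand-in for a DOM collection, not a real InDesign class; it only counts how often the "DOM" is touched. Its `allContents()` method is an assumption that plays the role a bulk fetch through everyItem() would play.

```typescript
// Hypothetical stand-in for an InDesign collection such as TextFrame.words.
// Every access to it is counted as one round trip to the DOM.
class FakeWords {
  domCalls: number = 0;
  private source: string[];

  constructor(source: string[]) {
    this.source = source;
  }

  // Like a live collection, the length is re-resolved on every access.
  get length(): number {
    this.domCalls++;
    return this.source.length;
  }

  // Fetching one word's contents costs one DOM call.
  contentsAt(index: number): string {
    this.domCalls++;
    return this.source[index];
  }

  // Fetching all contents at once costs one DOM call and
  // returns a native array of strings.
  allContents(): string[] {
    this.domCalls++;
    return this.source.slice();
  }
}

// Naive loop: length is re-read on every iteration, plus one fetch per word.
function countMatchesNaive(words: FakeWords, target: string): number {
  let matches = 0;
  for (let i = 0; i < words.length; i++) {
    if (words.contentsAt(i) === target) matches++;
  }
  return matches;
}

// Cached loop: one bulk fetch, then only native strings are touched.
function countMatchesCached(words: FakeWords, target: string): number {
  const contents = words.allContents();
  const count = contents.length; // native array length, not a DOM call
  let matches = 0;
  for (let i = 0; i < count; i++) {
    if (contents[i] === target) matches++;
  }
  return matches;
}

const sample = ["the", "quick", "the", "lazy", "the"];
const naiveWords = new FakeWords(sample);
const cachedWords = new FakeWords(sample);
console.log(countMatchesNaive(naiveWords, "the"), naiveWords.domCalls);    // 3 11
console.log(countMatchesCached(cachedWords, "the"), cachedWords.domCalls); // 3 1
```

Both loops find the same matches, but the naive one touches the stand-in collection 11 times for 5 words, while the cached one touches it once. In a real script each of those touches is a cross-layer call, so the gap grows with the word count.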

             

            There might be more optimizations you can do, but it's hard to say without seeing your code...

             

            Harbs

            • 3. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
              Harbs. Level 6

              Andres, I don't think Jose's problem has to do with pass by reference or pass by value. Those differences in performance are tiny compared to the performance problems caused by accessing DOM objects. Basically the issue is something like this:

               

              When you reference an InDesign DOM object, the scripting layer calls the C++ layer, which fetches the corresponding C++ object (or group of objects) and processes them to return the attributes in a scripting object. These scripting objects are all smoke and mirrors, and many of them don't even have "real" objects on the C++ side corresponding to them.

               

              This cross-level communication and constant rebuilding of objects is where the real performance bottleneck comes from in scripting.

               

              Also, trying to grab a whole document structure is pointless, even if it is possible at all (not to mention a huge waste of resources), because any DOM objects will need to be reconstructed any time you need to access them anyway.

               

              HTH,

              Harbs

              • 4. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
                JoseAjáAjá Level 1

                Andrés,

                 

                Thank you very much for your answer.

                 

                1

                It was a good idea to try getting the whole document's structure. That was my first thought as well. Unfortunately, Harbs pointed out that even if we could do it, the DOM objects are reconstructed with each call and, consequently, there would be no performance gains. Nonetheless, thanks for sharing those links. I now understand better when AS passes values and when it passes references.

                 

                2

                Yes, I'm reading the ASDoc for flash.system.System. There's only a totalMemory property, which is read-only, and the garbage collector method (which AS calls automatically). So, I'm still not sure if I can increase the memory heap size. That would be good, though, as a way to prevent the script from halting the application.

                 

                 

                Thanks for your positive feedback.

                • 5. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
                  JoseAjáAjá Level 1

                  Harbs,

                  Thank you very much for your answer. You're very knowledgeable about the InDesign DOM.

                   

                  Your message gives me a better understanding of how objects are managed between a scripting environment and the DOM. I still have some questions, though...

                   

                  b.

                  I will change the call to 'allPageItems' to 'page.textFrames.everyItem().getElements()'. It makes much more sense.

                   

                  Now, quick question:

                  What do you use 'getElements()' for?

                  What's the difference between calling 'page.textFrames.everyItem().getElements()' and 'page.textFrames.everyItem()'?

                   

                  c.

                  OK. I kinda got it. But, I have questions here too.

                   

                  For me, it's been quite tricky to understand indexes and references for InDesign DOM objects.

                   

                  Let's say that I have stored a Story's words object in my script. Something like:

                   

                   

                  var currentWords     :Words = currentStory.words;
                  

                   

                   

                  And then, I go through a TextFrame that contains part of that Story, and find one word that meets my criteria. So I get:

                   

                  var foundWord        :Word = (... Somehow I got a reference to this ...);
                  

                   

                  Is there a way to easily find that word (by index) in the 'currentWords' object? (Maybe using index or id...) The problem is that I've found that sometimes the indexes don't match: if I call "foundWord.index", its index can be way larger than the parentStory.words.length. For example, I got foundWord.index=942, whereas foundWord.parentStory.length=742.

                   

                  And, as a consequence, I'm using long workarounds to find the word within that collection (and I guess you'd scold me for this one; now I think that might be a serious performance killer, because I'm reconstructing the 'words' collection over and over again :S)

                   

                   

                  Thanks again to both of you (@Harbs and @Andrés), you've been very kind and helpful.

                  • 6. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
                    Andres_Mendoza Level 1

                    Thanks, Harbs, for pointing out these important details, e.g. that every time an object (reference) is called or used it is reconstructed again, and that some objects have no corresponding "native" C++ object.

                    (I was about to drop the idea of getting a complete application object, as it is useless... pass by reference.)

                     

                    Right now I am about to create a script that needs to go through every character in the document and check whether a certain property is true... from what Jose has said, I may be coming back to this post in a couple of hours.

                     

                    Cheers,

                    Andres Mendoza

                    • 7. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
                      Harbs. Level 6

                      JoseAjáAjá wrote:

                      Now, quick question:

                      What do you use 'getElements()' for?

                      What's the difference between calling 'page.textFrames.everyItem().getElements()' and 'page.textFrames.everyItem()'?


                      page.textFrames.everyItem() returns a very odd object. It's a TextFrame object, but instead of each of its properties having a single value, the properties are arrays of values. I'm not sure how well that works in strongly typed ActionScript. Even in ExtendScript, it's hard to work with those objects unless you really know what you are doing. getElements() returns an array of proper TextFrame objects.
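As a rough mental model only, the difference can be mimicked with plain classes. The sketch below is TypeScript rather than ActionScript so that it runs standalone, and `FakeFrame` and `FakeEveryItem` are hypothetical stand-ins, not the real InDesign classes. It shows only the shape of the two results, not how InDesign resolves them.

```typescript
// Hypothetical stand-in for one resolved text frame.
class FakeFrame {
  id: number;
  contents: string;

  constructor(id: number, contents: string) {
    this.id = id;
    this.contents = contents;
  }
}

// Hypothetical stand-in for what everyItem() hands back: one object that
// addresses all frames at once, so a property yields an array of values.
class FakeEveryItem {
  private frames: FakeFrame[];

  constructor(frames: FakeFrame[]) {
    this.frames = frames;
  }

  // One property read gives the value for every frame.
  get contents(): string[] {
    return this.frames.map((frame) => frame.contents);
  }

  // Resolve into an ordinary array of individual frame objects.
  getElements(): FakeFrame[] {
    return this.frames.slice();
  }
}

const everyFrame = new FakeEveryItem([
  new FakeFrame(1, "First frame"),
  new FakeFrame(2, "Second frame"),
]);

// The "plural" object: contents is an array of strings, one per frame.
console.log(everyFrame.contents);

// After getElements(): a native array whose entries each have a single value.
const frameArray = everyFrame.getElements();
console.log(frameArray[1].contents);
```

The plural form is handy for reading one property of many objects in a single call; the array from getElements() is easier to loop over when each frame must be handled individually.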

                       

                      c.

                      OK. I kinda got it. But, I have questions here too.

                       

                      Text indexes are different than indexes of other collections. Normal collections have an index property which is the index within the collection. For text objects, the index is the index of the first Character (InsertionPoint?) of the text object within the whole text contents. On the C++ level, none of the text objects exist at all. That's probably why things are a bit weird. Also, you should know, that the index WILL change if you change any of the contents earlier in the story.
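The mismatch Jose saw follows from this: a word's index is a character offset, so it can easily exceed the number of words. Below is a minimal sketch in TypeScript (chosen so that it runs standalone) that works on a plain string. It assumes words are simply runs of non-whitespace characters, which is not necessarily InDesign's own word-boundary rule, and both helper functions are hypothetical names.

```typescript
// Hypothetical helper: character offset at which each word starts,
// assuming a word is any run of non-whitespace characters.
function wordStartOffsets(storyText: string): number[] {
  const offsets: number[] = [];
  const wordPattern = /\S+/g;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(storyText)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}

// Binary search over the sorted offsets: returns the position in the
// word list of the word starting at charIndex, or -1 if none starts there.
function wordPositionFromCharIndex(offsets: number[], charIndex: number): number {
  let low = 0;
  let high = offsets.length - 1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (offsets[mid] === charIndex) return mid;
    if (offsets[mid] < charIndex) low = mid + 1;
    else high = mid - 1;
  }
  return -1;
}

const storyText = "Minimize access to the InDesign DOM";
const offsets = wordStartOffsets(storyText);

// Six words, but the last one starts at character offset 32.
console.log(offsets.length, offsets[offsets.length - 1]); // 6 32

// The word that starts at character offset 19 ("the") is word number 3.
console.log(wordPositionFromCharIndex(offsets, 19)); // 3
```

If the story contents are fetched once as a string, a table like this can be built natively and reused for lookups. It has to be rebuilt whenever earlier text in the story changes, since every later offset shifts.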

                      • 8. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
                        JoseAjáAjá Level 1

                        Thanks again, Harbs.

                         

                        Actually, working in strongly typed ActionScript makes things even weirder, especially when the return type on the ID-DOM side is :Object... It messes everything up on the AS side, and I really try to avoid getting those objects. I don't know if you have experienced that.

                         

                        __

                         

                        I will play a little with the concepts you taught me to see what works best for me. I'll try to get a better understanding of the Text index property.

                         

                        As soon as I have news on this, I'll come back and close the thread, or maybe ask you some other things (if I may).


                        Many thanks!

                        • 9. Re: [AS CS5.5] Bad Perfomance: sequentially using DOM to get references
                          JoseAjáAjá Level 1

                          UPDATE:

                           

                          Harbs pointed me in the right direction. I found two great blog posts written by Marc Autret. He explains in detail the 'object specifiers' concept that underlies how the 'everyItem()' and 'getElements()' methods work on 'collections' (i.e. words, Pages, etc.).

                           

                          For anyone interested in this subject, the blog posts are:

                           

                          http://indiscripts.com/post/2010/06/on-everyitem-part-1

                          http://indiscripts.com/post/2010/07/on-everyitem-part-2

                           

                          Hope someone finds them useful.