5 Replies Latest reply on Feb 26, 2016 6:43 AM by GeraldHlasgow

    Paragraph Index property, performance issues

    GeraldHlasgow

      I'm writing a script which loops through a document index's Topic objects, and for each Topic goes through the PageReference objects and does various bits of processing which I won't bore you with. All was going fine.

       

      For this particular part of the script, I already had code which processed a PageReference that fell on a later page than the previous PageReference, but I wanted to handle things differently if the new PageReference, while on a later page, was actually still in the same Paragraph as the previous one.

       

      Finding the page number is easy enough, and so is finding the paragraph, but I assumed that it would be much quicker to compare a property that is unique to each Paragraph rather than compare the Paragraph objects themselves, so I picked the Index property. I ended up with something like this:

       

      // Assumes nLastReferencedPage and nLastParaIndex are initialised before
      // the loop over PageReference objects (e.g. to -1).
      var oInsertionPoint = oPageReference.sourceText;
      var nReferencedPage = Number(oInsertionPoint.parentTextFrames[0].parentPage.name);
      
      var nThisParaIndex = oInsertionPoint.paragraphs[0].index;
      
      if (nReferencedPage > nLastReferencedPage)
      {
          if (nThisParaIndex != nLastParaIndex)
          {
              // processing here
          }
          else
          {
              // alternative processing here
          }
             
          nLastReferencedPage = nReferencedPage;
      }
      
      nLastParaIndex = nThisParaIndex;

       

      And suddenly the script, which had previously been fine, ran like a three-legged dog through treacle. Switching on the Profiling options within the ExtendScript Toolkit tells me that more than 50% of the script's processing time is spent on this one statement, "var nThisParaIndex = oInsertionPoint.paragraphs[0].index;", which is plausible given that adding it doubled the time the script takes to run. I tried splitting it into two statements, first grabbing the paragraphs[0] object and then reading its Index value, but this didn't speed things up; it just confirmed that it was the access to the Index property, not the reference to the Paragraph object itself, that was taking all the time.
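If you want to double-check the profiler's per-statement totals by hand, a small manual timer can help. The helper below, `makeTimer`, is hypothetical (it is not part of ExtendScript or InDesign): it wraps a function call and accumulates elapsed milliseconds and call counts per label. Date-based timing has only millisecond resolution, so the totals are meaningful only when accumulated over many calls, such as a loop over every PageReference.

```javascript
// Hypothetical helper (not an InDesign or ExtendScript API): accumulates
// elapsed wall-clock milliseconds and call counts per label. ES3-only
// syntax, so it should also run under ExtendScript.
function makeTimer() {
  var totals = {};
  var counts = {};
  return {
    // Runs fn(), adds its elapsed time to the label's running total,
    // and returns whatever fn returned.
    time: function (label, fn) {
      var start = new Date().getTime();
      var result = fn();
      totals[label] = (totals[label] || 0) + (new Date().getTime() - start);
      counts[label] = (counts[label] || 0) + 1;
      return result;
    },
    total: function (label) { return totals[label] || 0; },
    count: function (label) { return counts[label] || 0; }
  };
}

// Inside the InDesign loop it might be used like this (needs InDesign):
//   var oPara = timer.time("paragraph", function () { return oInsertionPoint.paragraphs[0]; });
//   var nThisParaIndex = timer.time("index", function () { return oPara.index; });

// Stand-alone demonstration with plain JavaScript work:
var timer = makeTimer();
var demoSum = timer.time("loop", function () {
  var s = 0;
  for (var i = 0; i < 1000; i++) { s += i; }
  return s;
});
```

Comparing `timer.total("paragraph")` with `timer.total("index")` after the loop gives the same kind of split the profiler reports.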

       

      I tried switching the references and tests from the Index values to the Paragraph objects themselves, and changing the problematic line to "var oThisPara = oInsertionPoint.paragraphs[0];" was certainly quicker, but then the comparison between the "this" and "last" Paragraph objects was much slower than comparing the two numeric values, so there really wasn't an improvement.

       

      Can anyone offer a solution to this problem? Either by resolving what's taking so long to access the Index property and finding a way to cure that issue? Or by offering an alternative and less slow Paragraph property that I could check? Or even a completely different approach which would allow me to check the same thing?

       

      Edit: I should add that this is for CS5.5.

        • 1. Re: Paragraph Index property, performance issues
          Peter Kahrel Adobe Community Professional & MVP

          Gerald,

           

          The trouble is probably that you go through InDesign's model all the time. For example, every time you invoke that line

           

          var nThisParaIndex = oInsertionPoint.paragraphs[0].index;


          the script has to resolve oInsertionPoint again, in effect re-evaluating the reference obtained in the first line of the script,


          var oInsertionPoint = oPageReference.sourceText;


          The way out is to call upon InDesign's model as little as possible. But in your script it's not so easy to see how all those calls can be avoided. You could think of other methods, but whether that's feasible depends on whether your processing and alternative processing change the text. It would also be useful to know whether all page references are in the same story.


          Peter

          • 2. Re: Paragraph Index property, performance issues
            GeraldHlasgow Level 1

            Yes, Peter, all the page references are in the same story.

             

            I think that if the whole thing had been slow I'd have just accepted that "that's how it is", but it just seemed so odd that one line of code was taking so much more time than anything else. As you can see from my code, the lines which set the three variables oInsertionPoint, nReferencedPage and nThisParaIndex will each be executed the same number of times in any given run.

             

            I'm not sure what units the ExtendScript Profiler timing data is in, but the three values for those lines are (rounded, for simplicity) 550,000 (oInsertionPoint), 800,000 (nReferencedPage) and 15,000,000 (nThisParaIndex). So, as you can see, that third line takes roughly 11 times as long to execute as the first two lines put together.

            Obviously these values are the sum for the whole duration of the script.
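As a quick check of the arithmetic above, in plain JavaScript (the variable names are just labels for the quoted totals, and the units are whatever the profiler reports):

```javascript
// Rounded ESTK profiler totals quoted above (units as reported by the profiler).
var insertionPointTotal = 550000;     // oInsertionPoint line
var referencedPageTotal = 800000;     // nReferencedPage line
var paraIndexTotal = 15000000;        // nThisParaIndex line

// How many times longer the index line takes than the other two combined.
var ratio = paraIndexTotal / (insertionPointTotal + referencedPageTotal);

// Share of the three lines' combined time spent on the index line.
var share = paraIndexTotal / (insertionPointTotal + referencedPageTotal + paraIndexTotal);

// console.log is for a standard JavaScript engine; use $.writeln in ExtendScript.
console.log(ratio.toFixed(1));            // prints "11.1"
console.log((share * 100).toFixed(0));    // prints "92"
```

So the index line alone accounts for roughly 92% of the time spent on those three statements.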

             

            Do you think there may be a benefit to splitting the document into multiple stories - one per chapter, for instance?

            • 3. Re: Paragraph Index property, performance issues
              GeraldHlasgow Level 1

              Actually, I can answer that final question myself. Yes, there is a benefit, a very tangible one.

               

              Splitting the main body of the book into multiple stories, one per chapter, more than halves the time the script takes to run. Setting the value of nThisParaIndex still takes longer than setting oInsertionPoint and nReferencedPage combined, but now only about 1.5 times as long rather than 11 times as long.

               

              I still don't really understand why what appears to be a direct link from the PageReference's SourceText (aka InsertionPoint) to its Paragraph should speed up so much. It's not as though I was using the Story object to access it, and of course the number of Paragraph objects hasn't changed: the document still has just as many paragraphs as it had before.

               

              Hmmm. One to ponder.

              • 4. Re: Paragraph Index property, performance issues
                Peter Kahrel Adobe Community Professional & MVP

                Maybe it's the case that up to a certain number of calls/processes, things go relatively smoothly, and that when some critical point is reached, things start to fall apart. That would explain why dividing the story into separate stories gave you that speed boost: the critical point is never reached.

                 

                It could also explain why adding that statement "var nThisParaIndex = oInsertionPoint.paragraphs[0].index;" to your well-working script reduced it to a crawl: before adding that statement, the critical point wasn't reached; adding the statement added sufficient calls to tip the script over the critical point. I don't know if all this is true, but it seems consistent with your observations.

                 

                For interest's sake you could try the following approach on your one-story publication. The idea is this: you create an array of paragraph indexes, which would look like [0, 100, 150, 500, ...]. This records that the story's first paragraph starts at index 0, the second at index 100, the third at 150, and so on. This could be a big array.
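To make concrete what those numbers are: a paragraph's index is the character offset at which it starts within the story. The sketch below builds the same kind of array from a plain string, assuming paragraphs are separated by "\r" (InDesign's paragraph return). `paragraphStarts` is a hypothetical helper for illustration only; in InDesign itself, `myStory.paragraphs.everyItem().index` gives the array directly, and this sketch does not try to model special characters (markers, anchored objects) that real story text may contain.

```javascript
// Hypothetical illustration: returns the start offset of each paragraph in
// a plain string whose paragraphs are separated by "\r".
// This sketch does not count an empty paragraph after a trailing return.
function paragraphStarts(text) {
  var starts = [0];               // the first paragraph always starts at 0
  for (var i = 0; i < text.length; i++) {
    // A return ends a paragraph; the next one starts right after it,
    // unless the return is the very last character of the text.
    if (text.charAt(i) === "\r" && i + 1 < text.length) {
      starts.push(i + 1);
    }
  }
  return starts;
}

var starts = paragraphStarts("One\rTwo two\rThree");
// starts is [0, 4, 12]
```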

                 

                To determine whether two cross-references are in the same paragraph, you look up the rank of each one's sourceText.index in that array, i.e. the position of the last paragraph start that is not greater than it. This script illustrates:

                 

                // Create an array of indexes of the paragraphs in myStory
                // You would do this just once.
                
                var paragraphIndexes = myStory.paragraphs.everyItem().index;
                var last = paragraphIndexes.length-1;
                
                // Given two pageReference objects xRef1 and xRef2, determine the rank of each.
                // If the ranks are the same, the pageReference objects are in the same paragraph.
                
                if (getRank (xRef1.sourceText.index) === getRank (xRef2.sourceText.index)) {
                  // xRef1 and xRef2 are in the same paragraph
                }
                
                function getRank (index) {
                  var n = last;
                  while (n >= 0 && index < paragraphIndexes[n]) {
                    n--;
                  }
                  return n;
                }
                
                

                 

                In this approach, you access InDesign's object model fewer times than in your script. You still access it every time the script reads a pageReference's sourceText index, which it does twice for every comparison.
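getRank above scans the array linearly from the end, which is fine for short stories but costs O(n) per lookup on a story with many paragraphs. Because the array of paragraph indexes is sorted, a binary search (a swapped-in technique, not Peter's code) returns the same rank in O(log n) steps. `getRankBinary` is a hypothetical name, and it takes the array as a parameter rather than using a global.

```javascript
// Binary-search variant of getRank: returns the position of the last
// paragraph start that is <= index (the same result as the linear scan),
// or -1 if index precedes every start. Assumes paragraphIndexes is sorted
// ascending, which holds for the paragraphs of a story.
function getRankBinary(paragraphIndexes, index) {
  var lo = 0;
  var hi = paragraphIndexes.length - 1;
  var rank = -1;
  while (lo <= hi) {
    var mid = Math.floor((lo + hi) / 2);
    if (paragraphIndexes[mid] <= index) {
      rank = mid;                 // candidate; look for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;               // this start is past index; look earlier
    }
  }
  return rank;
}

// Usage with the example array from above: offsets 120 and 149 both fall in
// the paragraph that starts at 100, so they share rank 1.
var exampleStarts = [0, 100, 150, 500];
var sameParagraph = getRankBinary(exampleStarts, 120) === getRankBinary(exampleStarts, 149);
```

In the InDesign script you would build the array once with `myStory.paragraphs.everyItem().index` and pass `xRef.sourceText.index` as the second argument. The gain only matters when there are many paragraphs and many references.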

                 

                Peter

                • 5. Re: Paragraph Index property, performance issues
                  GeraldHlasgow Level 1

                  Thanks, Peter, that certainly improves things a lot, though I don't entirely understand why it should.

                   

                  If the problem had been accessing the Paragraph object itself, I would completely understand how a single sequential trawl through all the Paragraph objects would be much quicker than accessing each one individually at random (and in many cases more than once), and it almost certainly does speed that aspect up to some extent anyway.

                   

                  But what it removes entirely is the incredibly slow access of the Index property.
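Putting the pieces together, here is a sketch of the loop logic from the first post with the paragraph test driven by a precomputed rank rather than paragraphs[0].index. The input shape is an assumption: `refs` is a hypothetical array of plain `{page, sourceIndex}` records read once from each PageReference (page from `parentTextFrames[0].parentPage.name`, sourceIndex from `sourceText.index`), and `paragraphIndexes` is the array built once from `myStory.paragraphs.everyItem().index`. The labels "newPara", "samePara" and "skip" stand in for the two processing branches and the case the original code does not handle.

```javascript
// Same linear scan as Peter's getRank, but taking the array as a parameter.
function rankOf(starts, index) {
  var n = starts.length - 1;
  while (n >= 0 && index < starts[n]) {
    n--;
  }
  return n;
}

// Mirrors the original if/else: only references on a later page are
// processed, and those are split by whether their paragraph rank matches
// the previous reference's rank.
function classifyReferences(refs, paragraphIndexes) {
  var results = [];
  var lastPage = -1;   // plays the role of nLastReferencedPage
  var lastRank = -1;   // plays the role of nLastParaIndex
  for (var i = 0; i < refs.length; i++) {
    var rank = rankOf(paragraphIndexes, refs[i].sourceIndex);
    if (refs[i].page > lastPage) {
      results.push(rank !== lastRank ? "newPara" : "samePara");
      lastPage = refs[i].page;
    } else {
      results.push("skip");
    }
    lastRank = rank;   // updated for every reference, as in the original
  }
  return results;
}
```

Only plain numbers are compared inside the loop, so the object model is touched once per reference when the records are read, plus once for the paragraph array.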

                   

                  It doesn't speed things up quite as much as splitting the original long story into chapters did, but it comes close, and I'd much rather keep the single story.

                   

                  Thanks again.