4 Replies Latest reply on Oct 17, 2009 6:07 AM by cwint

    javascript to build index, accumulating character offset

    cwint

      Hi script & index wonder-workers

      I have a script that I run on a single document of 160 or so pages, working from a list of terms to index.

      It does the job but tends to misplace index markers, with the offset from the real location of the target word growing progressively through the document.

      The direction of the offset changes depending on whether I step up or down through my list of texts returned from story.search.

      Using CS2, there are footnotes.

      The offset does not exactly equal the number of footnotes encountered so far, but it does broadly scale with about 60% of it (I've not checked this in detail). With 1000 marks in the file, some later pages have about a 50-char offset from the words in my list.

      Very few footnotes have words in my index; 1 or 2 do, and I'm OK to just catch them.

       

      I have tried reinvoking story.search to create a new list of texts after each (or 1 in 5) pageReferences.add

      in case the insert of the index marker creates the offset - but this just makes it run slower and does not fix the problem.

      Most index refs come up correct but there will inevitably be some that drop from page x to page x-1

       

      Any ideas where I should look?

      Is the var myStory getting a copy or pointer to the changing story??

       

      for (var s = 0; s < doc.stories.length; s++) {
          var myStory = doc.stories.item(s);

          if (myStory.length > 800) { // avoid small story-objects (notes, headers etc)
              var toindex = myStory.search(myUpper); // index entry with initial CAP in this story text
              if (toindex.length > 0) { // found the text at least once...
                  var newtopic = doc.indexes[0].topics.add(myUpper);
                  var r = 0;
                  for (var i = toindex.length - 1; i > -1; i--) { // this way marks DRIFT after word
      //          for (var i = 0; i < toindex.length; i++) { // marks DRIFT ahead of word
                      r++;
                      try {
                          newtopic.pageReferences.add(toindex[i], PageReferenceType.currentPage);
                          if (r == 5) {
                              toindex = doc.stories.item(s).search(myUpper); // costs time, helps? reset DRIFT every 5th cycle
                              r = 0;
                          }
                      } catch (_) {
                          // many of these will be footnote locations...
                          alert("Index problem >" + myUpper + "<item " + i + " may be in a footnote");
                      }
                  }
              }
          }
      }

        • 1. Re: javascript to build index, accumulating character offset
          Peter Kahrel Adobe Community Professional & MVP

          The logic of the code you posted is ok: after some small adjustment to make it work in CS4 (the search bit), and leaving out your drift adjuster, the script worked ok. How the drift can occur when you process the found array from end to front I don't understand. Did you try other documents?
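
For readers wondering why end-to-front processing is normally considered safe, here is a minimal plain-JavaScript analogy. It assumes the found items behave like fixed character offsets and that each index marker occupies one character. It works on an ordinary string, not InDesign's object model, and the names `findOffsets` and `insertMarkers` are made up for illustration.

```javascript
// Plain-string analogy (hypothetical helpers, not InDesign API).
// Collect the start offset of every occurrence of `term` in `text`.
function findOffsets(text, term) {
    var offsets = [];
    var pos = text.indexOf(term);
    while (pos !== -1) {
        offsets.push(pos);
        pos = text.indexOf(term, pos + term.length);
    }
    return offsets;
}

// Insert a one-character marker at each offset, in the given order.
// The offsets are NOT refreshed between insertions.
function insertMarkers(text, offsets, backToFront) {
    var order = offsets.slice();
    if (backToFront) {
        order.reverse();
    }
    for (var i = 0; i < order.length; i++) {
        text = text.slice(0, order[i]) + "^" + text.slice(order[i]);
    }
    return text;
}

var sample = "cat sat; cat ran; cat hid";
var hits = findOffsets(sample, "cat");

var forward = insertMarkers(sample, hits, false);  // stale offsets, front to back
var backward = insertMarkers(sample, hits, true);  // stale offsets, back to front
```

In this analogy the forward pass places the later markers ahead of their words, because every earlier insertion shifts the text that follows, while the back-to-front pass keeps every marker at the start of its word. Under these assumptions a drift that survives end-to-front processing has to come from something other than the inserted markers.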

           

          Maybe create a small test document of, say, 200 words (no tables, footnotes, inlines -- just some text), with 20 occurrences of one term to be indexed, and see what happens. If that works, then it's something in your document.

           

          Peter

          • 2. Re: javascript to build index, accumulating character offset
            cwint Level 1

            Some structured tests are the way to go, well pointed out.

            It seems weird at first!

            40 or so pages of text (12000 words), 7 or 8 words to index = FINE

            40 or so pages of text (12000 words), 7 or 8 words to index with a table halfway = DEPENDS...

            +++ with table as 4 rows, 4 cols, 1 hdr row (all cells empty) ALL OK

            +++ with table as 5 rows, 4 cols, 1 hdr row (all cells empty) THEN words that are searched

                with app.findPreferences.wholeWord = true;   >>  the marks are offset from words by 5 chars (I use this for short words) after the table

                with app.findPreferences.wholeWord = false;  >>  the marks are correctly placed (for long words to allow simple stemming)

             

            AH, the next tests show a probable pattern:

            +++ table as 7 rows, 4 cols, 1 hdr then words of over 7 chars seem OK but those shorter (findPreferences either way) show the index mark offset beyond the end of the word.

            +++ table as 9 rows, 4 cols, 1 hdr then all show the index mark offset beyond the end of the word (my longest index word being 9 characters).

             

            My guess is that each row (more or less) adds 1 char of offset, but small offsets do not upset the mark position, as it must lock to the word object in some way.  A larger offset (from more table rows) then starts to show on short words but not on long ones (like character), which still lock the marker to the front of the word.
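
The guess above can be written down as a toy rule and compared with the 7-row and 9-row tests. This is only a sketch of the hypothesis, assuming the offset is roughly one character per table row and that a mark looks misplaced once the offset reaches the word's length. `markLooksMisplaced` is a hypothetical name, the word lengths 6 and 8 are illustrative values, and nothing here calls InDesign.

```javascript
// Toy model of the hypothesis (assumption: ~1 char of offset per table row).
// A mark is taken to look misplaced once the offset is at least the word length.
function markLooksMisplaced(wordLength, tableRows) {
    var assumedOffset = tableRows; // the guess from the tests, not a documented rule
    return assumedOffset >= wordLength;
}

// The 7-row and 9-row observations, restated as checks of the model:
var sevenRowsShortWord = markLooksMisplaced(6, 7); // shorter words were off
var sevenRowsLongWord = markLooksMisplaced(8, 7);  // words over 7 chars looked OK
var nineRowsLongest = markLooksMisplaced(9, 9);    // even the 9-char word was off
```

The rule is only "more or less" right by the tests' own account, so it is a way of stating the pattern rather than an explanation of it.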

             

            SO a workaround might be to split texts between tables (& other types? I think images are ok) and possibly adjust the list of texts returned by .search (by the number of rows in the preceding tables) if that proves to be required. I haven't a clue how to code that at present!
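
One way the adjustment mentioned above might be coded, as a minimal sketch in plain JavaScript rather than working InDesign script. It assumes each hit and each table can be reduced to a character offset in the story, that the drift is about one character per row of every table before the hit, and that the marks land too late (so the correction subtracts). The names `adjustOffsets`, `hitOffsets`, `tables` and `charsPerRow` are all hypothetical, and the numbers are made up.

```javascript
// Hypothetical correction: shift each hit by the rows of all preceding tables.
// hitOffsets: character offsets of found words (assumed already known)
// tables: [{ offset: <position of table in story>, rows: <row count> }, ...]
// charsPerRow: assumed drift per row (a guess, about 1 in the tests above)
function adjustOffsets(hitOffsets, tables, charsPerRow) {
    var adjusted = [];
    for (var i = 0; i < hitOffsets.length; i++) {
        var rowsBefore = 0;
        for (var t = 0; t < tables.length; t++) {
            if (tables[t].offset < hitOffsets[i]) {
                rowsBefore += tables[t].rows;
            }
        }
        adjusted.push(hitOffsets[i] - rowsBefore * charsPerRow);
    }
    return adjusted;
}

// Made-up example: two tables, three hits.
var tables = [{ offset: 100, rows: 5 }, { offset: 400, rows: 9 }];
var corrected = adjustOffsets([50, 250, 600], tables, 1);
```

Hits before the first table are left alone, and later hits move back by the total number of rows passed. If the marks turn out to land too early instead, a negative `charsPerRow` reverses the shift.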

            So IF there are a few tables I may just PB them, index and then replace.

             

            But I'd welcome any code suggestions/solutions to fix it up...

            • 3. Re: javascript to build index, accumulating character offset
              Peter Kahrel Adobe Community Professional & MVP

              Good tests! I wasn't sure which InDesign version had the terrible table problem, CS1 or CS2; apparently it was CS2. The problem with tables was that they slowed down scripts to a pathetic crawl. It seems now that they affect scripted indexes as well.

               

              By placing all tables in their own frames I got around the general speed problem.
              Maybe that works for the index drift as well. Worth a try.

               

              Peter

              • 4. Re: javascript to build index, accumulating character offset
                cwint Level 1

                Embedding the tables in a text frame certainly ends my problems here in CS2 - everything is fine if I do that, no offset even after 6 large tables.

                 

                So a BUG in CS2, not dealt with in code, but with an acceptable solution, except for long tables where I would want to use repeating headers on each page - I suspect they will be hard work to reproduce, and to deal with reflows of text, if I stay embedded.

                I think I'll keep tables in an inline text frame until indexes are done and then break any out into tables with rpt headers where that is what I need.

                 

                Many thanks for the good steers, Peter