8 Replies Latest reply on Nov 16, 2009 5:07 PM by Mac_06

    extracting page numbers for XML export

    ThisGuy-500 Level 1

      Hi All,

       

      I'm exporting XML for a textbook layout in InDesign CS4, and I need page numbers attached to particular tags as an attribute.

      So whenever the style "heading1" appears in my layout, a script would determine the page that the heading is on and create an attribute for it.

      (e.g. <heading1 pagenum="32">This is a Heading</heading1>)

       

      I think this might be part of the code that would grab the current page,

      http://forums.adobe.com/thread/451546

       

      ...and the "MakeXMLAttribute" in the Adobe JS Guide might be the foundation for creating the attributes for each heading.

       

      Any heavyweights that can help with this?

       

      If you have any ideas, could you send them along?  Much appreciated.

        • 1. Re: extracting page numbers for XML export
          [Jongware] Most Valuable Participant

          Here's a rough go at it.

           

          app.findTextPreferences = null;
          app.findTextPreferences.appliedParagraphStyle = "heading1";
          found = app.findText();
          // alert ("found "+found.length);
           for (a=0; a<found.length; a++)
          {
            el = found[a].characters[1].associatedXMLElements;
            // alert (el[0].markupTag.name);
            if (el[0].markupTag.name == "heading1")
            {
              try 
              {
                pagename = found[a].parentTextFrames[0].parent.name;
                // alert (pagename);
                el[0].xmlAttributes.add ("pagenum", pagename);
              } catch(_) { alert (_); }
            }
          }
          

           

          I use the 2nd character of the found paragraph text because I have a habit of not including the final hard return in the XML (.. why do I do that? Can't remember...). That means the 'found' item -- the entire paragraph -- doesn't exactly coincide with the XML element. No prob, I just used character #1. Hey, it doesn't always work ..? Well, then take #2 (and that works fine, right up until the heading is too short to have one).

           

          Note that there is a huge restriction on just grabbing the page number like this: the found items must all be in a main text frame, not in frames nested inside groups, other frames, or tables. If you suspect you have constructions like these, check the attributes list -- those without a page number hit an error somewhere. If you search this forum, you might find a better "get-this-page-number".
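One way around the simplest form of that restriction, a frame nested inside groups, is to climb the parent chain until a page turns up. This is only a sketch and not something from the thread: `findPageName` and the hop limit are hypothetical, and it assumes each object exposes `parent` and a `constructor.name` of "Page" for pages. Anchored frames and table cells need extra hops through the containing text frame, which this does not attempt.

```javascript
// Hypothetical helper: walk up the parent chain until an object whose
// constructor is named "Page" is found. Returns that object's name,
// or null if the chain ends first (for example, on the pasteboard).
function findPageName(obj) {
  var current = obj;
  var hops = 0;
  while (current && hops < 100) { // hop limit guards against parent cycles
    if (current.constructor && current.constructor.name === "Page") {
      return current.name;
    }
    current = current.parent;
    hops++;
  }
  return null;
}
```

In the script above it would be called on the text frame, e.g. `findPageName(found[a].parentTextFrames[0])`, and a `null` result would mean the heading needs a closer look by hand.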

          • 2. Re: extracting page numbers for XML export
            [Jongware] Most Valuable Participant

            I should probably add that the script searches for the style "heading1" and adds the attribute to the XML element "heading1" that's used to tag the heading.

             

            It should also be possible to do it the other way around: looping over your XML tree, and on each tag "heading1", examine straightaway on which page its associated text resides -- that would simply be "xmlElement[x].characters[0].parentTextFrames[0].parent.name".

             

            (I might be a bit careless using the word "simply" here.)

             


             

            [Edit] .. or was I? This is massively faster:

             

            markHeaders (app.activeDocument.xmlElements[0]);
            function markHeaders (rootElem)
            {
             if (rootElem.markupTag.name == "heading1")
             {
              try {
               rootElem.xmlAttributes.add ("pagenum", rootElem.characters[0].parentTextFrame.parent.name);
              } catch (_) { }
             }
             for (a=0; a<rootElem.xmlElements.length; a++)
              markHeaders (rootElem.xmlElements[a]);
            }
            
            • 3. Re: extracting page numbers for XML export
              ThisGuy-500 Level 1

              First off, thank you for such a quick response!  Much appreciated.

               

              1.  So, I changed "heading1" to "H1" in the script to match my XML tags, and ran it from the script panel--which produced the error attached (image file).

               

              2.  Next, I modified the script per your second response:
              rootElem.xmlAttributes.add ("pagenum", rootElem.characters[0].parentTextFrame.parent.name);

               

              ...and it has not stopped churning after 15 mins on an 8-page layout.

               

              I should ask, I assume I should run this script with my cursor in the story, with the structure pane open to the side.  Am I forgetting something perhaps?

              • 4. Re: extracting page numbers for XML export
                [Jongware] Most Valuable Participant

                Strike two, eh? Hm...

                 

                I must admit both scripts were written for my ancient CS -- but I really thought I knew the tiny differences by now.

                For the first script, try

                 

                app.findTextPreferences.appliedParagraphStyle = app.activeDocument.paragraphStyles.item("H1");

                 

                -- it oughta do the same (more or less). Does that style "H1" really exist? (Perhaps it's inside a style group?)
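If the style does sit inside a style group, a name lookup over a flat list of every style is one way to find it. A minimal sketch, assuming the list is something like the document's `allParagraphStyles` array; `findStyleByName` is a made-up helper, not part of the InDesign object model, and it returns the first match only.

```javascript
// Hypothetical helper: return the first style in the list whose name
// matches, or null when there is none. Works on any array-like list of
// objects that have a `name` property.
function findStyleByName(styles, wantedName) {
  for (var i = 0; i < styles.length; i++) {
    if (styles[i].name === wantedName) {
      return styles[i];
    }
  }
  return null;
}

// Assumed usage inside InDesign (not runnable elsewhere):
// var h1 = findStyleByName(app.activeDocument.allParagraphStyles, "H1");
// if (h1 !== null) app.findTextPreferences.appliedParagraphStyle = h1;
```

A `null` result would answer the "does it really exist?" question before the search even starts.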

                 

                2.  Next, I modified the script per your second response ..

                 

                It's not so much a modification, rather an entirely new script. On my very, very, very old computer (hold on -- have to shovel some more coal in it...) the first script pauses, then suddenly fills in the attributes. With the second script, they appear instantly.

                 

                Both scripts ignore the position of the text cursor. The first one does a global 'search', all over your document (hence the warning about nested frames). The second one doesn't even do that, it merely checks the XML tree. It doesn't matter if the structure pane is open or not.

                 

                If your computer is still happily purring to itself, try whacking the escape key a few times. It oughta stop the script running, and you can check if it changed anything at all in the XML structure.

                 

                If none of the above does anything for you, I'll have to take a break and try it on a modern system (that would be tomorrow, and I'm on GMT+1).

                • 5. Re: extracting page numbers for XML export
                  Mac_06 Level 2

                  Hey great Jongware,

                   

                  You gave us a way to do this, but neither script produces the required result yet.

                   

                  1. The first script still shows the same error described earlier by ThisGuy-500.

                   

                  2. The second script also churns for more than 15 minutes, and it also creates an image folder at the XML export path. Could we avoid that folder?

                   

                  I am looking forward to your response.

                   

                  Many thanks in advance

                  • 6. Re: extracting page numbers for XML export
                    [Jongware] Most Valuable Participant

                    I am so sorry, guys. Not one, but two major errors! (And for some reason both 'activate' in CS4 but not in CS.)

                    Here is a better version:

                     

                    markHeaders (app.activeDocument.xmlElements[0]);
                    function markHeaders (rootElem)
                    {
                      var a; // local, so each recursive call keeps its own loop counter
                      if (rootElem.markupTag.name == "heading1")
                      {
                        // if (confirm (rootElem.markupTag.name+"\nContinue?")==0) exit(0);
                        // if (confirm (rootElem.markupTag.name+"\n"+rootElem.lines[0].contents+"\nContinue?")==0) exit(0);
                        try {
                          rootElem.xmlAttributes.add ("pagenum", rootElem.characters[0].parentTextFrames[0].parent.name);
                        } catch (_) { alert("wot? "+rootElem.lines[0].contents+"\n"+rootElem.characters[0].parentTextFrames[0].parent.name); exit(0); }
                      }
                      for (a=0; a<rootElem.xmlElements.length; a++)
                        markHeaders (rootElem.xmlElements[a]);
                    }
                    

                     

                    First thing I did was disable the try..catch that handles errors, so I could see what went wrong. That was a bit of a "D'oh" -- I used parentTextFrame, rather than parentTextFrames[0] (versioning error, and boy, I should have watched out for these!).

                     

                    The second one was more subtle: the script stopped after processing 2 headers, complaining "Attribute already exists". That took a fair bit of debugging step-by-step, until I noticed I used the variable 'a' to loop ... inside a function that calls itself ... so the variable gets reinitialized, and starts over ... at 0 ... every ... time ... it ... calls ... itself. A function calling itself is called "recursion", and I now know where it got that name (I cursed and re-cursed until I finally got it to work).
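The shared-counter problem described above can be reproduced outside InDesign. The sketch below is not the thread's script: `tree`, `visitShared`, `visitLocal`, and the 50-visit safety cap are all made up for illustration, and a plain object tree stands in for the XML structure. The counter `shared` is declared outside the function so that every call uses the same variable, which is what an undeclared loop variable amounts to.

```javascript
// Plain-object stand-in for an XML tree (6 nodes); not the InDesign DOM.
var tree = {
  tag: "Root",
  children: [
    { tag: "heading1", children: [] },
    { tag: "section", children: [
      { tag: "heading1", children: [] },
      { tag: "heading1", children: [] }
    ] },
    { tag: "heading1", children: [] }
  ]
};

// One counter shared by every call, like an undeclared loop variable.
var shared;

function visitShared(node, visited) {
  if (visited.length > 50) return; // safety cap so the demo terminates
  visited.push(node.tag);
  for (shared = 0; shared < node.children.length; shared++) {
    // The nested call resets `shared` to 0, so this loop loses its place.
    visitShared(node.children[shared], visited);
  }
}

function visitLocal(node, visited) {
  var i; // local: each call has its own counter
  visited.push(node.tag);
  for (i = 0; i < node.children.length; i++) {
    visitLocal(node.children[i], visited);
  }
}

var sharedVisits = [];
visitShared(tree, sharedVisits); // keeps revisiting nodes until the cap stops it
var localVisits = [];
visitLocal(tree, localVisits);   // visits each node exactly once
```

Revisiting an element that already has its attribute fits the "already exists" complaint, and with the error swallowed by an empty catch the same loop would simply keep churning.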

                     

                    Good news is, this script is still blistering fast -- now that it finally works.

                    • 7. Re: extracting page numbers for XML export
                      ThisGuy-500 Level 1

                      Jongware,

                       

                      It works like a charm!  Thank you so much. 

                      • 8. Re: extracting page numbers for XML export
                        Mac_06 Level 2

                        You are amazing, Jongware. It's working very nicely.

                         

                         

                        Thank you very much