14 Replies Latest reply on Mar 3, 2011 1:42 AM by Loic.Aigon

    ESTK/INDESIGN XML Parsing limit ?

    Loic.Aigon Adobe Community Professional

      Hi Guys,

       

      My title probably looks more confused than my problem is:

      I am trying to create an XML object from an XML file with XML(xmlFile.read());

      For reasonably sized XML files it works, but if the XML is huge, the scripting engine seems to fail to create the XML object. Is this a known issue, or am I doing something wrong?

       

      Sorry if this is a cross-cutting question, but as I am trying to achieve this via the ESTK first and later with InDesign, I think it makes sense to ask here.

       

      TIA Loic

        • 1. Re: ESTK/INDESIGN XML Parsing limit ?
          John Hawkinson Level 5

          How big is big?

          Does it scale with bytes or nodes or depth?

          If you use an intermediate string variable does it change anything?

          Are you targeting the ESTK or InDesign? Does the behavior change?

          CS5?

           

          Theoretically you could build the XML object incrementally...

          • 2. Re: ESTK/INDESIGN XML Parsing limit ?
            Loic.Aigon Adobe Community Professional

            Hi John,

             

            The file isn't that heavy: it's only 95 KB, but it has 2200 lines.

             

            That's an Excel sheet exported to XML.

             

            I tried running it from the ESTK, targeting either the ESTK or InDesign, and neither helped.

            It's an XML file exported from an Excel document (Excel 2003 XML).

             

            I tried the temporary string without success. I could of course manipulate that string to reduce the content of the XML object, but that's neither ideal nor generic :S

             

            Hope there is a way.

             

            Loic

            • 3. Re: ESTK/INDESIGN XML Parsing limit ?
              John Hawkinson Level 5

              I think you should be more specific about what kind of XML you are trying to parse.

              I ran some casual benchmarks, and the time to instantiate the E4X XML object (column 3, in milliseconds) looks linear in the number of elements (column 2). Note that the times to generate the array and to convert it to a string (columns 4 and 5) are much worse. This is targeting InDesign CS5 Mac:

               

               step   elements   parse_ms   array_ms    join_ms
                  0          1          0          0          0
                  1          3          0          0          0
                  2          7          1          0          0
                  3         15          0          0          0
                  4         31          1          0          0
                  5         63          1          0          0
                  6        127          2          1          0
                  7        255          2          2          0
                  8        511          4          4          0
                  9       1023          8          6          2
                 10       2047         17         46          5
                 11       4095         35         53         14
                 12       8191         68        196         55
                 13      16383        136        431        195
                 14      32767        278       1082        726
                 15      65535        583       5806      20924
                 16     131071       1157      37870      90983
              

               

              If instead I target the ESTK, it's quite similar:

               step   elements   parse_ms   array_ms    join_ms
                  0          1          0          0          0
                  0          1          0          8          0
                  1          3          0          0          0
                  2          7          0          1          0
                  3         15          0          0          0
                  4         31          0          0          0
                  5         63          1          0          0
                  6        127          1          1          0
                  7        255          2          2          0
                  8        511          4          2          1
                  9       1023          9          6          1
                 10       2047         16         14          5
                 11       4095         35         81         16
                 12       8191         72        124         61
                 13      16383        143        363        220
                 14      32767        290       1167        806
                 15      65535        575       7911      24248
                 16     131071       1197      51845     109387
              

               

              though after that things get fairly slow, I assume (naively!) because of the array and string creation.

               

               

              And the code:

               

              (function() {
                  var i, j, n,
                      xmlarr =[], count=0,
                      xmlstring, xml, times = [];
                      
                  function pad(s, n) {
                      var m,
                          str=s.toString();
                      
                      m = n-str.length;
                      if (m>0) {
                          return (new Array(m+1).join(' '))+str;
                      } else {
                          return str;
                      }
                  }
                      
                      
                
                  i =0;
                  for (n=1; n<1e6; n=2*n, i++) {
                      //xmlarr = [];
                      times = [];
                      times.push(new Date().valueOf());
                      for (j=n; j>0; j--) {
                          xmlarr.push('<boringtag attr="'+j+'"/>');
                      }
                      count+=n;
                      times.push(new Date().valueOf());
              
                      xmlstring = "<Root>"+xmlarr.join('')+"</Root>";
                      times.push(new Date().valueOf());
              
              //~         xmlstring = "<Root>"+
              //~             (new Array(n).join('<boringtag attr="'+n+'"/>'))+
              //~             "</Root>";
              
                      xml=new XML(xmlstring);
                      times.push(new Date().valueOf());
                      $.writeln(pad(i,5)+" "+pad(count,10)+" "+
                        pad(times[3]-times[2],10)+" "+
                        pad(times[1]-times[0],10)+" "+
                        pad(times[2]-times[1],10));
                  }
              
              }());
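As a rough check of the "linear" reading of the InDesign CS5 table above, the sketch below divides the posted times by the element counts. The rows are copied from that table; the helper names (perThousand, costTable) are made up for illustration, and the code is plain JavaScript with no ESTK-specific calls.

```javascript
// Rows copied from the InDesign CS5 table above: [elements, parseMs, joinMs].
var rows = [
    [2047, 17, 5],
    [4095, 35, 14],
    [8191, 68, 55],
    [16383, 136, 195],
    [32767, 278, 726],
    [65535, 583, 20924],
    [131071, 1157, 90983]
];

// Cost in ms per 1000 elements, rounded to one decimal.
// A roughly constant value across rows suggests linear scaling.
function perThousand(ms, elements) {
    return Math.round(ms / elements * 1000 * 10) / 10;
}

// Hypothetical helper: one record per benchmark row.
function costTable(data) {
    var out = [], i;
    for (i = 0; i < data.length; i += 1) {
        out.push({
            elements: data[i][0],
            parse: perThousand(data[i][1], data[i][0]),
            join: perThousand(data[i][2], data[i][0])
        });
    }
    return out;
}

var costs = costTable(rows);
```

With these numbers the parse cost stays between 8 and 9 ms per 1000 elements on every row, while the join cost per 1000 elements grows from a few milliseconds to several hundred. In the ESTK the records could be printed with $.writeln.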
              
              
              • 4. Re: ESTK/INDESIGN XML Parsing limit ?
                Loic.Aigon Adobe Community Professional

                Thanks for your interest and help,

                 

                I am trying to parse an XML file generated by Acrobat from what was originally an Excel document. I do this because Acrobat exports table information much more cleanly than Excel does natively.

                 

                My goal is then to recreate several InDesign tables. I have asked for permission to share the XML file, as I am not its author. As soon as I get the OK, I will link it here.


                Anyway, here is the code I wrote :

                 

                table2XML();
                
                function table2XML(){
                    
                        var myXML = File.openDialog(), // or File(Folder.desktop+"/aTable.xml")
                        tables,
                        outXML,
                        declaration,
                        xmlBody, 
                        doc;
                        
                        if(!myXML || myXML.name.match(/\.xml$/i)==null){
                            return false;
                        }
                    
                        var xmlParser=(function(/*File*/xmlFile)
                        {
                               return {
                                       init: function()
                                               {
                                               xmlFile.open('r');
                                               var myContent = xmlFile.read();
                                               xmlFile.close();
                                               return XML( myContent );
                                               },
                                        getMaxChild: function(node)
                                            {
                                                var nodeMax=node.length(),
                                                     maxChild=0,
                                                     i, nodeChildLength;
                                                for(i=0; i<nodeMax; i+=1){
                                                 nodeChildLength=node[i].children().length();
                                                 if(nodeChildLength>maxChild){ maxChild = nodeChildLength }
                                                }
                                                return maxChild;
                                            },
                                        createXMLTables:function(xml)
                                        {
                                            var tables=xml.Table, 
                                                    tMax=tables.length(),
                                                    rowCount, 
                                                    columnCount,
                                                    tablesArr=[];
                                            for(var i=0; i<tMax; i+=1){
                                               var tbl =  tables[i],
                                               rows=tbl.TR,
                                               rowsMax=rows.length(),
                                               rowCount=tbl.TR.length(),
                                               columnCount=xmlParser.getMaxChild(tbl.TR),
                                               tblArr=["<Article><Table xmlns:aid=\"http://ns.adobe.com/AdobeInDesign/4.0/\" aid:table=\"table\" aid:trows=\""+rowCount+"\" aid:tcols=\""+columnCount+"\">"];            //"];
                                               for(var j=0; j<rowsMax; j+=1){
                                                   var childs=rows[j].children(), ck;
                                                   for(var k=0; k<columnCount; k+=1){
                                                       ck=childs[k];
                                                       if(ck===undefined) ck="";
                                                        tblArr.push("<Cellule aid:table=\"cell\" aid:crows=\"1\" aid:ccols=\"1\" aid:ccolwidth=\"161.5564304461942\">"+ck+"</Cellule>");
                                                   }
                                               }
                                                tblArr.push("</Table></Article>");
                                                tablesArr.push(tblArr.join(""));
                                             } 
                                             return tablesArr;
                                        }
                               };
                        })(myXML);
                
                        myXML = xmlParser.init();
                        alert(myXML);
                        /*
                            //Assembling the ouput XML File
                        declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";
                        xmlBody = xmlParser.createXMLTables(myXML)//.toString()//.replace(/(<Cellule )xmlns:aid="http:\/\/ns.adobe.com\/AdobeInDesign\/4.0\/\"/g, "$1");
                
                        outXML = new File(Folder.desktop+"/testXML.xml");
                        outXML.open('w');
                        outXML.write(declaration+"\r<Root>"+xmlBody.join("\r")+"\r</Root>");
                        outXML.close();
                        
                        if(app.documents.length==0){ app.documents.add(); }
                        
                        doc=app.activeDocument;
                        doc.importXML(outXML);
                        
                        */
                }
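The row-padding idea in createXMLTables (find the widest row, then emit an empty cell wherever a row is shorter) can be sketched on plain arrays. This is only an illustration, not the script above: maxCellCount and padRows are hypothetical names, and arrays of strings stand in for the E4X TR nodes.

```javascript
// Hypothetical plain-array version of the row-padding step.
// Each row is an array of cell strings; rows may have different lengths.
function maxCellCount(rows) {
    var max = 0, i;
    for (i = 0; i < rows.length; i += 1) {
        if (rows[i].length > max) { max = rows[i].length; }
    }
    return max;
}

// Returns new rows that all have the width of the widest input row,
// filling missing cells with an empty string.
function padRows(rows) {
    var width = maxCellCount(rows), out = [], i, k, row;
    for (i = 0; i < rows.length; i += 1) {
        row = [];
        for (k = 0; k < width; k += 1) {
            row.push(rows[i][k] === undefined ? "" : rows[i][k]);
        }
        out.push(row);
    }
    return out;
}

var padded = padRows([["a", "b", "c"], ["d"], ["e", "f"]]);
```

Every row in padded then has three cells, which matches the requirement that the aid:tcols value and the number of cells per row agree.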
                

                 

                As I said, with small XML files it works like a charm:

                ok.png

                But with this 2200+ line XML, it seems the XML object isn't created, and the alert reflects that:

                ko.png

                And I can't access any child of the XML element, so I guess it's not just a display bug :S

                 

                I hope I can post the source XML soon.

                 

                Loic

                • 5. Re: ESTK/INDESIGN XML Parsing limit ?
                  John Hawkinson Level 5

                  Oh my goodness. That's a bit confusing to follow.

                   

                  At the start of table2XML(), you initialize myXML to be a File object.

                  Then you declare xmlParser as an Object with methods that are bound to myXML (as xmlFile).

                  Then you assign the result of xmlParser.init() to myXML.

                   

                  So myXML starts out as a file object, and then changes into a XML object? Having the types of variables change can be very confusing to read!


                  Is the problem, then, that xmlParser.init() fails to return?

                   

                  (It also seems like alert(myXML) is a very bad idea if it is very large. But I'm not quite sure how XML.toString() works; I think I've sometimes seen some very long things and sometimes seen nothing.)

                   

                  Anyhow, I wasn't expecting you to post your actual confidential data. But I assumed you could easily build a test case that was similar enough to it to cause the problem. I must admit, my comments on the benchmark may have been misleading. I expected line #17 to have printed a few minutes after line #16, and here we are three hours later [Edit: 1.5 hours later] and ID is still taking 100% of one core on this 2.8 GHz quad core Xeon.  But it still could be the string and array operations...

                   

                  Anyhow, it does not look good for this approach...

                  • 6. Re: ESTK/INDESIGN XML Parsing limit ?
                    Loic.Aigon Adobe Community Professional

                    Hi John,

                     

                    I am switching myXML from a File object to an XML object so that I can manipulate the content more easily than with a string.

                     

                    I use xmlParser to create a toolbox for my XML Object.

                     

                    Alert is only for debugging to see if something happens.

                     

                    Unfortunately, I wasn't allowed to share the XML, but I was given another one, so I am checking whether it causes the same issue and will come back here later.

                     

                    Maybe I will fall back to strings, but I find it a pity to lose all the advantages of E4X :\

                     

                    Thanks for all,

                     

                    Loic

                    • 7. Re: ESTK/INDESIGN XML Parsing limit ?
                      John Hawkinson Level 5

                      The code would be a lot clearer if you kept the File object in one variable and the XML in another.

                       

                      E4X is no good if it doesn't work. You could always try the InDesign DOM's XML support. Though I imagine that is worse.

                      • 8. Re: ESTK/INDESIGN XML Parsing limit ?
                        Loic.Aigon Adobe Community Professional

                        You are right. Being stubborn doesn't help :\

                        Thanks anyway for all your support

                         

                        loic

                        • 9. Re: ESTK/INDESIGN XML Parsing limit ?
                          John Hawkinson Level 5

                          Well, somewhere after n=16, my InDesign process went south and so did my ESTK. Coming back several hours later, I have a newly restarted ESTK that reports in the console (targeting InDesign):

                             19    1048575       9518    8341821    7448177
                          

                           

                          So, it took about 9.5 seconds to create the XML object with roughly 1.05 million nodes. Sadly, it also took about 4.4 hours (columns 4 and 5 combined) to generate the string to feed to the XML object. Which is a good reminder that neither arrays nor strings are terribly efficient in JavaScript, and that this kind of benchmark should in future generate the strings some other way, such as with a perl or python script.

                           

                          Anyhow, though, the data doesn't seem consistent with what you're seeing. So I think there's something about your XML file that's different from my very naive one.

                          • 10. Re: ESTK/INDESIGN XML Parsing limit ?
                            Loic.Aigon Adobe Community Professional

                            Hi John,

                             

                            It's also my conclusion, the conversion/initialization of the string hosting the xml content seems to fail.

                            Even if I request alert(myXML.read()), it fails. So maybe, there is performance issue here, to be confirmed.

                             

                            Anyway, thx a lot for all your generous efforts .

                             

                            Loic

                            • 11. Re: ESTK/INDESIGN XML Parsing limit ?
                              John Hawkinson Level 5

                              Loic:

                              It's also my conclusion, the conversion/initialization of the string hosting the xml content seems to fail.

                              Well, that was not quite what I meant. It is a big problem for concatenating many small strings, which my implementation of the benchmark does. But I don't think that should be happening when you call File.read().

                               

                              I just re-ran step 16 of my benchmark with a file created with a perl script the same way:

                              $ perl -e 'print "<Root>\n"; for ($i=0; $i<131071; $i++) { print qq|<boringtag attr="$i"/>\n| }; print "</Root>\n"' > step16.xml
                              

                               

                              It comes out like this:

                                  0          0        925         30          0
                              

                               


                              That is, it took 30ms to read in the file and 0.9 seconds to process it.

                              So I don't have any evidence to believe that it's about the string stuff in your case.

                               

                              Even if I request alert(myXML.read()), it fails. So maybe, there is performance issue here, to be confirmed.

                              Well, hang on. You should never alert() very large strings. That seems like a recipe for disaster. The user interface is just not designed for that.

                               

                              Hopefully this is helpful. Happy to help.

                              • 12. Re: ESTK/INDESIGN XML Parsing limit ?
                                Loic.Aigon Adobe Community Professional

                                Yeah I really appreciate your effort!

                                 

                                I will keep in mind your advice about variables and alert. I used alert and/or $.writeln for a quick view of what's going on. Anyway, even without alert, it seems that the script fails to read() the file content. XML(myXML.read()) is not a usable object. For example, I know there is a Table node with TR children, but myXML.Table.TR[0] (or any other attempt to use the object) is undefined.

                                 

                                Well, I guess this is the end of the story. Even if it worked, 30 minutes of processing would be unacceptable for the end user.

                                 

                                A big thank you anyway for your interest.

                                 

                                Loic

                                • 13. Re: ESTK/INDESIGN XML Parsing limit ?
                                  John Hawkinson Level 5

                                  alert() and $.writeln() are fine for small things, not so fine for big things.

                                  What about saving myXML.read() into a string, then $.writeln-ing xmlstring.length?

                                  I am skeptical that the problem is the read(). That works fine for me, and it should be independent of the structure of the file, depending only on its length.
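A minimal sketch of that check, assuming the file content is already in a string: instead of displaying the whole thing, report its length and a short preview. describeText is a hypothetical helper, and the sample string below only stands in for the result of read(); in the ESTK the summary could be passed to $.writeln.

```javascript
// Hypothetical helper: describe a large string without displaying all of it.
function describeText(text, previewLength) {
    var limit = previewLength || 80,
        preview = text.length > limit ? text.substring(0, limit) + "..." : text;
    return "length=" + text.length + " preview=" + preview;
}

// Stand-in for the string returned by read(): 1000 small tables in a root element.
var sample = "<Root>" + new Array(1001).join("<Table><TR/></Table>") + "</Root>";

// Short, safe-to-display description of a 20013-character string.
var summary = describeText(sample, 20);
```

If the reported length matches the file size, the read itself succeeded and any remaining problem lies in displaying or parsing the content.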

                                   

                                  Good luck!

                                  • 14. Re: ESTK/INDESIGN XML Parsing limit ?
                                    Loic.Aigon Adobe Community Professional

                                    Hey John,

                                     

                                    Whether the night let my computer catch its breath, or it was some magic (or me finally doing things right thanks to your advice), it's working now. You were right: I was assuming I had a problem because alert/writeln could not display the result of xmlFile.read(). They still can't, BUT if I access properties as you proposed:

                                     

                                    xmlFile.open('r');
                                    var xmlString = xmlFile.read();
                                    var myXML = XML( xmlString );
                                    myXML.children().length(); //1
                                    myXML.Table.TR.length(); //200
                                    

                                     

                                     

                                    So it looks like it's working finally. I don't know why I couldn't get access to the XML hierarchy yesterday :?

                                     

                                    Anyway, you really helped on this one, thanks a lot!

                                     

                                    Loic