-
1. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 2, 2011 12:56 AM (in response to Loic_aigon)
How big is big?
Does it scale with bytes or nodes or depth?
If you use an intermediate string variable does it change anything?
Are you targeting the ESTK or InDesign? Does behavior change?
CS5?
Theoretically you could build the XML object incrementally...
-
2. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 2, 2011 1:24 AM (in response to John Hawkinson)
Hi John,
The file isn't that heavy: it's only 95 KB, but it has 2200 lines.
It's an Excel sheet exported to XML (Excel 2003 XML).
I tried running it from the ESTK, targeting both the ESTK and InDesign, and neither helped.
I tried the temporary string without success. I could of course manipulate that string to reduce the content of the XML object, but that is neither an ideal nor a generic solution :S
Hope there is a way.
Loic
-
3. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 2, 2011 5:09 AM (in response to Loic_aigon)
I think you should be more specific about what kind of XML you are trying to parse.
I ran some casual benchmarks, and the time to instantiate the E4X XML object (column 3) looks linear in the number of elements (column 2). Note that the times to generate the array and to convert it to a string (columns 4 and 5) are much worse than the time to instantiate the XML object. All times are in milliseconds. This is targeting InDesign CS5 Mac:
 step  elements  parse  array   join
    0         1      0      0      0
    1         3      0      0      0
    2         7      1      0      0
    3        15      0      0      0
    4        31      1      0      0
    5        63      1      0      0
    6       127      2      1      0
    7       255      2      2      0
    8       511      4      4      0
    9      1023      8      6      2
   10      2047     17     46      5
   11      4095     35     53     14
   12      8191     68    196     55
   13     16383    136    431    195
   14     32767    278   1082    726
   15     65535    583   5806  20924
   16    131071   1157  37870  90983
If instead I target the ESTK, it's quite similar:
 step  elements  parse  array    join
    0         1      0      0       0
    0         1      0      8       0
    1         3      0      0       0
    2         7      0      1       0
    3        15      0      0       0
    4        31      0      0       0
    5        63      1      0       0
    6       127      1      1       0
    7       255      2      2       0
    8       511      4      2       1
    9      1023      9      6       1
   10      2047     16     14       5
   11      4095     35     81      16
   12      8191     72    124      61
   13     16383    143    363     220
   14     32767    290   1167     806
   15     65535    575   7911   24248
   16    131071   1197  51845  109387
though after that things get fairly slow, I assume (naively!) because of the array and string creation.
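As a rough check of the linearity claim, here is a minimal sketch that is not part of the original post. It divides the XML parse time by the element count for the last five rows of the InDesign CS5 run above. The names `parseTimings`, `microsecondsPerElement` and `perElementCosts` are made up for illustration. It sticks to plain ES3-style JavaScript, so it should also run in ExtendScript.

```javascript
// Rows copied from the InDesign CS5 run above:
// [number of elements, XML parse time in milliseconds].
var parseTimings = [
    [8191, 68],
    [16383, 136],
    [32767, 278],
    [65535, 583],
    [131071, 1157]
];

// Parse cost of a single row, in microseconds per element.
function microsecondsPerElement(row) {
    return (row[1] * 1000) / row[0];
}

// Per-element cost for every row; roughly constant values
// suggest that parse time grows linearly with element count.
function perElementCosts(rows) {
    var costs = [];
    for (var i = 0; i < rows.length; i++) {
        costs.push(microsecondsPerElement(rows[i]));
    }
    return costs;
}
```

For these five rows the cost stays between 8 and 9 microseconds per element, which is what a linear relationship would look like.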
And the code:
(function() {
    var i, j, n, xmlarr = [], count = 0, xmlstring, xml, times = [];

    function pad(s, n) {
        var m, str = s.toString();
        m = n - str.length;
        if (m > 0) {
            return (new Array(m + 1).join(' ')) + str;
        } else {
            return str;
        }
    }

    i = 0;
    for (n = 1; n < 1e6; n = 2 * n, i++) {
        //xmlarr = [];
        times = [];
        times.push(new Date().valueOf());
        for (j = n; j > 0; j--) {
            xmlarr.push('<boringtag attr="' + j + '"/>');
        }
        count += n;
        times.push(new Date().valueOf());
        xmlstring = "<Root>" + xmlarr.join('') + "</Root>";
        times.push(new Date().valueOf());
        //~ xmlstring = "<Root>"+
        //~   (new Array(n).join('<boringtag attr="'+n+'"/>'))+
        //~   "</Root>";
        xml = new XML(xmlstring);
        times.push(new Date().valueOf());
        $.writeln(pad(i, 5) + " " + pad(count, 10) + " " +
            pad(times[3] - times[2], 10) + " " +
            pad(times[1] - times[0], 10) + " " +
            pad(times[2] - times[1], 10));
    }
}());
-
4. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 2, 2011 5:55 AM (in response to John Hawkinson)
Thanks for your interest and help,
I am trying to parse an XML file generated from Acrobat, which was originally an Excel document. I do this because Acrobat exports table information much more cleanly than Excel does natively.
My goal is then to recreate several InDesign tables. I have asked for authorization to send the XML file, as I am not the author. As soon as I get the OK, I will link it here.
Anyway, here is the code I wrote:

table2XML();

function table2XML() {
    var myXML = File.openDialog(), //File(Folder.desktop+"/aTable.xml"), //.openDialog(),
        tables, outXML, declaration, xmlBody, doc;

    if (!myXML || myXML.name.match(/\.xml$/i) == null) {
        return false;
    }

    var xmlParser = (function(/*File*/xmlFile) {
        return {
            init: function() {
                xmlFile.open('r');
                return XML(xmlFile.read());
            },
            getMaxChild: function(node) {
                var nodeMax = node.length(), maxChild = 0, nodeChildLength, i;
                for (i = 0; i < nodeMax; i += 1) {
                    nodeChildLength = node[i].children().length();
                    if (nodeChildLength > maxChild) {
                        maxChild = nodeChildLength;
                    }
                }
                return maxChild;
            },
            createXMLTables: function(xml) {
                var tables = xml.Table, tMax = tables.length(), rowCount, columnCount, tablesArr = [];
                for (var i = 0; i < tMax; i += 1) {
                    var tbl = tables[i],
                        rows = tbl.TR,
                        rowsMax = rows.length(),
                        rowCount = tbl.TR.length(),
                        columnCount = xmlParser.getMaxChild(tbl.TR),
                        tblArr = ["<Article><Table xmlns:aid=\"http://ns.adobe.com/AdobeInDesign/4.0/\" aid:table=\"table\" aid:trows=\"" + rowCount + "\" aid:tcols=\"" + columnCount + "\">"]; //"];
                    for (var j = 0; j < rowsMax; j += 1) {
                        var childs = rows[j].children(), ck;
                        for (var k = 0; k < columnCount; k += 1) {
                            ck = childs[k];
                            if (ck === undefined) ck = "";
                            tblArr.push("<Cellule aid:table=\"cell\" aid:crows=\"1\" aid:ccols=\"1\" aid:ccolwidth=\"161.5564304461942\">" + ck + "</Cellule>");
                        }
                    }
                    tblArr.push("</Table></Article>");
                    tablesArr.push(tblArr.join(""));
                }
                return tablesArr;
            }
        };
    })(myXML);

    myXML = xmlParser.init();
    alert(myXML);

    /*
    //Assembling the output XML File
    declaration = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";
    xmlBody = xmlParser.createXMLTables(myXML)//.toString()//.replace(/(<Cellule )xmlns:aid="http:\/\/ns.adobe.com\/AdobeInDesign\/4.0\/\"/g, "$1");
    outXML = new File(Folder.desktop+"/testXML.xml");
    outXML.open('w');
    outXML.write(declaration+"\r<Root>"+xmlBody.join("\r")+"\r</Root>");
    outXML.close();
    if(app.documents.length==0){
        app.documents.add();
    }
    doc=app.activeDocument;
    doc.importXML(outXML);
    */
}

As I said, with small XML files, it works like a charm.
But with this 2200+ line XML file, it seems the XML object isn't created, and the alert shows nothing.
And I can't access any child of the XML element, so I guess it's not only a display bug :S
I hope I can post the source XML soon.
Loic
-
5. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 2, 2011 6:29 AM (in response to Loic_aigon)
Oh my goodness. That's a bit confusing to follow.
At the start of table2XML(), you initialize myXML to be a File object.
Then you declare xmlParser as an Object with methods that are bound to myXML (as xmlFile).
Then you assign the result of xmlParser.init() to myXML.
So myXML starts out as a file object, and then changes into a XML object? Having the types of variables change can be very confusing to read!
Is the problem, then, that xmlParser.init() fails to return?
(It also seems like alert(myXML) is a very bad idea if it is very large. But I'm not quite sure how XML.toString() works; I think I've sometimes seen some very long things and sometimes seen nothing.)
Anyhow, I wasn't expecting you to post your actual confidential data. But I assumed you could easily build a test case that was similar enough to it to cause the problem. I must admit, my comments on the benchmark may have been misleading. I expected line #17 to have printed a few minutes after line #16, and here we are three hours later [Edit: 1.5 hours later] and ID is still taking 100% of one core on this 2.8 GHz quad core Xeon. But it still could be the string and array operations...
Anyhow, it does not look good for this approach...
-
6. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 2, 2011 6:33 AM (in response to John Hawkinson)
Hi John,
I am switching myXML from a File object to an XML object because the content is easier to manipulate that way than as a string.
I use xmlParser to create a toolbox for my XML object.
The alert is only for debugging, to see if something happens.
Unfortunately, I wasn't allowed to share the XML, but I was given another file, so I am checking whether it causes the same issue and will come back here later.
Maybe I will use strings, but I find it a pity to lose all the E4X advantages :\
Thanks for all,
Loic
-
7. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 2, 2011 6:37 AM (in response to Loic_aigon)
The code would be a lot clearer if you kept the File object in one variable and the XML in another.
E4X is no good if it doesn't work. You could always try the InDesign DOM's XML support. Though I imagine that is worse.
-
8. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 2, 2011 7:07 AM (in response to John Hawkinson)
You are right. Being stubborn doesn't help :\
Thanks anyway for all your support
loic
-
9. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 2, 2011 12:31 PM (in response to John Hawkinson)
Well, somewhere after n=16, my InDesign process went south and so did my ESTK. Coming back several hours later, I have a newly restarted ESTK that reports in the console (targeting InDesign):
19 1048575 9518 8341821 7448177
So, it took 9 seconds to create the XML object with 1.1 million nodes. Sadly, it also took 4.3 hours to generate the string to feed to the XML object. Which is a good reminder that neither arrays nor strings are terribly efficient in JavaScript, and this kind of benchmark in the future should generate the strings some other way, like a perl or python script.
Anyhow, though, the data doesn't seem consistent with what you're seeing. So I think there's something about your XML file that's different from my very naive one.
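For a test string that does not need a distinct attribute per element, one way to sidestep the per-element array pushes is to build the repetition by doubling. This is a minimal sketch under that assumption, not what the benchmark above does: `repeatString` and `buildTestXml` are hypothetical helper names, and every element carries the same `attr="1"`. It uses only ES3-style JavaScript, so it should also run in ExtendScript.

```javascript
// Hypothetical helper (not from the thread): repeat `unit` `count` times
// with O(log count) concatenations instead of one push per element.
function repeatString(unit, count) {
    var result = "";
    var chunk = unit;
    while (count > 0) {
        if (count % 2 === 1) {
            result += chunk;          // this bit of count is set: take the chunk
        }
        count = Math.floor(count / 2);
        if (count > 0) {
            chunk += chunk;           // double the chunk for the next bit
        }
    }
    return result;
}

// Build a <Root> document holding `count` identical empty elements.
function buildTestXml(count) {
    return "<Root>" + repeatString('<boringtag attr="1"/>', count) + "</Root>";
}
```

In ExtendScript the result could then be handed to `new XML(buildTestXml(131071))` to time only the parsing step.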
-
10. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 2, 2011 2:12 PM (in response to John Hawkinson)
Hi John,
It's also my conclusion, the conversion/initialization of the string hosting the xml content seems to fail.
Even if I request alert(myXML.read()), it fails. So maybe there is a performance issue here, to be confirmed.
Anyway, thx a lot for all your generous efforts .
Loic
-
11. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 2, 2011 2:36 PM (in response to Loic_aigon)
Loic:
It's also my conclusion, the conversion/initialization of the string hosting the xml content seems to fail.
Well, that was not quite what I meant. It is a big problem for concatenating many small strings, which my implementation of the benchmark does. But I don't think that should be happening when you call File.read().
I just re-ran step 16 of my benchmark with a file created with a perl script the same way:
$ perl -e 'print "<Root>\n"; for ($i=0; $i<131071; $i++) { print qq|<boringtag attr="$i"/>\n| }; print "</Root>\n"' > step16.xml
It comes out like this:
0 0 925 30 0
That is, it took 30 ms to read in the file and 0.9 seconds to process it.
So I don't have any evidence to believe that it's about the string stuff in your case.
Even if I request alert(myXML.read()), it fails. So maybe, there is performance issue here, to be confirmed.
Well, hang on. You should never alert() very large strings. That seems like a recipe for disaster. The user interface is just not designed for that.
Hopefully this is helpful. Happy to help.
-
12. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 2, 2011 11:58 PM (in response to John Hawkinson)
Yeah, I really appreciate your effort!
I will keep your advice about variables and alert in mind. I used alert and/or $.writeln for a quick view of what's going on. Anyway, even without alert, it seems that the script fails to read() the file content: XML(myXML.read()) is not an object. For example, I know there is a Table node with TR children, but myXML.Table.TR[0] (or any other attempt to use the object) is undefined.
Well, I guess this is the end of the story. Even if it worked, 30 minutes of processing is unacceptable for the end user.
A big thank you anyway for your interest.
Loic
-
13. Re: ESTK/INDESIGN XML Parsing limit ?
John Hawkinson Mar 3, 2011 1:10 AM (in response to Loic_aigon)
alert() and $.writeln() are fine for small things, not so fine for big things.
What about saving myXML.read() into a string, then $.writeln-ing xmlstring.length?
I am skeptical that the problem is the read(). That works fine for me, and it should be independent of the structure of the file, depending only on its length.
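The idea of checking the length rather than dumping the whole string can be sketched as follows. This is only an illustration, not code from the thread: `describeText` is a hypothetical helper name, and the ExtendScript call in the trailing comment assumes an already opened File object named `xmlFile`.

```javascript
// Hypothetical helper (not from the thread): summarize a large string so
// that only its length and a short preview are ever sent to the console.
function describeText(text, previewLength) {
    var preview = text.substring(0, previewLength);
    var suffix = text.length > previewLength ? "..." : "";
    return "length=" + text.length + " preview=" + preview + suffix;
}

// Assumed usage in ExtendScript, with xmlFile already opened for reading:
//   var xmlString = xmlFile.read();
//   $.writeln(describeText(xmlString, 60));
```

A non-zero length with a sensible preview would suggest that read() succeeded and that any remaining trouble lies elsewhere.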
Good luck!
-
14. Re: ESTK/INDESIGN XML Parsing limit ?
Loic_aigon Mar 3, 2011 1:42 AM (in response to John Hawkinson)
Hey John,
Whether the night gave my computer a breather, or it was some magic, or I did things right thanks to your advice, it's working now. You are right: I was presuming I had an issue because alert/writeln was unable to display xmlFile.read(), and it still can't, BUT it works if I access properties as you proposed:
xmlFile.open('r');
var xmlString = xmlFile.read();
var myXML = XML( xmlString );
myXML.children().length(); //1
myXML.Table.TR.length(); //200
So it looks like it's finally working. I don't know why I couldn't get access to the XML hierarchy yesterday :?
Anyway, you really helped on this one , thx a lot !
Loic




