12 Replies Latest reply on Dec 28, 2011 6:00 AM by John Hawkinson

    reading xml file

    TD_TAD

      I want to read this XML file and apply the corresponding style to each element.

      If anybody knows a solution to this problem, please help me. (InDesign JavaScript)

       

      The contents of my XML file are as follows:

      <italic><bold>hello World</bold>:</italic>

        • 1. Re: reading xml file
          Jim Kingston

          Use the tags mapped to paragraph styles

          • 2. Re: reading xml file
            John Hawkinson Level 5

            However, that doesn't work for the offered example, which features partially overlapping tags -- hello World is both Bold and Italic, yet : is only italic. Etc. Mumble mumble Cartesian product.

            • 3. Re: reading xml file
              [Jongware] Most Valuable Participant

              Sorry John but the OP's sample does not feature partially overlapping tags. Tags in XML ought to overlap in their entirety. The OP's sample does this. (Although it's confusing and inadvisable to use these names -- 'italic' doesn't really sound like a root element, and both tag names have historical connotations which are sure to muddy the discussion. Pick semantic names rather than literal descriptions. Train yourself to think Null-A.)

               

              What's also interesting is that John's "mumble mumble Cartesian product" interpretation differs from InDesign's actual implementation of character styles (or -- if this 'italic' tag is a Root element -- paragraph styles, with character styles on top of 'em).

               

              Both are equally valid interpretations (sorry again, John, I'm not going to allow you to defend yourself with "but HTML does it right" because this is XML!) and the obviously oblivious OP must pick one -- and one only -- before any body can help him.

              • 4. Re: reading xml file
                [Jongware] Most Valuable Participant

                [Jongware] wrote:

                Tags in XML ought to overlap in their entirety. The OP's sample does this.

                 

                Huh, by 'overlap' I meant they should nest. (They "overlap" only in the sense that one tag contains the other.) Partially overlapping tags are not valid XML.

                • 5. Re: reading xml file
                  John Hawkinson Level 5

                  Jongware: You are correct that in this case I should have said "nested tags" instead of overlapping tags. This features nested tags that overlap semantically. While Tahir should indeed decide whether he wants HTML tag soup or XML rigid non-overlapping tags, I fear, based on the example, that the former is what is desired.

                   

                  In any case:

                  (Although it's confusing and inadvisable to use these names -- 'italic' doesn't really sound like a root element, and both tag names have historical connotations which are sure to muddy the discussion. Pick semantic names rather than literal descriptions. Train yourself to think Null-A.)

                  I don't think that's good advice here. I assume the markup comes from elsewhere, and the names are chosen for the desired transformations to the characters.

                   

                  What's also interesting is that John's "mumble mumble Cartesian product" interpretation differs from InDesign's actual implementation of character styles (or -- if this 'italic' tag is a Root element -- paragraph styles, with character styles on top of 'em).

                  I think we ought to be assuming that this is certainly not the entirety of the XML and that this is a run of text, as might appear in a <P> tag. In any case, I'm not sure whether you're agreeing with me that dealing with this is a poor fit with InDesign's style model, or disagreeing.

                   

                  While you could, I suppose, define both Bold and Italic as both Paragraph and Character styles, it could get messy.

                  I suppose I don't know what the right solution is. XSLT to multiply out the cartesian products, maybe?

                   

                  Tahir: Please post a longer excerpt of the file.

                  • 6. Re: reading xml file
                    [Jongware] Most Valuable Participant

                    John Hawkinson wrote:

                     

                    [...] In any case:

                    (Although it's confusing and inadvisable to use these names -- 'italic' doesn't really sound like a root element, and both tag names have historical connotations which are sure to muddy the discussion. Pick semantic names rather than literal descriptions. Train yourself to think Null-A.)

                    I don't think that's good advice here. I assume the markup comes from elsewhere, and the names are chosen for the desired transformations to the characters.

                     

                    (my emphasis) That's my point exactly. If the "desired" outcome would be Bold plus Italic for the middle part, the tags should reflect that. Using semantically correct tags avoids this:

                     

                    <em><em2>hello world</em2>:</em>

                     

                    where <em> is emphasized, and <em2> is even more emphasized, or even this:

                     

                    <em><em>hello world</em>:</em>

                     

                    In this case you cannot run into the above dilemma of whether to make the inner text italics or bold plus italics. You'd have to define what "emphasized" and "doubly emphasized" look like; and this could very well depend on a higher level of information. For example, in a bold title, <em> translates to Bold Italic, whereas in plain text it would typically be plain Italics. In this case, applying dumb rules would make double emphasis get lost in the bold title.

                     

                    John Hawkinson wrote:

                     

                    I think we ought to be assuming that this is certainly not the entirety of the XML and that this is a run of text, as might appear in a <P> tag. In any case, I'm not sure whether you're agreeing with me that dealing with this is a poor fit with InDesign's style model, or disagreeing.

                     

                    If the OP was poorly worded and this is not for XML but for HTML, this particular issue is only going to be the first of many problems. As Kris at Rorohiko states:

                     

                    FramedWeb consists of an HTML parser and a CSS parser – it will attempt to parse the HTML text and CSS styles, and convert them to something vaguely similar in InDesign

                    (FramedWeb)

                     

                    If it is XML, there is some hope yet:

                     

                    I suppose I don't know what the right solution is. XSLT to multiply out the cartesian products, maybe?

                     

                    That's certainly possible, as it forces you to think ahead about what the tag <italic> may translate to when it's inside a tag that already has Italics as a default attribute.
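
                    For what it's worth, the "multiply out" step does not have to be XSLT. Here is a minimal sketch in plain JavaScript that flattens nested inline tags into text runs, each carrying the list of tags that enclose it, and then derives one combined style name per run. The function names flattenRuns and styleNameFor and the "+" naming scheme are made up for illustration, and the tokenizer assumes well-formed nesting with no attributes, entities, or self-closing tags; it is not a real XML parser.

```javascript
// Sketch only: flatten nested inline tags into runs of text.
// Assumes well-formed nesting, no attributes, no entities, no self-closing tags.
function flattenRuns(xml) {
  var tokenRe = /<(\/?)([A-Za-z][\w.-]*)>|([^<]+)/g;
  var stack = []; // names of the currently open tags, outermost first
  var runs = [];
  var m;
  while ((m = tokenRe.exec(xml)) !== null) {
    if (m[3]) {
      // Text: record it with a copy of the tags open at this point.
      runs.push({ text: m[3], tags: stack.slice() });
    } else if (m[1] === "/") {
      // Closing tag: it must match the innermost open tag.
      if (stack.pop() !== m[2]) {
        throw new Error("Mismatched closing tag: " + m[2]);
      }
    } else {
      // Opening tag.
      stack.push(m[2]);
    }
  }
  if (stack.length > 0) {
    throw new Error("Unclosed tag: " + stack[stack.length - 1]);
  }
  return runs;
}

// Build one combined style name from the enclosing tags.
// Duplicates are dropped and names are sorted, so nesting order does not matter.
function styleNameFor(tags) {
  var seen = {};
  var names = [];
  for (var i = 0; i < tags.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(seen, tags[i])) {
      seen[tags[i]] = true;
      names.push(tags[i]);
    }
  }
  return names.sort().join("+");
}
```

                    For the sample in the original post this yields two runs: "hello World" under the combined name bold+italic, and ":" under italic. Only the combinations that actually occur in the text need a character style, which is less than the full Cartesian product.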

                    • 7. Re: reading xml file
                      John Hawkinson Level 5

                      (my emphasis) That's my point exactly. If the "desired" outcome would be Bold plus Italic for the middle part, the tags should reflect that.

                      I don't think so at all.

                      If someone hands me an HTML-formatted file with <bold> and <italic> tags, it is a waste of my time to change the names of the tags to something that you suggest is more semantically appropriate.

                       

                      (Furthermore, I'm not sure what you would change them to! <emph> and <strong>? What's the point?)

                      • 8. Re: reading xml file
                        [Jongware] Most Valuable Participant

                        But you are talking about HTML! Not exactly defined very well, although there are a number of conventions most/all browsers follow, such as this continuing example of italics-inside-bold.
                        Which leads me to think: some standard headings in HTML come formatted in bold, by default. What happens then if you ask for even more [Bold]? (You could try it and see, or you could try to locate what W3C says about this, whichever is faster.)

                         

                        Of course you are going to get in trouble if someone else, less knowledgeable than you, sends you a file with those pesky Bold and Italic tags. In that case it's left to you to decide what to do with these troublesome issues.

                         

                        With semantic tagging, you could decide emphasis-inside-italics gets underlined instead. Suppose you do so, you might want to translate <italic> inside an italic header to <u>, and XSLT or a smart Javascript could do that for you. But then you are treating the 'command' <italic> as if it were <emph>, without paying attention to the actual request for italics.

                         

                        ----

                         

                        All this is just rhetoric on my side, really. I'm not trying to convince you of anything. It's just that the post is so vague, it's important to at least be aware of the various pitfalls (rhetorically, semantically, but most of all practically). The OP's request is too short, too vague, and badly worded.

                         

                        Besides that, apart from discussing possible problems, I don't feel like helping out by writing a full XML (or HTML) parser for him. There are commercial tools for that, and if not, there are professional script writers who actually want to get paid for a big job like this.

                        • 9. Re: reading xml file
                          [Jongware] Most Valuable Participant

                          Never been one to let things rest -- here is an image of <h1>Text</h1>, with <i> (italics) in the first line and <b> (bold) in the second. The italics in the first line get translated to Bold Italic, which is only reasonable within the expectations of an HTML browser; the bold in the second line is lost.

                           

                          html-nest.PNG

                           

                          Admittedly, this is with Internet Explorer 8 and so it can be argued that "any other browser would get it right" -- then again, what is "right" in this case?

                          If [Bold]-inside-[Bold] only applies the inner tag (which is, just to check you are paying attention, 'bold'), then by the same token the text [Italics] should be italics and not bold plus italics.

                           

                          Reasoning that this is 'expected behavior' is in my opinion not valid. One would expect the extra boldness to be reflected in the output, regardless of the surrounding "parent" formatting.

                           

                          But I digress. The issue is not about HTML (at least, I do hope so), but 'merely' on how to handle nested tags which apply conflicting formatting. "Adding up" formatting cannot be done with a simple search-and-replace; it needs awareness of the surrounding level, and "smart adding up", i.e., making Italics into Bold Italics, needs awareness of its formatting.
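
                          A rough sketch of what "smart adding up" could look like, in plain JavaScript. The names addFormat and fontStyleName are hypothetical, the tag names are the ones used in this thread, and the font style names ("Regular", "Bold", "Italic", "Bold Italic") are an assumption that depends on the font in use. The point is only that the result for a tag depends on the formatting already in force, and that a request which changes nothing (bold inside bold) can at least be detected rather than silently lost.

```javascript
// Sketch only: combine a formatting tag with the formatting already in force.
// `current` is an object like { bold: true, italic: false }.
function addFormat(current, tag) {
  var next = { bold: current.bold, italic: current.italic, redundant: false };
  if (tag === "bold" || tag === "b") {
    next.redundant = current.bold; // bold inside bold adds nothing visible
    next.bold = true;
  } else if (tag === "italic" || tag === "i") {
    next.redundant = current.italic; // italic inside italic adds nothing visible
    next.italic = true;
  } else {
    throw new Error("Unknown formatting tag: " + tag);
  }
  return next;
}

// Map the combined state to a font style name (assumed names, font dependent).
function fontStyleName(format) {
  if (format.bold && format.italic) return "Bold Italic";
  if (format.bold) return "Bold";
  if (format.italic) return "Italic";
  return "Regular";
}
```

                          Starting from a bold heading ({ bold: true, italic: false }), adding "i" gives "Bold Italic", while adding "b" still gives "Bold" but sets the redundant flag, so the caller can decide what, if anything, the extra emphasis should turn into.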

                           

                          I think I'm beginning to understand why the latest FramedWeb beta is numbered 0.0.7 and is a "work in progress" rather than the OP's hopeful "solution".

                          • 10. Re: reading xml file
                            Ian Proudfoot Level 3

                            In my experience this type of nested, style-based XML markup is quite common. I normally work with industry-standard DTDs and schemas where you don't get the option to change the markup, but you shouldn't need to. In both XML and HTML the accepted way to apply formatting is by using a cascading stylesheet of one type or another. It may be CSS or perhaps a proprietary version such as an EDD in FrameMaker. As mentioned previously, cascading styles are not an option with InDesign.

                             

                            I have had to make the cascading for character formatting work in InDesign. It's far more difficult than it should be, but with a bit of work the results can be acceptable. I use ExtendScript to dynamically create styles that effectively merge the formatting properties of two or more styles into just one, then apply that to the lowest-level nested element. The alternative static Cartesian product method would be far too difficult to maintain.

                             

                            Ian
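
                            The dynamic merge described above might look roughly like this. It is only a sketch and not the actual script: plain objects stand in for InDesign character styles, and baseStyles, mergedCache, getMergedStyle and the property names are all hypothetical. In a real script the merged property bag would be used to create a character style in the document, once per combination.

```javascript
// Sketch only: hypothetical property bags standing in for character styles.
var baseStyles = {
  bold: { fontWeight: "bold" },
  italic: { fontSlant: "italic" }
};

// Merged styles created so far, keyed by the nesting path that produced them.
var mergedCache = {};

function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// tagNames lists the enclosing tags, outermost first.
// Properties are applied in that order, so inner tags override outer ones.
function getMergedStyle(tagNames) {
  var key = tagNames.join("+"); // order-sensitive on purpose
  if (hasOwn(mergedCache, key)) {
    return mergedCache[key]; // reuse a style that was already created
  }
  var merged = { name: key };
  for (var i = 0; i < tagNames.length; i++) {
    if (!hasOwn(baseStyles, tagNames[i])) {
      throw new Error("No style defined for tag: " + tagNames[i]);
    }
    var props = baseStyles[tagNames[i]];
    for (var p in props) {
      if (hasOwn(props, p)) {
        merged[p] = props[p];
      }
    }
  }
  mergedCache[key] = merged;
  return merged;
}
```

                            Because styles are created on demand and cached, only the combinations that actually appear in the text ever exist, which is what makes this easier to maintain than a static Cartesian product.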

                            • 11. Re: reading xml file
                              TD_TAD Level 1

                              Thanks a lot for your suggestions.

                              • 12. Re: reading xml file
                                John Hawkinson Level 5

                                I don't think you were being sarcastic, but maybe you should have been.

                                This seems like it should be a fairly simple problem, but we have not offered any simple solutions.

                                I feel like there should be some.

                                 

                                I feel like I'm just not thinking about the problem properly.

                                 

                                Anyone?