3 Replies Latest reply on Nov 30, 2011 10:49 AM by absqua

    Multiple Return to Single Return is Removing XML Tag

    aanavaras

      Dear All,

       

      I'm using InDesign CS3 + WinXP + VB6.

       

      I have an InDesign file that was created by XML import (our customers also need XML output at the end).

      After importing, I made a few modifications (moved figures and captions to frames with the PlaceXML method).

       

      At this stage an extra paragraph return remains where the figure was placed during import, before it was moved to a frame.

      When I tried to remove it with find and replace, the XML tags for the figures were also deleted from the document's XML structure.

       

      Is there any other way to do this? Please suggest.

       

      Thanks in Advance,

      SaRaVaNaN.N...

        • 1. Re: Multiple Return to Single Return is Removing XML Tag
          absqua Level 4

          It's hard to know what's going on without seeing your document and script, but here are a couple of ideas for troubleshooting:

           

          1. View the relevant story in the story editor, with invisible characters and tag markers showing, before and after you remove the extra return. For me, looking at tagged text elements in the story editor is the easiest way to get a handle on which characters are contained in which tags.

           

          2. If you're removing the extra return by searching for two returns and replacing them with one, you could be removing an empty tag between them. (I don't know what harm removing an empty tag would do, but...) Find ignores the 0xFEFF characters that hold the tags. Maybe instead try a grep find and replace, searching for a return preceded by a return and replacing that with nothing. In the UI the query is (?<=\r)\r; I don't know the VB equivalent.

           

          Hope this helps.

           

          Jeff

          • 2. Re: Multiple Return to Single Return is Removing XML Tag
            aanavaras Level 1

            Hi,

             

            Here is a simple example for my question.

             

            Ex: Import the following XML and, from the XML Structure pane (Ctrl+Alt+1), move <figure> to a new frame.

                  You'll see an extra paragraph return between the two paragraphs.

                      <root><para>...text 1...</para>

                      <figure><title>fig.1</title><img href="myFig1.jpg"/><caption>about the figure</caption></figure>

                      <para>...text 2...</para></root>

             

            Go to the Story Editor (Ctrl+Y). You'll see the following:

                      [x]para>...text 1...<para[x]

                      [x]figure]

                      [x]para>...text 2...<para[x]

             

            After replacing '^b^b' with '^b' it looks like this:

                      [x]para>...text 1...<para[x]

                      [x]para>...text 2...<para[x]

             

            The figure tag is removed from both the XML structure and the Story Editor. Is there any way to remove this extra paragraph return while keeping the XML structure?

             

            Thanks...

            • 3. Re: Multiple Return to Single Return is Removing XML Tag
              absqua Level 4

              Did you try my suggestion of using a grep find, looking only for a return preceded by another return? It seems to me that should work for you.

               

              The issue as I understand it is this: your second paragraph contains only the figure tag marker, which is held by a zero-width no-break space (Unicode 0xfeff). Find ignores the 0xfeff character, so when you search for ^b^b, you're actually matching a three-character string: the return at the end of the preceding paragraph, the 0xfeff tag character, and the return at the end of the figure paragraph. Replacing that with one return wipes out the tag character and thereby the element it holds. The solution is to match only the second return and replace it with nothing. You can do this with the grep query (?<=\r)\r, which matches a return preceded by a return. (Because the 0xfeff tag character is ignored, it won't interfere.)
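
To make the mechanics concrete, here is a small Python sketch (not InDesign scripting, and not VB). It only models the behaviour described above, under the assumption that both find modes skip the 0xfeff marker while matching. The names TAG, story, text_find and grep_find are made up for this illustration, and the para tags' own markers are left out to keep the string short.

```python
import re

TAG = "\uFEFF"  # stand-in for the character that holds the figure tag marker

# Story after <figure> was moved to its own frame: the middle paragraph
# contains nothing but the tag marker.
story = "...text 1...\r" + TAG + "\r...text 2..."

# Model of the text find '^b^b' -> '^b'. The marker is skipped during
# matching, so the match spans return + marker + return.
text_find = re.compile("\r" + TAG + "*\r")
after_text_find = text_find.sub("\r", story)

# Model of the grep query (?<=\r)\r -> nothing. Only the second return is
# matched. Python lookbehinds must be fixed-width, so the "ignored" marker
# is modelled with two alternative lookbehinds.
grep_find = re.compile("(?:(?<=\r)|(?<=\r" + TAG + "))\r")
after_grep = grep_find.sub("", story)

print(TAG in after_text_find)  # False: the marker went with the match
print(TAG in after_grep)       # True: only the extra return was removed
```

In this model the text find consumes the marker together with both returns, while the lookbehind version touches a single return and leaves the marker in place.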

               

              Note that if you had any whitespace around the figure tag character, this query would not work. I think in that case, because the 0xfeff character can't be matched, you would need two queries to get rid of your extra line without removing the tag.
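
One possible shape for such a two-query approach, again as a Python model rather than InDesign code: first strip spaces and tabs that touch a return, then run the lookbehind query. This is only an assumption about what the two queries could be and has not been tested in InDesign. The names trim_ws and second_return are hypothetical, and the first pass also trims ordinary leading and trailing spaces in every paragraph.

```python
import re

TAG = "\uFEFF"  # stand-in for the tag marker character (assumption of this model)

# The figure paragraph now has a space on each side of the marker.
story = "...text 1...\r " + TAG + " \r...text 2..."

# Query 1: remove spaces/tabs directly after or directly before a return.
trim_ws = re.compile("(?<=\r)[ \t]+|[ \t]+(?=\r)")
# Query 2: remove a return preceded by a return, treating the marker as invisible.
second_return = re.compile("(?:(?<=\r)|(?<=\r" + TAG + "))\r")

step1 = trim_ws.sub("", story)        # whitespace gone, marker and both returns kept
step2 = second_return.sub("", step1)  # extra return gone, marker kept

print(TAG in step2)       # True
print(step2.count("\r"))  # 1
```

Running the second query alone on this story changes nothing, because the second return is preceded by a space rather than by a return.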