11 Replies Latest reply on Mar 4, 2016 3:30 PM by frameexpert

    Coding of special characters in tagged text (Unicode)

    Ingvyn

      Has anyone run into problems placing special characters in tagged text? The Help describes, for example, this method of encoding: the nonbreaking space (fixed width) is <0x202F>. But when you use this method in text with Unicode encoding (with <UNICODE-WIN>), the nonbreaking space is not imported; the literal sequence <0x202F> is imported instead. Is there perhaps another way of encoding special characters so they can be placed in Unicode tagged text?

        • 1. Re: Coding of special characters in tagged text (Unicode)
          [Jongware] Most Valuable Participant

          The problem (one of the many problems with tagged text) is that if you say up front that the file is "Unicode", InDesign immediately stops recognizing the <0xXXXX> codes. If you change the tagging to something else, for example "<ASCII-WIN>", then they work again - they just don't work in combination with "UNICODE".

           

          The best way to insert these characters, then, is to insert them as the correct character right away. That may be why Adobe's engineers shrugged it off: "if you can write out proper Unicode, then there is no need to use these codes".[*]

          Second best is to replace the sequence in InDesign (with a script you can search for all occurrences of "<0x....>" and change them all at once).

           

          [*] With which they would have been wrong. It depends on where your text came from, but I often find it easier to insert a <0x...> code rather than hunt down how to insert exactly that character into my text, and having to somehow make sure it's that one that got saved, and thus imported into InDesign.

          And it would have been a good solution for the annoying Tagged Text bug (/"feature") that makes it impossible to add Force Line Breaks (code 0x000A) into a document created on a Mac (where the hard returns also get the code 0x000A).
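
Following the suggestion above to insert the correct character right away, the codes could also be expanded before import rather than replaced afterwards. This is a minimal sketch in Python, not a tested workflow. It assumes the tagged text has already been read into a string (the file's byte encoding is deliberately left out), that every code has exactly four hex digits, and that control characters plus `<`, `>` and `\` are better left as codes, because a raw line feed would be read as a paragraph end and the others look like tag or escape syntax. The names `expand_hex_codes` and `KEEP_AS_CODE` are made up for this illustration.

```python
import re

# Matches tagged-text character codes such as <0x202F> (four hex digits assumed).
CODE_PATTERN = re.compile(r"<0x([0-9A-Fa-f]{4})>")

# Assumption: these would be misread as markup if written out literally.
KEEP_AS_CODE = {0x3C, 0x3E, 0x5C}  # '<', '>', '\'


def expand_hex_codes(tagged_text: str) -> str:
    """Replace <0xXXXX> codes with the characters they name."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        # Leave control characters (such as 0x000A) and markup characters alone.
        if code_point < 0x20 or code_point in KEEP_AS_CODE:
            return match.group(0)
        return chr(code_point)

    return CODE_PATTERN.sub(replace, tagged_text)


sample = "<UNICODE-WIN>\n<ParaStyle:IndexLevel1>10<0x202F>km<0x000A>next line"
print(repr(expand_hex_codes(sample)))
```

The `<0x000A>` code is passed through unchanged here, so the Force Line Break problem described above is not solved by this sketch; it only covers ordinary printable characters.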

          • 2. Re: Coding of special characters in tagged text (Unicode)
            frameexpert Level 4

            I am actually a big fan of InDesign Tagged Text, especially for some XML to InDesign workflows. I typically use XSLT to produce Tagged Text from the XML and then import the Tagged Text into my InDesign layout. It tends to be faster than native XML import and I like the fact that you can "reverse engineer" the required Tagged Text by exporting a sample from your InDesign content. I recently did a webinar on using this process to generate complex InDesign tables of contents (InDesign Table of Contents Webinar – Thursday, January 28, 2016, 1:00 pm EST – FrameAutomation.com). I have heard that Tagged Text is buggy and has problems but so far I haven't found any showstoppers.

            • 3. Re: Coding of special characters in tagged text (Unicode)
              [Jongware] Most Valuable Participant

              Rick, I indeed have to do the same thing. No head-on showstoppers so far, but only because I fix the remaining issues through a post-import script.

               

              From memory: inserting a soft line break is a problem, tables are rather difficult (as InDesign's model differs from that of the input, which is based on HTML), footnote settings confuse the formatter, and, absolutely the worst thing ever, it is impossible to create and insert hyperlinks. Their definitions should appear at the very end of the tagged text, and the position where each one is placed is the "offset" in plain characters from the start of the text. I flat-out refused to even try to code that in my XSLT.

               

              (The import module itself is rather buggy too; it must have been written by a trainee. Try scrolling a list of errors when there is more than a (dialog) screenful. What is happening behind the scenes!? Is the list updated by importing the file again and again, or what!!?)

              • 4. Re: Coding of special characters in tagged text (Unicode)
                Vamitul Level 4

                Rick, Jongware, why not use icml instead of tagged text? With icml tables almost make sense, hyperlinks are a breeze, footnote formatting works (but the separator characters are painful), linebreaks behave as they should, special characters also work etc.

                It is quite a bit more work to get it set up initially, but it pays off in the end. Or at least it did for me.

                • 5. Re: Coding of special characters in tagged text (Unicode)
                  frameexpert Level 4

                  I was able to get hyperlinks to work inline; here is a sample entry:

                   

                  <ParaStyle:IndexSectionHead>A

                  <ParaStyle:IndexLevel1>A. G. Upham <CharStyle:Hyperlink><Hyperlink:=<HyperlinkName:32><HyperlinkDest:a-g-upham_6342><HyperlinkDestFile:C\:\\Users\\Carmen\\Dropbox\\RBHT \(shared Jonathan\)\\RBHT Indesign formatted\\RBHT\_03\_1\_Thessalonians\_chap\_2.indd><CharStyleRef:Hyperlink><HyperlinkLength:2><HyperlinkStartOffset:0><Hidden:0><BrdrVisible:0><BrdrWidth:Thin><BrdrHilight:None><BrdrStyle:Solid>>32<CharStyle:>

                   

                  InDesign kept crashing when I imported the file, but I figured out that I had to have all of the target files open in order to prevent the crash.

                   

                  Vamitul, for many jobs I would agree with you, but for simple jobs I prefer the ease of Tagged Text.

                  • 6. Re: Coding of special characters in tagged text (Unicode)
                    Ingvyn Level 1

                    Thank you. The question was resolved the same day, along the lines the Adobe engineers suggested: I typed Alt+0160 in place of nbsp and Alt+0151 in place of mdash, and the text was imported as it should be. Apparently, this issue can also be solved with software tools.
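
For reference, Alt codes typed with a leading zero pick a byte from the active Windows ANSI code page. The small Python sketch below assumes that code page is Windows-1252 (typical for Western systems) and shows which Unicode characters Alt+0160 and Alt+0151 produce; `alt_code_to_char` is a hypothetical helper name. Note that Alt+0160 gives the ordinary no-break space U+00A0, not the U+202F character from the original question.

```python
def alt_code_to_char(code: int, codepage: str = "cp1252") -> str:
    """Decode a single Alt+0nnn byte value using the given ANSI code page."""
    return bytes([code]).decode(codepage)


for code in (160, 151):
    ch = alt_code_to_char(code)
    print(f"Alt+0{code} -> U+{ord(ch):04X}")
# Alt+0160 -> U+00A0
# Alt+0151 -> U+2014
```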

                    • 7. Re: Coding of special characters in tagged text (Unicode)
                      Ingvyn Level 1

                      The part about icml is very interesting. I looked at the contents of an exported icml file. The text itself is quite simple. I am going to work out how hyperlinks are organized. But examining the entire model looks difficult: there are the color models, the descriptions of the frames, and so on.

                      • 8. Re: Coding of special characters in tagged text (Unicode)
                        frameexpert Level 4

                        How do you get an ICML file? Do these come from InCopy? I can get an IDML file from InDesign. Can an ICML file be imported into InDesign or just InCopy? If ICML is simpler than IDML I may take a look at it. Thanks.

                        • 9. Re: Coding of special characters in tagged text (Unicode)
                          Vamitul Level 4

                          Rick, icml is a subset of the idml, mostly used to describe just a story. You can import/export one from indesign just like you would a tagged text. And one of the best things about it is that a lot of the stuff in an icml is optional (indesign will export a huge file with a lot of tags and settings, but almost all of that is unnecessary).

                          Here is an example of a relatively simple icml that i normally use, but it can be made a lot simpler yet:

                          <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
                          <?aid style="50" type="snippet" readerVersion="6.0" featureSet="257" product="8.0(370)" ?>
                          <?aid SnippetType="InCopyInterchange"?>
                          <Document DOMVersion="8.0" Self="d">
                            <RootCharacterStyleGroup>
                              <CharacterStyle Name="$ID/[No character style]" Self="CharStyleNone"/>
                            </RootCharacterStyleGroup>
                            <RootParagraphStyleGroup>
                              <ParagraphStyle Name="$ID/[No paragraph style]" Self="ParaStyleNone"/>
                              <ParagraphStyle Name="HEADINGS:Heading_1" Self="ParagraphStyle/HEADINGS%3aHeading_1"/>
                              <ParagraphStyle Name="BODY:Body_text" Self="ParagraphStyle/BODY%3aBody_text"/>
                            </RootParagraphStyleGroup>
                            <Story Self="main">
                              <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/HEADINGS%3aHeading_1">
                                <CharacterStyleRange AppliedCharacterStyle="CharStyleNone">
                                  <Content>This is a chapter head.</Content>
                                </CharacterStyleRange>
                                <Br/>
                              </ParagraphStyleRange>
                              <ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/BODY%3aBody_text">
                                <CharacterStyleRange AppliedCharacterStyle="CharStyleNone">
                                  <Content>Blah blah</Content>
                                </CharacterStyleRange>
                                <CharacterStyleRange AppliedCharacterStyle="CharStyleNone">
                                  <Content/>
                                </CharacterStyleRange>
                                <Br/>
                              </ParagraphStyleRange>
                            </Story>
                          </Document>
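
A file shaped like the sample above could also be generated programmatically. The following Python sketch only mirrors the structure of that sample: the header lines, the `Self` naming with `:` written as `%3a`, and the element order are copied from it, not taken from the spec, and the output has not been tested in InDesign. The function names `style_id` and `build_icml` are made up for this illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# Header copied from the sample; version numbers may need adjusting.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<?aid style="50" type="snippet" readerVersion="6.0" featureSet="257" product="8.0(370)" ?>\n'
    '<?aid SnippetType="InCopyInterchange"?>\n'
)


def style_id(name: str) -> str:
    """Build a Self value the way the sample does (colon becomes %3a)."""
    return "ParagraphStyle/" + name.replace(":", "%3a")


def build_icml(paragraphs):
    """paragraphs: list of (paragraph style name, text) tuples."""
    style_names = []
    for name, _ in paragraphs:
        if name not in style_names:
            style_names.append(name)

    lines = [HEADER + '<Document DOMVersion="8.0" Self="d">']
    lines.append("  <RootCharacterStyleGroup>")
    lines.append('    <CharacterStyle Name="$ID/[No character style]" Self="CharStyleNone"/>')
    lines.append("  </RootCharacterStyleGroup>")
    lines.append("  <RootParagraphStyleGroup>")
    lines.append('    <ParagraphStyle Name="$ID/[No paragraph style]" Self="ParaStyleNone"/>')
    for name in style_names:
        lines.append(f"    <ParagraphStyle Name={quoteattr(name)} Self={quoteattr(style_id(name))}/>")
    lines.append("  </RootParagraphStyleGroup>")
    lines.append('  <Story Self="main">')
    for name, text in paragraphs:
        lines.append(f"    <ParagraphStyleRange AppliedParagraphStyle={quoteattr(style_id(name))}>")
        lines.append('      <CharacterStyleRange AppliedCharacterStyle="CharStyleNone">')
        # escape() handles &, < and > so the text stays well-formed XML.
        lines.append(f"        <Content>{escape(text)}</Content>")
        lines.append("      </CharacterStyleRange>")
        lines.append("      <Br/>")
        lines.append("    </ParagraphStyleRange>")
    lines.append("  </Story>")
    lines.append("</Document>")
    return "\n".join(lines) + "\n"


icml = build_icml([
    ("HEADINGS:Heading_1", "This is a chapter head."),
    ("BODY:Body_text", "Blah blah"),
])
print(icml)
```

Because the text goes in as ordinary UTF-8 characters, special characters such as a nonbreaking space can simply be part of the string passed to `build_icml`.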
                          
                          • 10. Re: Coding of special characters in tagged text (Unicode)
                            frameexpert Level 4

                            Thanks for the details and sample code. Using a single file should be much easier than building an IDML package. I will give it a try and look up the spec. Thanks again. -Rick

                            • 11. Re: Coding of special characters in tagged text (Unicode)
                              frameexpert Level 4

                              I just saved your file to disk and imported it into InDesign CC. It imported fine. It should be more robust than tagged text and if it can be reverse-engineered by exporting from InDesign or InCopy then it has that advantage too. Thanks for the suggestion and sample code.