3 Replies Latest reply on Apr 18, 2016 2:59 AM by Mike-Hardy

    \b when getting text from text items. How to get just the plain text?

    Ch. Efstathios Level 1

      I am using the FDK with C++ and retrieving text items with FTI_String from the active document. However, the text in those text items contains some characters that do not appear in the actual text, for example \b. What is this character, and how can I get only the plain text without any FrameMaker metadata characters?

        • 1. Re: \b when getting text from text items. How to get just the plain text?
          Russ Ward Level 4

          Ch,

          I think you might be seeing tabs. Written as an escape sequence, \b is the ASCII backspace character (0x08). For some reason, this is how tabs are represented when you get text from a document with the API. You would expect them to show up as \t (ASCII 0x09), but they do not.

          A tab is a user-entered character, so there is no way to avoid retrieving it when you call F_ApiGetText(). You simply have to do a search and replace once you have the string in your code.
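
A minimal sketch of that search and replace, assuming you have already copied the string out of the FTI_String text item into a std::string. The helper name ReplaceFmTabs and the constant kFmTab are made up for illustration and are not part of the FDK:

```cpp
#include <algorithm>
#include <string>

// FrameMaker's tab code as it shows up in retrieved text (0x08, i.e. '\b').
constexpr char kFmTab = '\x08';

// Hypothetical helper (not an FDK function): returns a copy of `text`
// with every 0x08 replaced by `replacement` (a real tab by default).
std::string ReplaceFmTabs(std::string text, char replacement = '\t') {
    std::replace(text.begin(), text.end(), kFmTab, replacement);
    return text;
}
```

Passing ' ' as the second argument turns the tabs into spaces instead. One thing to watch when writing test strings by hand: in C++ a hex escape swallows every hex digit that follows it, so "\x08b" is not a tab followed by "b"; split the literal ("\x08" "b") to get that.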

          I hope I understand what you are asking here.

          Russ

          • 2. Re: \b when getting text from text items. How to get just the plain text?
            Ch. Efstathios Level 1

            Whenever the user presses the tab key, a symbol appears in the document, and when I retrieve the text I see \b. I will probably replace it with a space or handle it in my application when showing the text in the view. Thank you so much.

            • 3. Re: \b when getting text from text items. How to get just the plain text?
              Mike-Hardy Level 3

              > For some reason, this is how tabs are represented when you get text from a document with the API.

              The reason is that FrameMaker's original Standard character set encoded a tab as \x08. A strange decision that has caught me out before.

              Here is that encoding's idiosyncratic use of the control-character space:

              Hex code    Standard character set
              \x04        discretionary hyphen
              \x05        suppress hyphenation
              \x08        tab
              \x09        forced return
              \x0a        end of paragraph
              \x10        numeric space
              \x11        nonbreaking space
              \x12        thin space
              \x13        en space
              \x14        em space
              \x15        nonbreaking hyphen

              Some of the above might have changed with the introduction of Unicode support. For example, the various spaces might now be encoded with their correct Unicode value. But the tab weirdness appears to persist...
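
If the goal is plain text, the codes listed above could be folded into ordinary ASCII in one pass. This is only a sketch: the function name FmControlsToPlainText is hypothetical, and the choice of replacements (all special spaces become ' ', both line-ending codes become '\n', the hyphenation hints are dropped) is my assumption about what "plain" should mean, not anything the FDK prescribes. Every other byte is passed through unchanged, so text that already carries Unicode values for the spaces is left alone.

```cpp
#include <string>

// Hypothetical helper (not an FDK function): maps the control codes of
// FrameMaker's Standard character set, as listed in this thread, to ASCII.
std::string FmControlsToPlainText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (unsigned char c : text) {
        switch (c) {
            case 0x04:              // discretionary hyphen: drop
            case 0x05:              // suppress hyphenation: drop
                break;
            case 0x08:              // tab
                out += '\t';
                break;
            case 0x09:              // forced return
            case 0x0a:              // end of paragraph
                out += '\n';
                break;
            case 0x10:              // numeric space
            case 0x11:              // nonbreaking space
            case 0x12:              // thin space
            case 0x13:              // en space
            case 0x14:              // em space
                out += ' ';
                break;
            case 0x15:              // nonbreaking hyphen
                out += '-';
                break;
            default:                // everything else is kept as-is
                out += static_cast<char>(c);
                break;
        }
    }
    return out;
}
```

If you need to keep the distinction between the different spaces, the same switch could emit the corresponding Unicode characters instead of a plain ' '.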