I am using the FDK (C++) and get text items with FTI_String from the active document. However, the text from those text items contains some characters that do not appear in the actual text, for example \b. What is this character, and how can I get only the plain text without any FrameMaker metadata characters?
Ch,
I think you might be seeing tabs. As an escape sequence, \b denotes the ASCII backspace character (0x08). For some reason, this is how tabs are represented when you get text from a document with the API. You would expect them to show as \t or ASCII 0x09, but they do not.
There is no way to keep F_ApiGetText() from returning characters the user entered, such as tabs. You simply have to search and replace them once you have the string in your code.
I hope I understand what you are asking here.
Russ
Whenever the user presses the Tab key, a symbol appears in the document, and when I retrieve the text I see \b. I will probably replace it with a space or handle it in my application when showing the text in the view. Thank you so much.
> For some reason, this is how tabs are represented when you get text from a document with the API.
The reason is that FrameMaker's original Standard character set encoded a tab as \x08. A strange decision that has caught me out before.
Here is that encoding's idiosyncratic use of the control-character space:
| Hex code | Meaning in the Standard character set |
|---|---|
| \x04 | discretionary hyphen |
| \x05 | suppress hyphenation |
| \x08 | tab |
| \x09 | forced return |
| \x0a | end of paragraph |
| \x10 | numeric space |
| \x11 | nonbreaking space |
| \x12 | thin space |
| \x13 | en space |
| \x14 | em space |
| \x15 | nonbreaking hyphen |
Some of the above might have changed with the introduction of Unicode support. For example, the various spaces might now be encoded with their proper Unicode values. But the tab weirdness appears to persist...