
\b when getting text from text items. How to get just the plain text?

Participant, Apr 15, 2016

I am using the FDK with C++ and getting text items with FTI_String from the active document. But the text from the text items contains some characters that do not appear in the actual text, for example \b. What is this character, and how can I get only the plain text without any FrameMaker metadata characters?

TOPICS
Scripting


1 Correct answer

Mentor, Apr 15, 2016

Ch,

I think you might be seeing tabs. Written as an escape sequence, \b is the ASCII backspace character (0x08). For some reason, this is how tabs are represented when you get text from a document with the API. You would expect them to show up as \t (ASCII 0x09), but they do not.

There is no way to avoid retrieving a user-entered character when you call F_ApiGetText(). You simply have to do a search and replace once you have the string in your code.
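As a rough illustration of that search and replace, here is a minimal sketch in standard C++. It assumes the text from the FTI_String items has already been copied into a std::string; the helper name replaceFrameTabs is hypothetical and not part of the FDK.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not an FDK function): returns a copy of `text` in
// which every 0x08 byte, which FrameMaker uses for a tab, is replaced by
// `replacement`. Defaults to a real tab; pass ' ' to get a space instead.
std::string replaceFrameTabs(std::string text, char replacement = '\t') {
    std::replace(text.begin(), text.end(), '\x08', replacement);
    return text;
}
```

Calling replaceFrameTabs(text, ' ') would turn each tab into a single space, which may be enough if the text is only being displayed.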

I hope I understand what you are asking here.

Russ

Participant, Apr 15, 2016

Whenever the user presses the Tab key, a symbol appears in the document, and when I retrieve the text I see \b. I will probably replace it with a space or handle it in my application when showing the text in the view. Thank you so much.

Contributor, Apr 18, 2016

> For some reason, this is how tabs are represented when you get text from a document with the API.

The reason is that FrameMaker's original Standard character set encoded a tab as \x08. A strange decision that has caught me out before.

Here is that encoding's idiosyncratic use of the control-character space:

Hex code   Standard character set
\x04       discretionary hyphen
\x05       suppress hyphenation
\x08       tab
\x09       forced return
\x0a       end of paragraph
\x10       numeric space
\x11       nonbreaking space
\x12       thin space
\x13       en space
\x14       em space
\x15       nonbreaking hyphen

Some of the above might have changed with the introduction of Unicode support. For example, the various spaces might now be encoded with their correct Unicode value. But the tab weirdness appears to persist...
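If more than the tab needs handling, the codes listed above could be mapped in a single pass. The following is only a sketch: the helper name frameCodesToPlainText is hypothetical, and the replacement chosen for each code (dropping hyphenation hints, turning the special spaces into a plain space, and so on) is an assumption about what "plain text" should mean, not something the FDK prescribes. It also assumes the listed codes still apply to the text you retrieve, which may not hold for every code since Unicode support was introduced. Working byte by byte is safe for UTF-8 input, because bytes in this range never occur inside a multi-byte sequence.

```cpp
#include <string>

// Hypothetical helper (not an FDK function): converts the Standard
// character set control codes from the list above into plain-text
// equivalents. The replacement choices are assumptions.
std::string frameCodesToPlainText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '\x04':                       // discretionary hyphen: drop
            case '\x05':                       // suppress hyphenation: drop
                break;
            case '\x08': out += '\t'; break;   // tab
            case '\x09':                       // forced return
            case '\x0a': out += '\n'; break;   // end of paragraph
            case '\x10':                       // numeric space
            case '\x11':                       // nonbreaking space
            case '\x12':                       // thin space
            case '\x13':                       // en space
            case '\x14': out += ' '; break;    // em space
            case '\x15': out += '-'; break;    // nonbreaking hyphen
            default:     out += c;             // everything else unchanged
        }
    }
    return out;
}
```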
