PDTextSelect Object not Getting Special Characters like (≤ Ω Β ∞ ≠ ≥).

Community Beginner, Jul 19, 2018

When I extract the selected text, some special characters such as (≤ Ω Β ∞ ≠ ≥) are displayed as junk characters.

Please suggest how I can get all characters.

Thanks.

TOPICS
Acrobat SDK and JavaScript

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

Adobe Employee, Jul 20, 2018

Can you post the actual code fragment you are using?

Community Beginner, Jul 20, 2018

TextSelect = AVPageViewTrackText(pageView, xHit, yHit, NULL);
PDDoc pdDoc = AVDocGetPDDoc(AVAppGetActiveDoc());
PDPage pdPage = AVPageViewGetPage(pageView);
int iPage = PDPageGetNumber(pdPage);
BKMCRot = PDPageGetRotate(pdPage);

if (TextSelect != NULL)
{
    PDTextSelectEnumText(TextSelect, ASCallbackCreateProto(PDTextSelectEnumTextProc, BKMCTextEnumProc), NULL);
    PDTextSelectEnumQuads(TextSelect, ASCallbackCreateProto(PDTextSelectEnumQuadProc, BKMCTextEnumQuadProc), NULL);
    AVPageViewHighlightText(pageView, TextSelect);
    ASBool bselection = AVDocSetSelection(AVAppGetActiveDoc(), ASAtomFromString("BMCreatorText"), TextSelect, true);
}

ACCB1 ASBool ACCB2 BKMCTextEnumProc(void* procObj, PDFont pdFont, ASFixed size, PDColorValue Color, char *buff, ASInt32 asLen)
{
    // Get the font size
    int iFontSize = FixedRoundToInt16(size);
    csFontSize.Format(L"%d", iFontSize);

    // Get the text color
    long Textcolor = CPDFLink::GetRGBFromPDColor(*Color);
    long lRValue, lGValue, lBValue;
    CString csRVal, csGVal, csBVal;
    CColor::COLORREFToRGB(Textcolor, lRValue, lGValue, lBValue);
    csRVal.Format(L"%d", lRValue);
    csGVal.Format(L"%d", lGValue);
    csBVal.Format(L"%d", lBValue);
    csColorValue = (L"R=") + csRVal + (" G=") + csGVal + (" B=") + csBVal;

    // Get the font name
    char fontNameBuf[PSNAMESIZE];
    PDFontGetName(pdFont, fontNameBuf, PSNAMESIZE);
    csFontname = (CString)fontNameBuf;

    // For multiple words we need to append each time.
    CString csChar;
    for (int iIndex = 0; iIndex < asLen; iIndex++)
    {
        char cBuff = buff[iIndex];
        if (cBuff != 13 && cBuff != 10)
        {
            csChar = cBuff;
            csBKMCKeyword += csChar;
        }
    }
    buff = "";
    return true;
}

Adobe Employee, Jul 20, 2018

Two things…

1 – Use PDTextSelectEnumTextUCS, as that returns UCS (aka Unicode) encoded text, so you can be sure to get all text in a standardized form.

2 – Be careful with CString, as (IIRC) it’s not great for arbitrary encodings.
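
A minimal sketch of what the first point implies for a callback that receives a `char *` buffer plus a byte length: the bytes have to be paired into 16-bit code units instead of being appended one `char` at a time. This assumes (it is not stated in this thread, so check the SDK documentation) that the UCS buffer holds 16-bit code units in big-endian byte order. `decodeUtf16BE` is a hypothetical helper name, not an SDK function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper. ASSUMPTION: `bytes` holds UTF-16 code units in
// big-endian byte order (high byte first). A trailing odd byte is ignored.
std::u16string decodeUtf16BE(const char* bytes, std::size_t byteLen) {
    std::u16string out;
    out.reserve(byteLen / 2);
    for (std::size_t i = 0; i + 1 < byteLen; i += 2) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned hi = static_cast<unsigned char>(bytes[i]);
        const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

With this, the two bytes 0x22 0x64 become the single code unit U+2264 (≤) rather than two unrelated one-byte characters.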

Community Beginner, Jul 23, 2018

Thanks for the reply.

How do I extract special characters using a PDWordFinder object?

When I extract "≤ Ω Β ∞ ≠ ≥", it is displayed as " = O . 8 . = ".

I am using the following code:

ACCB1 ASBool ACCB2 SearchTextBasedonSelectedFont(PDWordFinder wObj, PDWord wInfo, ASInt32 pgNum, void* clientData)
{
    CString csText;// = "";
    CString csColor, csBlueText;
    COLORREF wordColor = NULL;
    char buf[256];
    bool NonAlphaNum = false, LeadingPunc = false, LeadingSpace = false;
    ASInt32 liStyleIndex;
    bool bcolorValue = false;
    static int FirstOccurence = 0;
    PDStyle pdWordStyle;
    PDColorValueRec color;
    color.space = PDDeviceRGB;
    color.value[0] = color.value[1] = color.value[2] = color.value[3] = fixedZero;
    liStyleIndex = 0;
    ASFixedQuad quad;
    ASInt16 llAttr, liNumQuads;
    long llCurSequence;
    llCurSequence = ++(*(long*)clientData);

    try
    {
        // Get the word color
        if ((pdWordStyle = PDWordGetNthCharStyle(wObj, wInfo, liStyleIndex)) != NULL)
        {
            PDStyleGetColor(pdWordStyle, &color);
        }

        // Get the word into the buffer
        PDWordGetString(wInfo, buf, 256);
        csText = buf;
        PDStyleGetFont(pdWordStyle);
        PDStyle aoPDWordStyle = PDWordGetNthCharStyle(wObj, wInfo, 0);
        liNumQuads = PDWordGetNumQuads(wInfo);
    }
}

Thanks..

LEGEND, Jul 24, 2018

If you use the UCS word finder you will get a Unicode word string. This must be treated as an array of WCHAR. You NEED to understand Unicode encoding.
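
As a rough illustration of treating the string as an array of 16-bit units, the sketch below walks a UTF-16 string and recovers the Unicode code points, combining surrogate pairs where they occur. It uses the standard `std::u16string` as a stand-in for a WCHAR array; `utf16ToCodePoints` is a hypothetical helper, not part of the Acrobat SDK.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: convert UTF-16 code units to Unicode code points.
// An unpaired surrogate is passed through unchanged in this sketch.
std::vector<std::uint32_t> utf16ToCodePoints(const std::u16string& s) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t u = s[i];
        // High surrogate followed by a low surrogate: one code point >= U+10000.
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size()) {
            const std::uint32_t lo = s[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        out.push_back(u);
    }
    return out;
}
```

All of the characters in question (≤ Ω Β ∞ ≠ ≥) are single code units above 0xFF, so they survive only as long as the code never narrows a unit to an 8-bit `char`.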

Community Beginner, Jul 24, 2018

Thank you for the reply.

I will look into Unicode encoding concepts.

Community Beginner, Jul 25, 2018

Hi Test Screen,

I am using PDDocCreateWordFinderUCS to create the word finder,

then I get the ASText from each PDWord,

then I encode the ASText to get all characters.

Is this the correct approach?

LEGEND, Jul 25, 2018

Does it give the expected answer?

If not, what encoding do you choose when you convert the ASText?

Community Beginner, Jul 25, 2018

It shows some other format.

I used the following code to encode the ASText:

ASText nextWordASText = ASTextNew();
PDWordGetASText(nextWord, 0, nextWordASText);
CString TextObjectTxt = ASTextGetEncoded(nextWordASText, ASTextGetBestEncoding(nextWordASText, (ASHostEncoding)PDGetHostEncoding()));

Thanks..

Adobe Employee, Jul 25, 2018

Your problem is that you're trying to convert from what is most likely Unicode (UTF-8 or UTF-16) to the host encoding (probably ISO 8859-1), which may not include those characters. And even if it does, you may then be trying to display them in a font that doesn’t include those glyphs.

Instead of using HostEncoding, use UTF8Encoding, as that will give you back an ASCII-compatible string that shows you where the extra characters are (they appear as multi-byte UTF-8 sequences).
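
To make the UTF-8 point concrete, here is a small self-contained sketch that encodes a code point as UTF-8 and then prints every non-ASCII byte as a `\xNN` escape, so the "extra" characters become visible in a plain ASCII dump. Both function names are hypothetical helpers for illustration, and the encoder does no validation (for example, it does not reject surrogate values).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: keep ASCII bytes, show all other bytes as \xNN.
std::string escapeNonAscii(const std::string& utf8) {
    std::string out;
    char hex[5];  // backslash, 'x', two hex digits, terminator
    for (unsigned char c : utf8) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            std::snprintf(hex, sizeof hex, "\\x%02X", static_cast<unsigned>(c));
            out += hex;
        }
    }
    return out;
}
```

For example, `escapeNonAscii("a" + encodeUtf8(0x2264) + "b")` yields `a\xE2\x89\xA4b`: the ≤ sign occupies three bytes, none of which is a printable ASCII character on its own.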

Community Beginner, Jul 25, 2018

Hi Irosenth,

I have used UTF-8 encoding, but it still does not work.

This is the code I used:

PDWord nextWord = dwAllWords[Wordindex];
CString TextObjectTxt;
ASText nextWordASText = ASTextNew();
PDWordGetASText(nextWord, 0, nextWordASText);
ASTextFromUnicode((ASUTF16Val*)nextWordASText, kUTF8);
TextObjectTxt = ASTextGetEncoded(nextWordASText, ASTextGetBestEncoding(nextWordASText, kUTF8));

Adobe Employee, Jul 26, 2018

DO NOT use ASTextFromUnicode or ASTextGetEncoded. Use ASTextGetUnicodeCopy( nextWordASText, kUTF8)
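
Once UTF-8 bytes are in hand, they still have to be turned into a wide string before a Unicode-aware UI control can show them. On Windows that step is typically done with `MultiByteToWideChar` and `CP_UTF8`. The portable sketch below shows the same conversion by hand so the logic is visible. `utf8ToUtf16` is a hypothetical helper, and it is deliberately lenient: it replaces malformed input with U+FFFD but does not reject overlong forms.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0xFFFD;  // default for an invalid lead byte
        std::size_t len = 1;
        if (b0 < 0x80) { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        if (i + len > in.size()) {  // sequence cut off at end of input
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) {  // bad continuation byte: emit U+FFFD, resync at next byte
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        i += len;
        if (cp >= 0x10000) {  // needs a surrogate pair in UTF-16
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

The design point is that the UTF-8 bytes are never assigned to a narrow, host-encoded string along the way; they go straight from bytes to 16-bit code units.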

Community Beginner, Aug 27, 2018

I still do not get all characters.

I used the following code:

PDWord nextWord = dwAllWords[Wordindex];
CString TextObjectTxt;
ASText nextWordASText = ASTextNew();
PDWordGetASText(nextWord, 0, nextWordASText);
ASTextGetUnicodeCopy( nextWordASText, kUTF8)
TextObjectTxt = ASTextGetEncoded(nextWordASText, ASTextGetBestEncoding(nextWordASText, kUTF8));

LEGEND, Jul 25, 2018

So, what is your host encoding? Does your host encoding include the characters ≤ Ω Β ∞ ≠ ≥? Tip: mine does not. I would have to work in Unicode. I think you are assuming something impossible will work.

What is your aim for these characters: please list all the ways you need them to work? (For example: only in one message that is popped up using the Windows MessageBox function) Please be detailed.
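
The host-encoding question can be checked mechanically. The sketch below assumes, purely for illustration, that the host encoding is ISO 8859-1 (Latin-1), where a character is representable exactly when its code point is at most 0xFF. The real host encoding on a given machine may be a different code page, and `fitsInLatin1` is a hypothetical helper.

```cpp
#include <string>

// Hypothetical helper. ASSUMPTION: the target encoding is ISO 8859-1,
// which covers exactly the code points U+0000 through U+00FF.
bool fitsInLatin1(const std::u16string& s) {
    for (char16_t unit : s) {
        if (unit > 0xFF) {
            return false;  // no Latin-1 byte exists for this character
        }
    }
    return true;
}
```

Under that assumption, none of ≤ (U+2264), Ω (U+03A9), Β (U+0392), ∞ (U+221E), ≠ (U+2260) or ≥ (U+2265) fits, so any conversion to such an encoding has to drop or substitute them.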

LEGEND, Jul 26, 2018

Please answer my question about intended use.

Community Beginner, Jul 26, 2018

I want to extract words from the PDF page using the word finder and display them in the tree view controller.

During extraction, all words come through correctly except for the special characters inside a word.

In place of the special characters, junk characters such as (., 8, O) are displayed.

I want to know how to display words with special characters.

LEGEND, Jul 26, 2018

OK, which API are you using for the tree view controller? Some of them are not Unicode aware, so this is impossible for them. Others may accept a Unicode string, but you must change your code.

Community Beginner, Jul 26, 2018

I extracted all the words through PDWordFinder and added them to a list,

then inserted the words from the list directly into the tree view controller.

I have not used any API to insert words into the tree view controller.

Community Beginner, Jul 26, 2018

First I get the ASText object from the PDWord.

Then, when I get the text from the ASText object, the junk characters already appear at that point.

I think I need to do something when extracting the string from the ASText object.

LEGEND, Jul 26, 2018

I do not understand what you mean by no API. Please help us understand in detail what you are doing. I know the tree view controller as something used through the .NET API and also via the MS Access API.

Community Beginner, Jul 26, 2018

The following code is used to create the tree view controller:

cBMCTreeCtrl m_BKMCTreeCtrl = new cBMCTreeCtrl();
UINT nTreeStyles = WS_CHILD | WS_VISIBLE | WS_TABSTOP | WS_BORDER | TVS_LINESATROOT| TVS_HASLINES| TVS_HASBUTTONS| TVS_EDITLABELS | TVS_EX_MULTISELECT;
m_BKMCTreeCtrl->Create(nTreeStyles, crTreeView, this, IDC_BKMCreatorTreeView);
m_BKMCTreeCtrl->ShowWindow(SW_SHOW);

But the junk characters already appear when I get the text from the ASText object.

LEGEND, Jul 26, 2018

So, that is an API you are using!! CBMCTreeCtrl appears to be a custom C++ API. You must refer to the documentation or code of that API to see if you can pass a Unicode string. You will NOT be able to use the same interface you used to set one byte text. I think you still have not studied Unicode or the concept of encodings. This is vital to you.

However, I do have two suggestions on your next step.

1. Try adding the constant string "≤ Ω Β ∞ ≠ ≥" directly to your tree control, without using Acrobat. You may find this complex or impossible to solve, but once it is solved you will be ready to work with strings from Acrobat.

2. Also, copy and paste text from this same PDF into Word. Examine the Word document and make sure the characters ≤ Ω Β ∞ ≠ ≥ appear as you wish. If they do not appear in Word, you certainly cannot extract them from the PDF.

Community Beginner, Jul 26, 2018

In both cases, the expected result was achieved.

I need to work on the Acrobat string.
