
Verity: Russian stemming

Explorer, Mar 26, 2007

Hi
I'm developing a site in Russian. One of the requirements is for the search to use the stemming functionality Verity CLAIMS to have.

Now, I don't speak a word of Russian, but the client has provided me with some test words which I have pumped through a test rig, with very poor results. The test words are:

Case      Singular    Plural
Nom.      беженец     беженцы
Gen.      беженца     беженцев
Dat.      беженцу     беженцам
Acc.      беженца     беженцев
Instr.    беженцем    беженцами
Narr.     беженце     беженцах

(which is basically declension of the word "refugee" in Russian, I'm told).

My test rig (I'll attach the code) creates a record for each case / number combination and indexes it. Then I search for each of the variations. The search only returns EXACT matches: a search term finds records for other cases only where the endings happen to be identical, as with the accusative and genitive here. It does not seem to apply any notion of stemming at all.
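For readers without the attachment, the behaviour described above can be modelled roughly as follows. This is only an illustrative Python sketch of an exact-match index, not the actual test rig and not Verity; the names `FORMS`, `build_exact_index` and `search` are made up for the example.

```python
# Illustrative sketch only: models an exact-match index, not Verity itself.
# One record per case/number combination, as in the test described above.
FORMS = [
    ("Nom.", "sg", "беженец"), ("Nom.", "pl", "беженцы"),
    ("Gen.", "sg", "беженца"), ("Gen.", "pl", "беженцев"),
    ("Dat.", "sg", "беженцу"), ("Dat.", "pl", "беженцам"),
    ("Acc.", "sg", "беженца"), ("Acc.", "pl", "беженцев"),
    ("Instr.", "sg", "беженцем"), ("Instr.", "pl", "беженцами"),
    ("Narr.", "sg", "беженце"), ("Narr.", "pl", "беженцах"),
]


def build_exact_index(forms):
    """Map each literal word form to the labels of the records containing it."""
    index = {}
    for case, number, word in forms:
        index.setdefault(word, []).append(f"{case} {number}")
    return index


def search(index, term):
    """Return the records whose indexed word equals the term exactly."""
    return index.get(term, [])


if __name__ == "__main__":
    index = build_exact_index(FORMS)
    for case, number, word in FORMS:
        print(f"{case} {number} {word}: {search(index, word)}")
```

With no stemming, each search finds only the records holding that identical spelling, so the genitive and accusative records find each other and nothing else crosses over.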

Any ideas?

--
Adam
TOPICS
Advanced techniques

LEGEND, Mar 26, 2007

For the newsfeed readers amongst us, you're gonna have to look at the web
version of this issue to see the Cyrillic content of my posting.

http://www.adobe.com/cfusion/webforums/forum/messageview.cfm?forumid=1&catid=7&threadid=1254426&ente...

--
Adam

Enthusiast, Mar 30, 2007

Hi Adam,
I will take a look at this. If you are not already running the following hotfix/security bulletin install, please retest with it (bulletin). It will bring your Verity install to 5.5 SP2 with additional patches. Let's see if it helps with your issue.

Explorer, Mar 30, 2007

Sorry Ken, it makes no difference.

NB: we've opened a support call for this, the "EET" is 1681. I've been dealing with Nick Watson. It might be worthwhile comparing notes with him before diving into this, as it would be a shame for you to waste your time covering already-trod ground.

If I hear anything back from Nick, I will be sure to also post here.

--
Adam

Explorer, Apr 04, 2007

Just an update / closure on this.

Adobe have confirmed it's a bug, and have blamed Autonomy (née Verity), by way of washing their hands of the situation.

The wording of the feedback suggests (this is NOT a quote) that this is all they have to say on the matter. There was no offer of assistance or of a workaround, nor even any sense of "hey, sorry you've wasted all that time and money on this, we understand it's our fault you're in this situation, is there anything we can do to help you out?". Nothing like that.

I don't think this is the end of my pursuit of the matter with them, to be honest, as that kind of response seems unacceptable to me. To put it mildly.

I figure this is worth highlighting here so other people don't fall into the same trap I have.

--
Adam

Explorer, May 09, 2007

A footnote to this.

Ken & the guys from Adobe came back to me after my last comment, and did a bit of a turnaround. Ken's helped me through this with sample code and some pointers in the right direction, so that I can use Lucene's Russian-stemming capabilities (which I can confirm do actually work!) as a "pre-processor": it tokenises and stems documents before Verity indexes them, and then does the same thing to search strings before they are passed to Verity. This works OK, and is a workable band-aid to the situation.
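To illustrate the shape of that pre-processing approach (not Ken's sample code, which isn't reproduced here), here is a minimal Python sketch. Everything in it is an assumption for illustration: `toy_stem` is a hand-written suffix stripper that only covers the test word from this thread and is NOT Lucene's Russian stemmer, and `ExactEngine` is a stand-in for an exact-match engine, not Verity.

```python
import re

# Hypothetical toy suffix list: covers only the endings of the test word.
# Longest suffixes are tried first so "ами" wins over "ам".
SUFFIXES = sorted(["ами", "ах", "ам", "ев", "ем", "ы", "а", "у", "е"],
                  key=len, reverse=True)


def toy_stem(word):
    """Crude stand-in for a real stemmer; NOT Lucene's algorithm."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    # Normalise the fleeting vowel so "беженец" and "беженц-" agree.
    if w.endswith("ец"):
        w = w[:-2] + "ц"
    return w


def preprocess(text):
    """Tokenise and stem; applied to documents AND to search strings."""
    return " ".join(toy_stem(token) for token in re.findall(r"\w+", text))


class ExactEngine:
    """Stand-in for a search engine that only matches tokens exactly."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text.split()

    def search(self, query):
        terms = query.split()
        if not terms:
            return []
        return sorted(doc_id for doc_id, tokens in self.docs.items()
                      if all(term in tokens for term in terms))


FORMS = ["беженец", "беженцы", "беженца", "беженцев", "беженцу",
         "беженцам", "беженцем", "беженцами", "беженце", "беженцах"]

if __name__ == "__main__":
    engine = ExactEngine()
    for doc_id, form in enumerate(FORMS):
        engine.add(doc_id, preprocess(form))      # stem before indexing
    print(engine.search(preprocess("беженцами")))  # stem the query the same way
```

The point of the design is that the engine never sees an inflected form: both sides are reduced to the same token first, so exact matching is enough. In the real setup the stemming step would be done by Lucene's Russian analysis rather than a hand-rolled suffix list.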

My next move is to factor Verity out of the equation (and, I hasten to add, any FUTURE equation involving searching - Russian or otherwise), and produce a pure Lucene solution.

Thanks Ken and Skip for helping us out on this issue. It's restored my faith in Adobe.

Cheers.

--
Adam
