5 Replies Latest reply on May 9, 2007 12:50 PM by Adam Cameron

    Verity: Russian stemming

      I'm developing a site in Russian. One of the requirements is for the search to use the stemming functionality Verity CLAIMS to have.

      Now, I don't speak a word of Russian, but the client has provided me with some test words which I have pumped through a test rig, with very poor results. The test words are:

      Case    Singular    Plural
      Nom.    беженец     беженцы
      Gen.    беженца     беженцев
      Dat.    беженцу     беженцам
      Acc.    беженца     беженцев
      Instr.  беженцем    беженцами
      Narr.   беженце     беженцах

      (which is basically declension of the word "refugee" in Russian, I'm told).

      My test rig (I'll attach the code) creates a record for each case / number combination and indexes it. I then search for each of the variations. The search returns only EXACT matches; the accusative and genitive forms happen to be identical for this word, so those two find each other, but nothing else does. Verity does not appear to apply any stemming at all.
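For anyone who wants to see the shape of that test without the original rig, here is a minimal Python sketch of it. It is not the ColdFusion/Verity code from the post: `build_records` and `exact_search` are hypothetical names, and the "index" is just a list searched by string equality, which is how the search appeared to behave.

```python
# Declension of "беженец" with the case labels exactly as given in the post:
# (case label, singular form, plural form).
DECLENSION = [
    ("Nom.", "беженец", "беженцы"),
    ("Gen.", "беженца", "беженцев"),
    ("Dat.", "беженцу", "беженцам"),
    ("Acc.", "беженца", "беженцев"),
    ("Instr.", "беженцем", "беженцами"),
    ("Narr.", "беженце", "беженцах"),
]


def build_records(declension):
    """Create one record per case/number combination, as the test rig does."""
    records = []
    for case, singular, plural in declension:
        records.append((case, "singular", singular))
        records.append((case, "plural", plural))
    return records


def exact_search(records, query):
    """Return the records whose indexed word equals the query exactly."""
    return [record for record in records if record[2] == query]


records = build_records(DECLENSION)

# Search for every distinct form and count how many records come back.
hits_per_query = {
    word: len(exact_search(records, word)) for _, _, word in records
}

for word, hits in hits_per_query.items():
    print(f"{word}: {hits} of {len(records)} records")
```

With exact matching, only the genitive/accusative pairs return more than one record. A search that stemmed properly would be expected to return all twelve records for every query.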

      Any ideas?

        • 1. Re: Verity: Russian stemming
          Level 7
          For the newsfeed readers amongst us, you're gonna have to look at the web
          version of this issue to see the Cyrillic content of my posting.

          http://www.adobe.com/cfusion/webforums/forum/messageview.cfm?forumid=1&catid=7&threadid=1254426&enterthread=y

          • 2. Re: Verity: Russian stemming
            ksmith Level 1
            Hi Adam,
            I will take a look at this. If you are not already running the following hotfix/security bulletin install, please retest with it (bulletin). It will bring your Verity install to 5.5 SP2 with additional patches. Let's see if it helps with your issue.
            • 3. Re: Verity: Russian stemming
              Level 1
              Sorry Ken, it makes no difference.

              NB: we've opened a support call for this, the "EET" is 1681. I've been dealing with Nick Watson. It might be worthwhile comparing notes with him before diving into this, as it would be a shame for you to waste your time covering already-trod ground.

              If I hear anything back from Nick, I will be sure to also post here.

              • 4. Re: Verity: Russian stemming
                Level 1
                Just an update / closure on this.

                Adobe have confirmed it's a bug, and have blamed Autonomy (née Verity), by way of washing their hands of the situation.

                The wording of the feedback suggests (this is NOT a quote) that that's all they have to say on the matter. There was no offer of assistance or of a workaround, nor any sense of "hey, sorry you've wasted all that time and money on this, we understand it's our fault you're in this situation, is there anything we can do to help you out?". Nothing like that.

                I don't think this is the end of my pursuit of the matter with them, to be honest, as that kind of response seems unacceptable to me. To put it mildly.

                I figure this is worth highlighting here so other people don't fall into the same trap I have.

                • 5. Re: Verity: Russian stemming
                  Level 1
                  A footnote to this.

                  Ken & the guys from Adobe came back to me after my last comment, and did a bit of a turnaround. Ken's helped me through this with sample code and some pointers in the right direction, so that I can use LUCENE's Russian-stemming capabilities (which I can confirm do actually work!) as a "pre-processor". It tokenises and stems documents before Verity indexes them, and then does the same thing to search strings before passing them to Verity. This works OK, and is a workable band-aid for the situation.
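To illustrate the pre-processor idea described above, here is a rough Python sketch. It is not Ken's sample code and it does not call Lucene: `toy_stem` is a hypothetical stand-in for a real Russian stemmer, tuned only to the noun endings in this thread's test data, and `ExactIndex` is a hypothetical stand-in for an engine that matches tokens exactly. The point is only that the same normalisation runs at index time and at query time.

```python
# Hypothetical stand-in for a real Russian stemmer; it only knows the
# case endings that occur in this thread's test word.
ENDINGS = ("ами", "ах", "ам", "ев", "ем", "а", "у", "е", "ы")


def toy_stem(word):
    """Strip one known case ending, then normalise the fleeting vowel."""
    word = word.lower()
    for ending in ENDINGS:
        # Keep at least three characters so short words are not emptied.
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            word = word[: -len(ending)]
            break
    # Nominative "-ец" corresponds to "-ц-" in the other forms.
    if word.endswith("ец"):
        word = word[:-2] + "ц"
    return word


def preprocess(text):
    """Tokenise on whitespace and stem each token."""
    return " ".join(toy_stem(token) for token in text.split())


class ExactIndex:
    """Stand-in for an exact-match engine fed with preprocessed text."""

    def __init__(self):
        self._docs = {}

    def add(self, key, text):
        # Documents are preprocessed BEFORE they are indexed.
        self._docs[key] = set(preprocess(text).split())

    def search(self, query):
        # Search strings get the SAME preprocessing before matching.
        terms = preprocess(query).split()
        if not terms:
            return []
        return sorted(
            key
            for key, tokens in self._docs.items()
            if all(term in tokens for term in terms)
        )


FORMS = [
    "беженец", "беженца", "беженцу", "беженцем", "беженце",
    "беженцы", "беженцев", "беженцам", "беженцами", "беженцах",
]

index = ExactIndex()
for number, form in enumerate(FORMS):
    index.add(number, form)

print(index.search("беженцами"))
```

Because both sides pass through `preprocess`, the underlying engine only ever compares stems, so it does not need any stemming support of its own. The trade-off is that the index stores stems rather than the original words, so anything that displays matched text has to read it from the original documents.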

                  My next move is to factor Verity out of the equation (and, I hasten to add, any FUTURE equation involving searching - Russian or otherwise), and produce a pure Lucene solution.

                  Thanks Ken and Skip for helping us out on this issue. It's restored my faith in Adobe.