Verity, multilanguage, Russian

Report · Mar 19, 2007

Hi,
Here's a problem which has made me scratch my head a little (too much).
We use Verity for simple website searches and have run into a problem with Russian. Some text (web pages in this case) looks fine in the search results and some looks like Chinese, which would be fine if it weren't Russian.

The setup is as follows. All pages are UTF-8. We've written a small crawler of our own which fetches the pages, cleans them and then adds them to the specified collection. This collection uses Multilanguage (uni). All text looks fine prior to the CFINDEX. After refreshing the index and running CFSEARCH, most result pages are screwed up. The results consistently look like Chinese (no question marks, squares or other weird stuff). Among the results I can see one or two pages that are OK. It's always the same ones. All pages are built dynamically using the same templates.
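
For anyone wanting to spot the affected pages automatically rather than by eye, here is a minimal sketch of a check that could be run over the text returned by a search. It is written in Python purely for illustration (the crawler's language isn't stated in this thread), and the names `script_counts` and `looks_garbled` are made up for this example. It only classifies characters by their Unicode names; it does not talk to Verity or ColdFusion.

```python
import unicodedata


def script_counts(text):
    """Count letters by rough script, based on their Unicode character names."""
    counts = {"cyrillic": 0, "east_asian": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        elif name.startswith(("CJK", "HANGUL", "HIRAGANA", "KATAKANA")):
            counts["east_asian"] += 1
        else:
            counts["other"] += 1
    return counts


def looks_garbled(text):
    """Heuristic: a Russian page is suspect if East Asian letters outnumber Cyrillic ones."""
    counts = script_counts(text)
    return counts["east_asian"] > counts["cyrillic"]
```

This is only a heuristic for a site that is expected to be Russian; it would of course flag the Chinese site mentioned below as well.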

We've tried to isolate the problem by taking text from "bad" pages and systematically removing it word by word, and guess what: when certain words are removed, mostly ones written in capital letters, the text looks OK after indexing.
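
The word-by-word removal described above can be sketched roughly as follows. This is an illustration in Python, not the code we actually ran: `find_trigger_words` is a hypothetical helper, and `looks_ok` stands for whatever check you use after indexing (in our case, indexing the text and inspecting the search result by hand). The stand-in check in the example only mimics the observation about capitalized words; it is not how Verity behaves.

```python
def find_trigger_words(text, looks_ok):
    """Return the words whose individual removal makes the text pass looks_ok."""
    words = text.split()
    triggers = []
    for i, word in enumerate(words):
        # Rebuild the text with exactly one word left out.
        candidate = " ".join(words[:i] + words[i + 1:])
        if looks_ok(candidate):
            triggers.append(word)
    return triggers


def stand_in_check(text):
    """Placeholder for 'index it and look at the result': fails on all-caps words."""
    return not any(w.isupper() for w in text.split())


print(find_trigger_words("привет МИР пока", stand_in_check))  # ['МИР']
```

Note that removing one word at a time only finds a culprit when a single word is responsible; if a page contains several problem words, no single removal will make it pass and the list comes back empty.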

Does anyone have any experience with this behavior?

I can add that another site in Chinese using the same templates works fine with the same setup, as do English, Swedish, Danish etc. All are UTF-8 and Unicode. It seems it is the Russian that isn't playing along.

By the way, we run CFMX 7, multiserver config on Linux.

Glad for any info or pointers in the right direction. :)

Report · Mar 20, 2007

> This collection uses Multilanguage (uni)

You do have the RUSSIAN (or RUSSIAN2, not sure what the difference is: it's
all Greek to me) language pack installed, yes? I'm not sure what you meant
about "multilanguage"; I'm no expert - I only started dealing with
Verity/Russian myself in the last few days - but Verity seems to have more
precise requirements than "multilanguage".

--
Adam

Report · Mar 20, 2007

Thanks for the reply. I have the Verity language packs installed, including Western European, Eastern European and Multilanguage.
I've used the multilingual (uni) "language" since this site uses UTF-8 and has versions in different languages. What type of setup have you successfully used with Russian?

Report · Mar 20, 2007

> I've used the multilingual (uni) "language" since this site uses UTF-8 and
> has versions in different languages. What type of setup have you successfully
> used with Russian?

I use the RUSSIAN language pack, oddly enough. It seems to work OK, other
than the category thing I raised on Sunday.

--
Adam

Adobe Community
