Hi,
Here's a problem which has made me scratch my head a little
(too much).
We use Verity for simple website searches and have run into a
problem with Russian. Some text (web pages in this case) looks fine
in the search results and some looks like Chinese, which would be
fine if it weren't Russian.
The setup is as follows. All pages are UTF-8. We've written a
small home-made crawler which fetches the pages, cleans them and
then adds them to the specified collection. This collection uses
Multilanguage (uni). All text looks fine prior to the CFINDEX. After
refreshing the index and running CFSEARCH, most result pages are
screwed up. The result consistently looks like Chinese (no question
marks, squares or other weird stuff). Among the results I can see
one or two pages that are OK. It's always the same ones. All pages
are built dynamically using the same templates.
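One check we could run outside ColdFusion, sketched here in Python.
It rests on an assumption we haven't confirmed: that somewhere
along the way the UTF-8 bytes get read as UTF-16. The helper names
and the sample word are made up for illustration. If the code
points it prints match what CFSEARCH returns for the same text,
that would point to an encoding mix-up rather than bad page content.

```python
def misdecode(text, wrong_encoding="utf-16-le"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    raw = text.encode("utf-8")
    # errors="replace" keeps going on odd byte counts or invalid units.
    return raw.decode(wrong_encoding, errors="replace")


def looks_east_asian(ch):
    """True for CJK Unified Ideographs or Hangul syllables."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0xAC00 <= cp <= 0xD7A3


# Hypothetical sample; replace with text taken from a "bad" page.
sample = "Привет"
garbled = misdecode(sample)

# Print code points so the output is readable on any terminal.
print([hex(ord(ch)) for ch in garbled])
print(all(looks_east_asian(ch) for ch in garbled))
```

Trying "utf-16-be" as the second argument as well would cover the
other byte order.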
We've tried to pin down the problem by taking text from "bad"
pages and systematically removing it word by word, and guess what:
when we remove certain words, mostly ones written in capital
letters, the text looks OK after indexing.
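The manual word-by-word removal could be automated along these
lines. This is only a sketch: is_bad is a hypothetical callback
that would have to index the given words with CFINDEX, run
CFSEARCH, and report whether the result comes back garbled. Here
it is faked so the example runs on its own. Note that it keeps the
words that trigger the problem rather than the cleaned-up text.

```python
def minimize(words, is_bad):
    """Greedy reduction of a word list that is known to be bad.

    Drops one word at a time and keeps the drop whenever the
    remaining words are still bad. Assumes is_bad(words) is True
    at the start. The result is a subset that is still bad but
    becomes good if any single word is removed.
    """
    current = list(words)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if candidate and is_bad(candidate):
            current = candidate  # word i was not needed to trigger it
        else:
            i += 1  # word i is needed; keep it and move on
    return current


# Fake predicate for illustration only: pretend one capitalized
# word is what breaks the index.
def fake_is_bad(words):
    return "ВАЖНО" in words


triggers = minimize(["Это", "ВАЖНО", "для", "нас"], fake_is_bad)
print(len(triggers))
```

In a real run each is_bad call means a full index refresh, so this
is only practical on a short excerpt from one bad page.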
Does anyone have any experience with this behavior?
I can add that another site in Chinese using the same
templates works fine with the same setup, as do English,
Swedish, Danish etc. All UTF-8 and Unicode. It seems to be only the
Russian that isn't playing along.
By the way, we run CFMX 7, multiserver config on Linux.
Glad for any info or pointers in the right direction.
:)