I'm on CQ 5.4 and trying to get accented characters indexed with the same weight as their non-accented equivalents, so that a search for "biere" returns the same results as a search for "bière".
From what I've read in the Lucene and Jackrabbit documentation, the way to do this is to add the following param to the <SearchIndex>...</SearchIndex> element in the workspace.xml config:
<param name="analyzer" value="org.apache.lucene.analysis.ASCIIFoldingFilter"/>
or any of the similar classes listed here: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/package-summary.html.
However, it seems that none of these classes are in scope in CQ:
06.11.2012 16:43:15 *WARN * SearchIndex: Invalid Analyzer class: org.apache.lucene.analysis.ASCIIFoldingFilter (SearchIndex.java, line 1698)
I'm wondering how to load analyzer classes. Should I build the default Lucene package as an OSGi bundle? Or is there another way?
Please note that CQ 5.4 is bundled with Lucene version 2.4.1, which does not have the ASCIIFoldingFilter yet. To check the list of available analyzer filters, run: find <CQ_HOME> -name "lucene-core*.jar" | xargs -n 1 jar tf | grep "analysis/.*Filter"
As an alternative, you can use the ISOLatin1AccentFilter (org.apache.lucene.analysis.ISOLatin1AccentFilter), which should satisfy the requirement of your use case (i.e. replacing accented characters with their unaccented equivalents).
On a side note: CQ 5.5 is bundled with Lucene version 3.0.3, so there you will be able to use ASCIIFoldingFilter, which replaces the deprecated ISOLatin1AccentFilter in newer versions of Lucene.
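To make the intended effect concrete, here is a minimal sketch of accent folding using only the Java standard library. It is not how Lucene implements either filter, and the class and method names (AccentFoldingSketch, fold) are made up for illustration. It only shows the kind of mapping ("bière" to "biere") that has to be applied to both the indexed text and the query. The Lucene filters also handle some characters that this approach does not, such as ligatures.

```java
import java.text.Normalizer;

public class AccentFoldingSketch {

    // Hypothetical helper: decompose each accented character into a base
    // letter plus combining marks (NFD), then remove the combining marks.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "bi\u00e8re" is "bière", written with an escape to avoid
        // source-encoding problems.
        System.out.println(fold("bi\u00e8re"));
    }
}
```

In Lucene itself, a TokenFilter such as ISOLatin1AccentFilter is normally applied inside an Analyzer's token stream. The analyzer param would therefore most likely need to name a small custom Analyzer class that wraps the filter, not the filter class itself.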