1 Reply Latest reply on Nov 10, 2012 7:44 PM by Gregor Zurowski

    Lucene Analyzer classes for CQ search

    mtremblay33 Level 1

      I'm on CQ 5.4 and trying to get accented characters indexed with the same weight as their non-accented equivalents, so that a search for "biere" returns the same results as a search for "bière".


      From what I've read in the Lucene and Jackrabbit documentation, the way to do this is to add the following param inside the <SearchIndex>...</SearchIndex> element of the workspace.xml config:


      <param name="analyzer" value="org.apache.lucene.analysis.ASCIIFoldingFilter"/>


      or any similar classes specified here: http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/package-summary.html.


      However, it seems that none of these classes are available in CQ:

      06.11.2012 16:43:15 *WARN * SearchIndex: Invalid Analyzer class: org.apache.lucene.analysis.ASCIIFoldingFilter (SearchIndex.java, line 1698)

      java.lang.ClassNotFoundException: org.apache.lucene.analysis.ASCIIFoldingFilter


      I'm wondering how to load analyzer classes. Should I build the default Lucene package as an OSGi bundle? Or is there another way?

        • 1. Re: Lucene Analyzer classes for CQ search
          Gregor Zurowski Level 1



          Please note that CQ 5.4 is bundled with Lucene 2.4.1, which does not include ASCIIFoldingFilter yet. You can check the list of available analyzer filters with the following command (-n 1 makes xargs run jar once per file, in case find returns more than one jar):

          find <CQ_HOME> -name "lucene-core*.jar" | xargs -n 1 jar tf | grep "analysis/.*Filter"


          As an alternative, you can use ISOLatin1AccentFilter (org.apache.lucene.analysis.ISOLatin1AccentFilter), which should satisfy your use case, i.e. replacing accented characters with their unaccented equivalents.
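
          To make the effect of accent folding concrete, here is a minimal sketch in plain Java. It is not how ISOLatin1AccentFilter is implemented and it does not use Lucene at all; it only illustrates the kind of mapping such a filter applies, using java.text.Normalizer from the JDK. The class name AccentFoldDemo and the method fold are made up for this illustration.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of Lucene, Jackrabbit or CQ.
class AccentFoldDemo {

    // Decompose each accented character into base letter + combining mark (NFD),
    // then remove the combining marks so only the base letters remain.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String accented = "bi\u00e8re"; // "bière", written with an escape to avoid source-encoding issues
        System.out.println(fold(accented));                        // prints "biere"
        System.out.println(fold(accented).equals(fold("biere")));  // prints "true"
    }
}
```

          If both the indexed text and the query go through the same folding, "biere" and "bière" end up as the same term. Note that this sketch only covers characters that have a Unicode decomposition; letters such as "ß" or "ø" pass through unchanged, so the exact set of characters handled may differ from the Lucene filters.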


          On a side note: CQ 5.5 is bundled with Lucene 3.0.3, so there you will be able to use ASCIIFoldingFilter, which replaces ISOLatin1AccentFilter (deprecated in newer versions of Lucene).