0 Replies. Latest reply: May 13, 2014 11:30 AM by nativecoder

    KeywordTokenizerFactory splits the string at the exclamation mark

    nativecoder Community Member

      Hi All

       

      I have the following field settings in my Solr schema:

       

      <field name="Exact_Word" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="string_ci" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/>

       

      <field name="Word" compressed="true" type="email_text_ptn" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/>

       

      <fieldtype name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
        <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldtype>

       

      <copyField source="Word" dest="Exact_Word"/>

       

      As you can see, Exact_Word uses KeywordTokenizerFactory, which should treat the whole string as a single token.
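      The expectation above can be illustrated with a small Python model of the string_ci analyzer chain. It is a sketch, not Solr code: the function names are hypothetical, and the model only mirrors the documented behaviour of the two factories (KeywordTokenizer turns the whole input into one token, LowerCaseFilter lowercases it), assuming plain ASCII input.

```python
from typing import List


def keyword_tokenize(text: str) -> List[str]:
    # KeywordTokenizer: the entire input becomes one single token.
    return [text]


def lowercase_filter(tokens: List[str]) -> List[str]:
    # LowerCaseFilter: lowercase every token it receives (ASCII-only model).
    return [token.lower() for token in tokens]


def analyze_string_ci(text: str) -> List[str]:
    # Hypothetical helper: run the chain in the order declared in <analyzer>.
    return lowercase_filter(keyword_tokenize(text))


if __name__ == "__main__":
    print(analyze_string_ci("D!sdasdsdwasdasd@dsadsadas.edu"))
    # prints ['d!sdasdsdwasdasd@dsadsadas.edu']
```

      If this model holds, the analyzer itself never splits on "!", so any split would have to happen before the text reaches the analyzer.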

       

      Following is my responseHeader. As you can see, I am searching for my string only in the field Exact_Word and expecting the response to return the Word field and the score.

       

      "responseHeader":{
          "status":0,
          "QTime":14,
          "params":{
            "explainOther":"",
            "fl":"Word,score",
            "debugQuery":"on",
            "indent":"on",
            "start":"0",
            "q":"d!sdasdsdwasd!asd@dsadsadas.edu",
            "qf":"Exact_Word",
            "wt":"json",
            "fq":"",
            "version":"2.2",
            "rows":"10"}},

       

       

      But when I search for an email address such as "d!sdasdsdwasdasd@dsadsadas.edu", the string is split in two. I was under the impression that KeywordTokenizerFactory would treat the string as it is.

       

      Following is the parsed query from the debug output, where you can see that the string has been split:

      "parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d)) -DisjunctionMaxQuery((Exact_Word:sdasdsdwasdasd@dsadsadas.edu)))~1)",
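      This parsed query has the shape you would expect if the "!" was read as query syntax rather than as part of the value: in Lucene's query syntax "!" is an alias for NOT, and the query parser splits the input into clauses before the field's analyzer runs on each clause. The Python sketch below is a deliberately simplified model of that step, assuming "!" is the only operator in the input; split_on_not and render are hypothetical names, not Solr APIs, and the model ignores the DisjunctionMaxQuery wrapper and the ~1 minimum-match shown in the debug output.

```python
from typing import List, Tuple


def split_on_not(q: str) -> List[Tuple[str, str]]:
    # Hypothetical, heavily simplified model of the query-parser step.
    # Assumption: '!' (Lucene's alias for NOT) is the only operator in q.
    # Each '!' ends the current term and makes the next term prohibited.
    parts = q.split("!")
    clauses: List[Tuple[str, str]] = []
    if parts[0]:
        clauses.append(("", parts[0]))   # optional clause
    for part in parts[1:]:
        if part:
            clauses.append(("-", part))  # prohibited clause, shown as "-"
    return clauses


def render(q: str, field: str = "Exact_Word") -> str:
    # The analyzer (modelled here as lowercasing only) runs on each clause
    # separately, so KeywordTokenizer only ever sees the pieces.
    return " ".join(f"{occur}{field}:{text.lower()}" for occur, text in split_on_not(q))


if __name__ == "__main__":
    print(render("d!sdasdsdwasdasd@dsadsadas.edu"))
    # prints Exact_Word:d -Exact_Word:sdasdsdwasdasd@dsadsadas.edu
    print(render("d_sdasdsdwasd_asd@dsadsadas.edu"))
    # prints Exact_Word:d_sdasdsdwasd_asd@dsadsadas.edu
```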

       

      Can someone please explain why it produces this query?

       

      If I use a string without the "!" sign, such as the one below, the parsed query is:

      "parsedquery":"+DisjunctionMaxQuery((Exact_Word:d_sdasdsdwasd_asd@dsadsadas.edu))",

      This is what I expected Solr to produce with the "!" as well: with "_" it does not split the string and treats it as it is.
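      The contrast with "_" is consistent with "!" being parsed as an operator while "_" is an ordinary character. A common workaround is to backslash-escape query-syntax characters before putting the value into q, so the whole address reaches the Exact_Word analyzer as one term. The Python sketch below is an illustration under assumptions: escape_query_chars and its character set are hypothetical, modelled on the Lucene query syntax documentation, and whether an escaped "!" is honoured depends on the query parser the handler uses (defType is not shown in the params above; the standard Lucene parser and edismax both accept backslash escapes). Java clients using SolrJ can call ClientUtils.escapeQueryChars instead.

```python
# Characters with special meaning in the standard Lucene/Solr query syntax.
# Assumption: this set follows the Lucene query syntax docs; '&' and '|'
# are escaped one by one, which also covers the '&&' and '||' operators.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query_chars(value: str) -> str:
    # Prefix every special character with a backslash so the query parser
    # keeps it inside the term instead of reading it as an operator.
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


if __name__ == "__main__":
    print(escape_query_chars("d!sdasdsdwasdasd@dsadsadas.edu"))
    # prints d\!sdasdsdwasdasd@dsadsadas.edu
```

      With the escaped value, the parsed query would be expected to contain a single Exact_Word clause, as in the "_" case; this has not been checked against the setup described here.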

       

      I thought that if KeywordTokenizerFactory is applied, it should return the exact string as it is.

       

      Please help me understand what is going wrong here.