<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:clearspace="http://www.jivesoftware.com/xmlns/jive/rss" version="2.0">
  <channel>
    <title>Adobe Community: Message List - KeywordTokenizerFactory splits the string for the exclamation mark</title>
    <link>https://forums.adobe.com/community/coldfusion/solr?view=discussions</link>
    <description>Most recent forum messages</description>
    <language>en</language>
    <pubDate>Tue, 13 May 2014 18:30:02 GMT</pubDate>
    <generator>Jive Engage 7.0.0.1  (http://jivesoftware.com/products/)</generator>
    <dc:date>2014-05-13T18:30:02Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>KeywordTokenizerFactory splits the string for the exclamation mark</title>
      <link>https://forums.adobe.com/message/6379547?tstart=0#6379547</link>
      <description>&lt;!-- [DocumentBodyStart:a9a322ca-cc2c-4234-9939-139b5f35d84e] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Hi All&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I have the following field settings in my Solr schema:&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;lt;field name="Exact_Word" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="string_ci" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/&amp;gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;lt;field name="Word" compressed="true" type="email_text_ptn" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/&amp;gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;lt;fieldtype name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true"&amp;gt;&amp;lt;analyzer&amp;gt;&amp;lt;tokenizer class="solr.KeywordTokenizerFactory"/&amp;gt;&amp;lt;filter class="solr.LowerCaseFilterFactory"/&amp;gt;&amp;lt;/analyzer&amp;gt;&amp;lt;/fieldtype&amp;gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;lt;copyField source="Word" dest="Exact_Word"/&amp;gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;As you can see, Exact_Word uses KeywordTokenizerFactory, which should treat the string as is.&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The following is my responseHeader. 
As you can see, I am searching for my string only in the field Exact_Word and expecting it to return the Word field and the score.&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;"responseHeader":{&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "status":0,&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "QTime":14,&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; "params":{&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "explainOther":"",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "fl":"Word,score",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "debugQuery":"on",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "indent":"on",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "start":"0",&lt;/p&gt;&lt;p&gt;&lt;span&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "q":"&lt;/span&gt;&lt;a class="jive-link-email-small" href="mailto:d!sdasdsdwasd!asd@dsadsadas.edu"&gt;d!sdasdsdwasd!asd@dsadsadas.edu&lt;/a&gt;&lt;span&gt;",&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "qf":"Exact_Word",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "wt":"json",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "fq":"",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "version":"2.2",&lt;/p&gt;&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; "rows":"10"}},&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;span&gt;But when I enter an email address such as "&lt;/span&gt;&lt;a class="jive-link-email-small" href="mailto:d!sdasdsdwasdasd@dsadsadas.edu"&gt;d!sdasdsdwasdasd@dsadsadas.edu&lt;/a&gt;&lt;span&gt;", the string is split in two. 
I was under the impression that KeywordTokenizerFactory would treat the string as is.&lt;/span&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The following is the query debug result. There you can see that the string has been split at the "!":&lt;/p&gt;&lt;p&gt;&lt;span&gt; "parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d)) -DisjunctionMaxQuery((Exact_Word:&lt;/span&gt;&lt;a class="jive-link-email-small" href="mailto:sdasdsdwasdasd@dsadsadas.edu"&gt;sdasdsdwasdasd@dsadsadas.edu&lt;/a&gt;&lt;span&gt;)))~1)",&lt;/span&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Can someone please tell me why it produces this parsed query?&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;If I use a string without the "!" sign, the parsed query is as follows:&lt;/p&gt;&lt;p&gt;&lt;span&gt; "parsedquery":"+DisjunctionMaxQuery((Exact_Word:&lt;/span&gt;&lt;a class="jive-link-email-small" href="mailto:d_sdasdsdwasd_asd@dsadsadas.edu"&gt;d_sdasdsdwasd_asd@dsadsadas.edu&lt;/a&gt;&lt;span&gt;))". This is what I expected Solr to produce even with the "!" mark. With "_", it does not split the string and treats it as is.&lt;/span&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I thought that if KeywordTokenizerFactory is applied, the exact string should be kept as is.&lt;/p&gt;&lt;p style="min-height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Please help me understand what is going wrong here.&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:a9a322ca-cc2c-4234-9939-139b5f35d84e] --&gt;</description>
      <pubDate>Tue, 13 May 2014 18:27:58 GMT</pubDate>
      <author>forums_noreply@adobe.com</author>
      <guid>https://forums.adobe.com/message/6379547?tstart=0#6379547</guid>
      <dc:date>2014-05-13T18:27:58Z</dc:date>
      <clearspace:dateToText>6 months 2 days ago</clearspace:dateToText>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
  </channel>
</rss>

