2 Replies Latest reply on Mar 19, 2014 7:21 AM by Duncan Goss

    Solr search returns words like the search word, but not the search word itself

    Duncan Goss

      We recently upgraded from ColdFusion 8 to ColdFusion 10.  Everything works fine except for the text search function. 

       

      We manually re-created a smaller collection of HTML documents, and the index did appear to run.  Most searches seem to work fine, but a search for the longer word "severability" returns "several" and "severally", but not "severability" itself.

       

      We also attempted to index a larger collection, which includes PDF documents.  At first, we forgot to include PDF among the document types in the collection.  The index ran, but searches returned zero hits, although there are indeed HTML documents in the target population.  When we remembered to add the PDF documents, the indexing process consistently returned "out of memory" errors.

       

      Any help would be appreciated.

        • 1. Re: Solr search returns words like the search word, but not the search word itself
          vishu#13 Level 3

          What is the field type, text or string? By default, Solr analyzes a text field (it is tokenized and usually stemmed), so the match is fuzzy rather than exact. You need to set up your field as a string field with no tokenizer; then you'll get an exact match.

           

          Schema.xml

           

          <!-- "name" is the analyzed (text) field; "nameString" below is its exact-match string copy -->
          <field name="name"             type="text"   indexed="true" stored="false" required="true" />
          <field name="nameString"       type="string" indexed="true" stored="false" required="true" />
          <copyField source="name" dest="nameString"/>
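
          For context on the type attribute, here is a minimal sketch of how these two field types are commonly declared further up in schema.xml. It is an assumption about a typical Solr setup, not a copy of the schema that ColdFusion 10 ships, and the filter chain in your file may differ. A stemming filter in the text type is the kind of thing that could reduce "severability", "several" and "severally" to the same root, while the string type indexes the value untouched.

          ```xml
          <!-- Illustrative only: compare with the fieldType entries in your own schema.xml -->
          <types>
            <!-- string: no analysis; a term query must equal the whole field value -->
            <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

            <!-- text: tokenized, lowercased and stemmed, so related word forms match -->
            <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
              <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English"/>
              </analyzer>
            </fieldType>
          </types>
          ```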

           

          Solrconfig.xml

           

          <requestHandler name="accounts" class="solr.SearchHandler">
              <lst name="defaults">
                <str name="defType">dismax</str>
                <str name="qf">
                  nameString^10.0 name^5.0 description^1.0
                </str>
                <str name="tie">0.1</str>
              </lst>
            </requestHandler>
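
          Because a string field only matches when the query equals the entire stored value, it may not suit a search inside document bodies. One alternative, sketched here under the assumption that the document text is indexed in a field called contents (check your schema.xml), is a tokenized type with no stemming filter. The names text_exact and contents_exact are hypothetical.

          ```xml
          <!-- Hypothetical names: text_exact, contents_exact. The source field "contents" is assumed. -->
          <!-- Goes inside <types> -->
          <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="solr.LowerCaseFilterFactory"/>
              <!-- no stemming filter, so "severability" is indexed as its own term -->
            </analyzer>
          </fieldType>

          <!-- Goes inside <fields> -->
          <field name="contents_exact" type="text_exact" indexed="true" stored="false"/>

          <!-- Goes alongside the other copyField entries -->
          <copyField source="contents" dest="contents_exact"/>
          ```

          With something like this in place, the unstemmed field could be listed in qf with a higher boost than the stemmed one, in the same way nameString is boosted above name. The collection has to be re-indexed after any schema change.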

           

          If your collection has 10 .pdf and 10 .html files, what is the result? Is the reported number of documents 10 or 20?

           

          HTH

           

          Thanks

          VJ

          • 2. Re: Solr search returns words like the search word, but not the search word itself
            Duncan Goss Level 1

            VJ --

             

            I have to confess that your response assumes knowledge on my part that I do not have.  I have located the Schema.xml and SolrConfig.xml files for the collection in question, but that is about it.

             

            In the Schema.xml file I see “<field name=” entries for what appear to be the fields returned by the queries, but do not know what fields I should be setting to string.  I am attempting to do a full-text search of the contents of the html documents, not on metadata.

             

            In the SolrConfig.xml file I see a number of “<requestHandler” entries, but do not know if I am supposed to modify one of them to match what you sent (in which case I need to know which one), or add what you sent in its entirety.

             

            Also, in an unrelated matter, I am attempting to do a query of queries search on the result of my cfsearch, but any attempt to create a “where Key='xxx'” or “where Key LIKE '%xxx%'” clause fails.  I CAN use a “where Rank='1'” clause, but that isn’t what I am trying to do.  Is that because the Key field is text where it needs to be string?  (Basically, what I am attempting to do is to restrict the search to only certain subdirectories in the collection.  In ColdFusion 8, with the Verity search, this was fairly easy to do by adding a Key field clause to the criteria string.)

             

            Finally, the collection with the PDF documents:  It contains 31000 .HTM documents and 10861 .PDF documents, and reports 41850 documents in the collection.  However, the indexing fails, as do queries against this collection.  Both failures seem to be due to memory size issues.  In ColdFusion 8, the Verity search indexing handled a collection of this size easily.

             

            Duncan