1 Reply Latest reply on Aug 13, 2011 6:21 AM by Adam Cameron.

    narrowing search collection in html

    rgurganus

      I'm looking at setting up a website search in Solr (or Verity), and want to point it at a collection of HTML files.  Is there a way to set up a collection so that it includes only the content within specific div(s)?  I don't want it to include the header, sidebars, footer, etc.  Thanks.

        • 1. Re: narrowing search collection in html
          Adam Cameron. Level 5

          With the tools that CF provides, I think you're going to have to extract the text from those divs yourself, put it in a query, and index that query with a CUSTOM index job.
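
          As a rough illustration of the "extract it yourself" step, here is a minimal sketch in Python using only the standard library. It is not CF code and not anything CF ships with; the class name, the function name, and the div id "content" are all made up for the example, and it assumes the divs you care about can be picked out by their id attribute.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text found inside <div> elements whose id is in wanted_ids.

    Hypothetical helper for illustration only.
    """

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.depth = 0    # div nesting depth inside a wanted div; 0 means outside
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested div inside a wanted div: track it so the matching
            # close tag does not end collection too early.
            self.depth += 1
        elif dict(attrs).get("id") in self.wanted_ids:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only while inside a wanted div.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_div_text(html, wanted_ids):
    """Return the text inside the wanted divs as one space-joined string."""
    parser = DivTextExtractor(wanted_ids)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    '<div id="header">Site nav</div>'
    '<div id="content"><p>Solr <b>rocks</b></p>'
    '<div class="note">nested</div> tail</div>'
    '<div id="footer">Copyright</div>'
)
body_text = extract_div_text(page, ["content"])
```

          The idea would be to run something like this over each file, build a query with one row per file (say, the file path or URL in one column and the extracted text in another), and hand that query to the custom index job, with the path as the key and the extracted text as the body. Note that the sketch keeps any script or style text that sits inside the chosen div, so real pages may need extra filtering.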

           

          You can probably do it directly with Lucene, but "how" is probably a question best asked on a Lucene forum (having first read through the docs ;-).

           

          That said, all the rest of the bumf on the page does add context to the document, so you might want to question whether it's necessary or even desirable to omit it from the indexing process.

           

          --

          Adam
