0 Replies Latest reply on Jun 22, 2007 1:18 PM by Laurence Middleton

    Verity HTML doc index: extract meta keywords?

    Laurence Middleton
      I have Verity collections for directories of HTML documents. I would like to be able to get the keyword META tag content as a field in the results of a CFSEARCH, the same way that Verity automatically supplies a TITLE field using the TITLE tag in each HTML document.

      It seems like the custom fields (custom1 ... custom4) would be ideal for this, but I can find no documentation or examples for doing this with CFINDEX type="path" -- only for indexes on databases (i.e. type="custom").

      Can Verity do what I describe?

      I've tried obvious-looking (but wrong!) approaches such as:
      CFINDEX type="path" custom1="<MATCH>Keywords" ...

      ...and many variations quoting things with single-quotes and back-ticks (`), pound signs, etc. Either a CF error is thrown when I try it, or the custom1 field in search results contains the literal text "<MATCH>Keywords" (or whatever) rather than evaluating it.

      NOTE: I'm NOT trying to dynamically search within the META tag content; that turns out to be easy. You just CFSEARCH with something like criteria="Keywords <CONTAINS> #mySearchText#". What I'd like is to show the entire keyword META tag content as a field in the search results.

      BTW, it seems like both CONTEXT and SUMMARY are taken only from the body of the HTML. I haven't yet been able to get any META tag text into search result fields. (The special case of TITLE is the only non-BODY content that shows up.)

      I'm considering workarounds such as spidering the appropriate collection directories, or maybe using CFDIRECTORY / CFFILE each time a document is added, changed or removed to extract text into a searchable database.
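
      To make that last workaround concrete, here is a minimal sketch of the extraction step. It is written in Python rather than CFML purely for illustration, and it is not something Verity or ColdFusion provides. The names MetaKeywordsParser, extract_keywords and keywords_by_file are hypothetical, and it assumes the files are readable as UTF-8 and use the conventional <meta name="keywords" content="..."> form. The idea is to walk a directory, pull the keywords out of each HTML file, and end up with a path-to-keywords mapping that could be stored in a database table.

```python
from html.parser import HTMLParser
from pathlib import Path


class MetaKeywordsParser(HTMLParser):
    """Collects the content attribute of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "keywords":
            self.keywords.append(attr_map.get("content", "").strip())


def extract_keywords(html_text):
    """Return the keywords META content of one document, or "" if absent."""
    parser = MetaKeywordsParser()
    parser.feed(html_text)
    parser.close()
    return ", ".join(k for k in parser.keywords if k)


def keywords_by_file(root_dir):
    """Map each .htm/.html file under root_dir to its keywords string."""
    result = {}
    for path in sorted(Path(root_dir).rglob("*.htm*")):
        if path.is_file():
            # Assumption: UTF-8 input; undecodable bytes are replaced.
            text = path.read_text(encoding="utf-8", errors="replace")
            result[str(path)] = extract_keywords(text)
    return result
```

      If something like this were rerun whenever a document is added, changed or removed, the resulting table could presumably be indexed with CFINDEX type="custom", where the custom1 ... custom4 fields are documented. I haven't tried that end to end.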