5 Replies Latest reply on Aug 13, 2006 6:12 AM by mungton

    verity vspider content filter

    mungton
      I have implemented a site crawl using vspider and am now trying to refine how the indexing is performed. Specifically, I am trying to filter out sections of code within .cfm or .htm pages that I do not want returned in keyword searches. For example, the header and footer areas are the same on every page of the website, so I do not want content in these areas indexed or searchable. At the moment, if a search keyword matches text in the header, every single page is returned in the search results. However, I do want the crawler to follow links in these areas, as they contain menu and sitemap links.

      I can't find much information about content filtering, but it seems I need to use the style.* files to achieve my goals. Unfortunately, there is very little information on how to use these files. Does anybody know how to modify the style sets to do what I'm after? Or is there another way to do it?

      Any help or pointers in the right direction would be greatly appreciated. At this stage we are almost considering purchasing a Google search box, as the Verity search seems to stop short of actually being useful for large site crawls.

      We are using CFMX7.

      Cheers
        • 1. Re: verity vspider content filter
          mungton Level 1
          If no one knows the answer to my question, could someone point me in the right direction, i.e. a Verity-specific forum or support area?
          • 2. Re: verity vspider content filter
            cornlew Level 1
            Hello,

            I'm having the same problem. Did you ever get this resolved?

            Thanks.
              • 4. Re: verity vspider content filter
                llisam
                How about an alternative: put all the searchable content in one directory and all the headers, footers, etc. in a different folder, then use IIS to create a mapping. I've done this to separate include files that I never want to be run independently in the browser (in the unlikely event that a user guesses the filename).
                • 5. Re: verity vspider content filter
                  mungton Level 1
                  cornlew, I never managed to get the problem resolved and could not afford to spend any more time on it. It was just too hard to find any thorough information about Verity filters, and no one seemed able to help. From my research I was able to find some information about searching and filtering with XML files (using zone filters), but this didn't really apply to what I needed, or at least it didn't seem to work in the few tests I ran. If you do look into it and find a solution, I would like to hear about your findings.

                  llisam, your suggestion is not viable in my case, as I need the links in the header/footer content crawled. They usually contain menu/navigation links, and stripping them out would not allow vspider to crawl my site correctly.
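
To make the requirement concrete (follow every link, but keep header/footer text out of the index), here is a minimal sketch of the idea outside Verity. It is not a vspider or style-file feature: the `<!-- noindex -->` / `<!-- /noindex -->` marker comments and the names `PageSplitter` and `split_page` are hypothetical, and it would only apply if the pages were pre-processed and indexed by your own code rather than by vspider directly.

```python
from html.parser import HTMLParser

# Hypothetical marker comments wrapped around the header and footer includes.
# These are an assumption for illustration, not something Verity recognises.
SKIP_START = "noindex"
SKIP_END = "/noindex"


class PageSplitter(HTMLParser):
    """Collect every link on a page, but only the text outside marked regions."""

    def __init__(self):
        super().__init__()
        self.links = []        # hrefs from the whole page, header/footer included
        self.text_parts = []   # text found outside the marked regions only
        self._skipping = False

    def handle_comment(self, data):
        marker = data.strip().lower()
        if marker == SKIP_START:
            self._skipping = True
        elif marker == SKIP_END:
            self._skipping = False

    def handle_starttag(self, tag, attrs):
        # Links are always recorded so a crawler could still follow them.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Text is kept only when we are outside a marked region.
        if not self._skipping and data.strip():
            self.text_parts.append(data.strip())


def split_page(html):
    """Return (links_to_follow, text_to_index) for one HTML page."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


page = """
<!-- noindex -->
<div id="header"><a href="/products.cfm">Products</a> <a href="/sitemap.cfm">Sitemap</a></div>
<!-- /noindex -->
<p>Widget installation guide.</p>
<!-- noindex -->
<div id="footer"><a href="/contact.cfm">Contact</a></div>
<!-- /noindex -->
"""

links, text = split_page(page)
print(links)
print(text)
```

Here the menu and footer links all end up in `links`, while `text` contains only the page body, so a keyword that appears only in the header would no longer match every page.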