I created a document management section for our department's site (CF9). I'm asking for help with a new part of the search. I'll list all of the parts I need help with, so that even if you can only contribute to one, it will get me that much closer to what I need. I already have a Title-only search working, and I added a second search that covers both Title and Keywords. I am using cfindex: I created a new collection for the Keywords and search the two collections together. Our staff will mostly search only Titles but wanted the option to search all content.
The next collection will be for the content of the documents. The site will contain .doc and .docx files. All files are kept in the same folder. There is an MS SQL database table that indicates whether each file is archived or active. The same table lists the document number and its extension.
So, what I want to do is query the database to see what files I want the index to scan. I need to dump the collection each night just after midnight and re-do the index. There will be documents added and archived on a regular basis and I don't want archived documents remaining in the collection.
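For the nightly rebuild, one option is a ColdFusion scheduled task that requests a reindex page shortly after midnight. This is only a sketch: the task name, the URL, and the reindexDocs.cfm page are hypothetical placeholders, and the same task could be created in the ColdFusion Administrator instead.

```cfml
<!--- Sketch only: the task name and URL below are hypothetical placeholders. --->
<!--- Creates (or updates) a task that requests the reindex page daily at 12:05 AM. --->
<cfschedule
    action="update"
    task="ReindexDocContent"
    operation="HTTPRequest"
    url="http://intranet.example.com/admin/reindexDocs.cfm"
    startDate="#dateFormat(now(), 'mm/dd/yyyy')#"
    startTime="12:05 AM"
    interval="daily"
    requestTimeOut="600">
```

The page this task calls would hold the actual purge and reindex logic, so it only needs to run once to register the schedule.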
Almost every document will contain common words such as "the" and "and", so if a user searches for "the correct temperature" I don't want to retrieve every document. How can I exclude common words?
If you build a Solr collection, I believe common words are already excluded from searches by default.
I deleted the Verity collections and created Solr collections instead. By the way, how do you pronounce that, Solar?
I still need help indexing the documents.
As I mentioned, there is a single folder of .doc or .docx files, but I only want to index the active ones. To determine the active ones I will do a simple db query first.
Is this possible? If not, I will need to redo the way files are archived and move them to a different folder.
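This looks possible without moving any files, because cfindex accepts a query attribute. With type="file", the key attribute then names a query column that holds the full path to each file. The sketch below assumes a datasource called docsDSN, a table called Documents with DocNumber, Extension, Title, and Status columns, an extension stored without the leading dot, and a folder at D:\docs\. All of those names are made up for illustration, so substitute your own.

```cfml
<!--- Sketch only: datasource, table, column names, and folder path are assumptions. --->
<cfset docFolder = "D:\docs\">

<!--- 1. Ask the database which documents are currently active. --->
<cfquery name="activeDocs" datasource="docsDSN">
    SELECT DocNumber, Extension, Title
    FROM Documents
    WHERE Status = 'Active'
</cfquery>

<!--- 2. Build the full file path for each row (assumes Extension has no leading dot). --->
<cfset filePaths = arrayNew(1)>
<cfloop query="activeDocs">
    <cfset arrayAppend(filePaths, docFolder & activeDocs.DocNumber & "." & activeDocs.Extension)>
</cfloop>
<cfset queryAddColumn(activeDocs, "FilePath", "VarChar", filePaths)>

<!--- 3. Refresh clears the collection and then indexes only the files in the query, --->
<!---    so documents archived since the last run drop out of the index. --->
<cfindex
    action="refresh"
    collection="docContent"
    type="file"
    query="activeDocs"
    key="FilePath"
    title="Title">
```

Because action="refresh" empties the collection before repopulating it, a separate purge step should not be needed. If you would rather keep the two steps explicit, a cfindex with action="purge" followed by one with action="update" should give the same result.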