1 Reply Latest reply on Dec 5, 2007 5:58 PM by

    Verity indexing question

      I'm building an e-mail archiving system for a company, and I'm thinking that Verity would be a better way of searching the archive than what I have now.

      Currently, I store the header info (to, from, subject, full headers, date, message ID) in one table in the database. I also have a table for the message body, i.e. the content. When a user searches the archive, they can enter any of the header-table fields (to, from, date range, subject contains, etc.) and then, optionally, some keywords to search the message content for. The content table is full-text indexed and the database is MySQL 5. My one concern is that the content tables are growing: some companies are getting 2,500 messages per day (multi-part content is not stored in the table).

      Part of the application also stores the emails themselves as .eml files in a yyyy/mm/dd file hierarchy. I'm thinking I could use Verity to index all these .eml files, use Verity searching for the keywords, and then drop those content tables.

      The problem I'm having is that there are so many .eml files that JRun crashes if I try to index them all at once.

      My question: if I write something that indexes them in chunks of 1,000 files, does the Verity included with CF7 just add the new files to the index, or does it re-index everything, leaving me with a crashed JRun again? I've done some experiments, and it seems that when I try to "add" files to an index that already exists, it re-indexes everything.
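
      To make the chunking idea concrete, here is a minimal sketch of just the batching step, written in Python so it can be run on its own. It is not ColdFusion code and does not call Verity; the function names `iter_eml_files` and `batches` are made up for illustration, and the only assumption about the archive is the yyyy/mm/dd layout of .eml files described above.

```python
import os
from typing import Iterator, List


def iter_eml_files(root: str) -> Iterator[str]:
    """Yield the path of every .eml file under root, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit yyyy/mm/dd folders in order.
        dirnames.sort()
        for name in sorted(filenames):
            if name.lower().endswith(".eml"):
                yield os.path.join(dirpath, name)


def batches(paths: Iterator[str], size: int = 1000) -> Iterator[List[str]]:
    """Group paths into lists of at most `size` items."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch
```

      Each batch would then be handed to whatever indexing call is in use. Whether that call only adds the new files or rebuilds the whole collection is exactly the open question above, so the sketch deliberately stops short of it.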