I'm building an e-mail archiving system for a company, and
I'm thinking that Verity would be a better way of searching the
archive than what I have now.
Currently, I store the header info (to, from, subject, full
headers, date, message ID) in one table in the database. I also have
a table for the message body, i.e. the content. When a user searches
for messages in the archive, they can enter any of the header-table
fields (to, from, date range, subject contains, etc.), and then
optionally some keywords to search the message content for. The
content table is full-text indexed and the database is MySQL 5. My one
concern is that the content tables are growing: some companies are
getting 2,500 messages per day (multi-part content is not stored in
the table).
Part of the application also stores the emails themselves as
.eml files in a yyyy/mm/dd file hierarchy. I'm thinking I could use
Verity to index all these .eml files, use Verity searching
for the keywords, and then drop those content tables.
The problem I'm having is that there are so many .eml files that
JRun crashes if I try to index them all.
My question is: if I write something that indexes them in
chunks of 1,000 files, does the Verity included with CF7 just add
the new files to the index, or does it re-index everything, leaving me
with a crashed JRun again? I've done some experiments, and it seems to me
that when I try to "add" files to an index that already exists, it
re-indexes everything.
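To make the chunking idea concrete, here is a rough sketch of the batching I have in mind. It is written in Python rather than CFML purely to show the logic; the function names (iter_eml_files, chunked) and the size parameter are made up for illustration, and it says nothing about how Verity itself behaves. Each batch would be handed to whatever indexing call ends up being used.

```python
import os
from typing import Iterable, Iterator, List


def iter_eml_files(root: str) -> Iterator[str]:
    """Yield every .eml path under root, walking folders in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit the yyyy/mm/dd folders in date order
        for name in sorted(filenames):
            if name.lower().endswith(".eml"):
                yield os.path.join(dirpath, name)


def chunked(paths: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Group paths into lists of at most `size` items."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch
```

Because both functions are generators, the full file list is never held in memory at once; only one batch of paths exists at a time.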
Thanks,
Steve