Verity indexing question

Oct 31, 2007

I'm building an e-mail archiving system for a company, and I'm thinking that Verity would be a better way of searching the archive than what I have now.

Currently, I store the header info (to, from, subject, full headers, date, message ID) in one table in the database, and the message body (the content) in a second table. When users search the archive, they can enter any of the header fields (to, from, date range, subject contains, etc.) and, optionally, some keywords to search the message content for. The content table is full-text indexed, and the database is MySQL 5. My one concern is that the content tables keep growing: some companies receive 2,500 messages per day (multi-part content is not stored in the table).

Part of the application also stores the emails themselves as .eml files in a yyyy/mm/dd file hierarchy. I'm thinking I could use Verity to index all these .eml files, use Verity searching for the keywords, and then drop those content tables.

The problem I'm having is that there are so many .eml files that JRun crashes if I try to index them all.

My question is: if I write something that indexes them in chunks of 1,000 files, does the Verity included with CF7 just add the new files to the index, or does it re-index everything, leaving me with a crashed JRun? I've done some experiments, and it seems that when I try to "add" files to an index that already exists, it re-indexes everything.
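
To make the chunking idea concrete, here is a minimal sketch of just the batching step. It is written in Python rather than CFML purely for illustration, and it says nothing about how Verity itself behaves. The names `iter_eml_files`, `batches`, `index_in_chunks`, and the `index_batch` callback are all hypothetical; `index_batch` stands in for whatever call actually adds one batch of files to the collection.

```python
import os
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_eml_files(root: str) -> Iterator[str]:
    """Yield every .eml path under root in a stable, sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit the yyyy/mm/dd folders in order
        for name in sorted(filenames):
            if name.lower().endswith(".eml"):
                yield os.path.join(dirpath, name)


def batches(paths: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Group paths into lists of at most `size` items."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_chunks(
    root: str,
    index_batch: Callable[[List[str]], None],
    size: int = 1000,
) -> int:
    """Hand the archive to index_batch one chunk at a time.

    index_batch is a placeholder (an assumption, not a real Verity API):
    it should add the given files to the existing index without
    rebuilding it. Returns the number of files handed over.
    """
    total = 0
    for chunk in batches(iter_eml_files(root), size):
        index_batch(chunk)
        total += len(chunk)
    return total
```

Because the file list is produced lazily, only one chunk of paths is held in memory at a time. Whether each batch is truly added incrementally still depends on the indexing call behind `index_batch`, which is exactly what I'm asking about.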

Thanks,
Steve
TOPICS
Advanced techniques

Guest
Dec 05, 2007

I may have a viable alternative for you: conceptual/semantic search against email archives. Feel free to try the following demo:
http://web.mytata.net:8000/TextSearch/semantic%20search.cfm
