
Parsing & Analyzing Log Files

New Here, Oct 02, 2009


Afternoon,

Once again thanks ahead of time for looking over my post.

Here is what I am working on this time: an email monitor, which runs via a scheduled task on the hour, every hour. This is in ColdFusion, of course.

What I need to do is grab the sent e-mails. The only record of email status is in a daily log file on the email server. The log file can be anywhere from 20 KB to 120 MB. The format of the log file itself varies a bit depending on which step of the email process is being logged.

The file is saved as sysMMDD.txt, and we have a process running every 20 minutes to check the log file size for the current date. If it's larger than 10 MB, we rename it sysMMDD_1.txt. This is really irrelevant to my question, but I thought I'd provide all the information.

Going back to the actual format of the log, it looks something like this:

HH:MM SS:MS TYPE(HASH) [IP ADDRESS] etc.

TYPE = the type of e-mail or service being called

HASH = a unique hash ID of the e-mail that ties the steps together

etc. = all of the text after [IP ADDRESS]; this has no consistent structure and varies by step.

The monitor needs to grab all the sends in this log file within a given one-hour span. Remember, the log could contain up to a day's worth of data. As it stands right now, I'm able to do a send count by searching for the number of times 'ldeliver' appears in the log.

Does anyone have any suggestions for parsing a log file like this one? I'm worried that the way I'm doing it now, which is a hack, will not suffice and there is probably a better way to do it.

Basically, right now I'm doing a cfloop using index="line" to go through the file. You can imagine how this performs with large log files, which is why we created the scheduled task above to rename log files. Now, if I start adding time extractions as well, I'm pretty sure this process is going to bust.
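To make that concrete, here's a simplified sketch of what I mean by adding time extractions, assuming CF8's file attribute on cfloop (the path, the hour window, and the time reassembly are made-up illustrations, not our real code):

    <cfset logPath = "c:\imail\spool\sys1002.txt">
    <cfset windowStart = createTime(13, 0, 0)>
    <cfset windowEnd = createTime(14, 0, 0)>
    <cfset sendCount = 0>

    <!--- The file attribute reads one line at a time rather than loading the whole file --->
    <cfloop file="#logPath#" index="line">
        <!--- Only the first chunks are reliable: HH:MM SS:MS TYPE(HASH) [IP ADDRESS] ... --->
        <cfset parts = listToArray(line, " ")>
        <cfif arrayLen(parts) GTE 3 AND findNoCase("ldeliver", parts[3])>
            <!--- Rebuild a time-of-day from chunk 1 (HH:MM) and chunk 2 (SS:MS) --->
            <cfset lineTime = createTime(listGetAt(parts[1], 1, ":"),
                                         listGetAt(parts[1], 2, ":"),
                                         listGetAt(parts[2], 1, ":"))>
            <cfif lineTime GTE windowStart AND lineTime LT windowEnd>
                <cfset sendCount = sendCount + 1>
            </cfif>
        </cfif>
    </cfloop>

    <cfoutput>ldeliver sends in window: #sendCount#</cfoutput>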

I know this post is scattered, but it's just one of those days where everything appears to be happening at once. Does anyone have any other ideas about going about this process? Someone mentioned an ODBC datasource pointed at the text file, but will that work when it's space-delimited and only the first "four" chunks have a reliable format?

Any help appreciated!

TOPICS: Advanced techniques

1 Correct answer
LEGEND, Oct 03, 2009 (full reply below)

LEGEND, Oct 02, 2009


It sounds to me like you have two different requirements for this data being logged: one for the full log, another for the SEND data.  Why don't you just additionally write the latter data to a separate log at the time you're logging it in the first place?
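Something along these lines, assuming you control whatever writes the log (the variable and log names here are illustrative):

    <!--- At the point each log line is written, tee SEND events into their own log --->
    <cfif findNoCase("ldeliver", logLine)>
        <cflog file="sendsOnly" text="#logLine#">
    </cfif>

Then your hourly monitor only has to read the small sendsOnly log.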

--

Adam

New Here, Oct 02, 2009


I don't have control over how the logs are generated. They are produced by e-mail server software known as iMail. If I had control, I'd be logging to a SQL DB...

LEGEND, Oct 03, 2009


Sorry, yeah.  I didn't see you mention that another app generates the log.

Looping over the file line by line does not really add too much of a resource overhead. It does not need to load the whole file into RAM; it only reads each line in turn. I tried looping over a 1GB file on a CF instance with only 512MB of RAM allocated to it, and it churned away quite nicely, processing a few lines per millisecond, and it never broke a sweat. It took about 7 min to process a million rows, and never consumed any more than a marginal amount of memory.

Do you actually know that doing it this way will cause you gyp? It doesn't sound like the sort of process that really needs to be lightning quick: it's a background process, isn't it?

I guess if you were concerned about it, you could pump the file through grep at file-system level first to extract just the lines you want, and process that much smaller file.  The file system should process files of that size pretty quickly and efficiently.
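For example (a sketch only: the grep binary location and the file paths are assumptions, e.g. a Windows port of grep if your server is Windows):

    <cfset filteredPath = getTempDirectory() & "sends_only.txt">

    <!--- Pre-filter at the file-system level: keep only lines mentioning ldeliver --->
    <cfexecute name="c:\bin\grep.exe"
               arguments='-i ldeliver c:\imail\spool\sys1002.txt'
               outputFile="#filteredPath#"
               timeout="120" />

    <!--- Every surviving line is a send; now apply your hour-window check --->
    <cfloop file="#filteredPath#" index="line">
        <!--- extract the timestamp and count, as per your current code --->
    </cfloop>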

I would not bother trying to put this stuff into a DB and then process it: doing that would probably be more work than just looping over the file as you are now.

--

Adam

New Here, Oct 05, 2009

Correct, it is a background process. The whole system in general is causing me gyp... oh well. Thanks for your thoughts, I'll continue on the path I'm on now.
