I need to process all log messages from Postfix (/var/log/mail/mail.log) and print a summary/statistics: how many emails were sent/received, and from/to which email addresses.
The situation is complicated by the fact that Postfix writes multi-line log entries (Apache, by contrast, writes single-line entries, which would make the task much easier).
A sample Postfix log might look something like this:
2013-12-03 14:40:45 postfix: 6F1AA10B: client=unknown[64.12.143.81]
2013-12-03 14:40:45 postfix: 6F1AA10B: message-id=<529DDF56.6050403@aol.com>
2013-12-03 14:40:45 postfix: 6F1AA10B: from=<martin.vegter@aol.com>, size=1571, nrcpt=1 (queue active)
2013-12-03 14:40:45 postfix: 6F1AA10B: to=<martin@example.com>, relay=local, delay=0.13, delays=0.13
2013-12-03 14:40:45 postfix: 6F1AA10B: removed
2013-12-03 14:52:07 postfix: 9DD9610B: client=unknown[209.85.219.65]
2013-12-03 14:52:07 postfix: 9DD9610B: message-id=<CANE3EAQUsGwj6ZBAU-awymzsG=76XZnHih@mail.gmail.com>
2013-12-03 14:52:07 postfix: 9DD9610B: from=<martin.vegter@gmail.com>, size=2388, nrcpt=1 (queue active)
2013-12-03 14:52:07 postfix: 9DD9610B: to=<martin@example.com>, orig_to=<martin@example.com>, relay=local
2013-12-03 14:52:07 postfix: 9DD9610B: removed
Every email message processed by Postfix has a unique message ID (6F1AA10B in my example).
What would be the best approach to process the logs in Python? What data structure would you recommend to use for storing the entries?
Best Answer
How you store your items depends on how you are processing them further; in-memory aggregating is very different from storing individual items in rows in a SQL database, for example.
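If in-memory aggregation is all you need, a couple of collections.Counter objects go a long way. This is only a sketch under the assumption that sender and recipient appear in from=<...> and to=<...> fields exactly as in the sample log; the regexes and variable names are mine, not part of Postfix:

```python
import re
from collections import Counter

# Hypothetical aggregation: tally senders and recipients directly from
# the from=<...> / to=<...> fields seen in the sample log.
sent_by = Counter()
received_by = Counter()

log_lines = [
    '2013-12-03 14:40:45 postfix: 6F1AA10B: from=<martin.vegter@aol.com>, size=1571, nrcpt=1 (queue active)',
    '2013-12-03 14:40:45 postfix: 6F1AA10B: to=<martin@example.com>, relay=local, delay=0.13, delays=0.13',
]

for line in log_lines:
    m = re.search(r'\bfrom=<([^>]*)>', line)
    if m:
        sent_by[m.group(1)] += 1
    # \b keeps this from matching inside orig_to=<...>
    m = re.search(r'\bto=<([^>]*)>', line)
    if m:
        received_by[m.group(1)] += 1

print(sent_by.most_common())      # counts per sender
print(received_by.most_common())  # counts per recipient
```

If you later need per-message detail (size, relay, delays) rather than just counts, that is the point where rows in a SQL table start to pay off over counters.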
Parsing could be done by grouping records on a specific element of each line. Presumably the entries for a given message ID can span multiple timestamps, but you can parse each line into a dictionary, then use itertools.groupby() to group the parsed lines. I'll not go into the line parsing itself, but if we assume each line is parsed into a dictionary with a msgid key, you can do:
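Something along these lines; the regex is an assumption based on the sample log format above, and parse_line / group_by_msgid are illustrative names, not a standard API:

```python
import itertools
import re

# Assumed line layout: "<date> <time> postfix: <MSGID>: <rest>"
LINE_RE = re.compile(
    r'^(?P<timestamp>\S+ \S+) postfix: (?P<msgid>[0-9A-F]+): (?P<rest>.*)$'
)

def parse_line(line):
    """Return a dict with timestamp, msgid and rest keys, or None."""
    match = LINE_RE.match(line.rstrip('\n'))
    return match.groupdict() if match else None

def group_by_msgid(lines):
    # Parse each line, drop anything unparseable, then group
    # consecutive records that share a message ID.
    parsed = filter(None, (parse_line(line) for line in lines))
    for msgid, records in itertools.groupby(parsed, key=lambda d: d['msgid']):
        yield msgid, list(records)

sample = [
    '2013-12-03 14:40:45 postfix: 6F1AA10B: client=unknown[64.12.143.81]',
    '2013-12-03 14:40:45 postfix: 6F1AA10B: removed',
    '2013-12-03 14:52:07 postfix: 9DD9610B: client=unknown[209.85.219.65]',
]

for msgid, records in group_by_msgid(sample):
    print(msgid, len(records))
```

One caveat: itertools.groupby() only groups consecutive items, so this works as-is only if each message's lines are contiguous in the log. If messages interleave (as they can on a busy server), collect records into a dict keyed by msgid, or sort by msgid first, instead.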