IIS – Merging Large IIS Logs

Tags: iis, log-files, logparser, webalizer

I am planning to use Webalizer to analyze and graph our IIS logs, but because we have a server farm, Webalizer requires me to make sure all of the log entries are in chronological order (or else it will start skipping results).

Our logs are stored gzipped, so I started by unzipping everything into separate files, and then used LogParser 2.2 to merge those files. My LogParser command was:

LogParser.exe -i:iisw3c "select * into combinedLogFile.log from *.log order by date, time" -o:w3c 

I probably don't need *, but I do need most of the fields because Webalizer will use them. This works perfectly fine on some of my logs; however, one of our server farm clusters generates a LOT of logs: we have 14 servers, and each server's logs are (at least) 2.5 GB per day (each day is a separate log file). When I try to merge these logs, LogParser just crashes with a meaningless generic error.
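
A trimmed query would look something like this (the exact field list still needs to be checked against what Webalizer actually reads; the names below are the standard LogParser IISW3C field names):

LogParser.exe -i:iisw3c "select date, time, c-ip, cs-username, cs-method, cs-uri-stem, cs-uri-query, sc-status, sc-bytes, cs(User-Agent), cs(Referer) into combinedLogFile.log from *.log order by date, time" -o:w3c

Every column dropped from the SELECT should be one less value LogParser has to buffer per row while sorting.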

I assumed it was a memory issue, so I tried a number of ways to minimize memory usage.

I am using PowerShell to call LogParser, so I started by trying to pipe the input in using standard PowerShell piping. That caused an OutOfMemoryException in PowerShell itself (instead of in LogParser), and sooner than any approach that just read the files directly.
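
The attempt looked something like this (simplified; note that reading FROM STDIN depends on the input format supporting it, which is worth verifying with LogParser.exe -h -i:iisw3c):

Get-Content .\*.log | .\LogParser.exe "select * into combinedLogFile.log from STDIN order by date, time" -i:iisw3c -o:w3c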

What I finally ended up with was multiple named pipes fed from a batch-file call to "cat", piping that directly into LogParser… and I was back where I started, with the same failure I had when I worked from the pre-unzipped files.

We have other scripts that process these same log files, and none of them have issues (although their output is generally smaller than this one's will be).

So I just want to know if you have any ideas about a better way to merge all of these files, or a LogParser script that will work, since the one I came up with isn't sufficient.

P.S. I know I could probably write a merging program in .NET, since all of the individual logs are already sorted and I wouldn't need to read more than a few rows at a time, but I am trying to avoid having to do that if possible.

Best Answer

Given that you are running into issues trying to sort the data for a single day, I'd look to one of two strategies.

  1. Find a better sort. See if you can get the Windows sort tool to work for you. The logs are laid out with date and time first, in an ASCII-text-sort-friendly format, for a reason. A plain sort uses a lot less memory and doesn't have to parse lines in order to sort them. My bet is this works for you (see the first sketch after this list).

  2. Write an interleaving merge that opens all 14 files and repeatedly pulls the earliest line from the head of each, working its way through the 14 files simultaneously. I shudder to think of it, but it wouldn't need more than a small read buffer (on the order of 64 KB) per file (see the second sketch below).
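
For strategy 1, a minimal sketch using the built-in Windows sort.exe, assuming the unzipped logs are first concatenated into one file (paths are placeholders; /T points the temporary spill files at a disk with enough free space):

copy /b *.log combined-unsorted.log
sort combined-unsorted.log /T D:\sorttemp /O combined-sorted.log

Because every data line starts with a fixed-width "yyyy-mm-dd hh:mm:ss" prefix, a plain line sort is chronological. One caveat: the W3C # directive lines will sort to the top of the output (# sorts before digits), so they would need stripping or rewriting if the downstream tool cares where they sit.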
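For strategy 2, a minimal PowerShell sketch of the interleaving merge, assuming every input file is already internally sorted and every data line starts with the date and time (paths are placeholders; this drops the # directives, so write one header block to the output first if Webalizer needs it):

$files   = Get-ChildItem 'C:\logs\*.log'
$readers = @($files | ForEach-Object { New-Object System.IO.StreamReader $_.FullName })
$out     = New-Object System.IO.StreamWriter 'C:\logs\combined.log'

# Prime one current line per file, skipping the W3C '#' directive lines.
$current = @($readers | ForEach-Object {
    $line = $_.ReadLine()
    while ($line -ne $null -and $line.StartsWith('#')) { $line = $_.ReadLine() }
    $line
})

while ($true) {
    # Pick the file whose current line sorts lowest; the fixed-width
    # "date time" prefix makes an ordinal string compare chronological.
    $minIdx = -1
    for ($i = 0; $i -lt $current.Count; $i++) {
        if ($current[$i] -ne $null -and
            ($minIdx -eq -1 -or [string]::CompareOrdinal($current[$i], $current[$minIdx]) -lt 0)) {
            $minIdx = $i
        }
    }
    if ($minIdx -eq -1) { break }    # all files exhausted

    $out.WriteLine($current[$minIdx])
    # Advance that reader, again skipping any embedded directive lines.
    do { $current[$minIdx] = $readers[$minIdx].ReadLine() }
    while ($current[$minIdx] -ne $null -and $current[$minIdx].StartsWith('#'))
}

$readers | ForEach-Object { $_.Close() }
$out.Close()

Memory use stays at roughly one buffered line per open file, regardless of total log size.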

Old answer:

Divide and conquer. Write one script that reads the logs and writes each line into a new file by date, with a known filename that has the date in it (weblog-20110101.log). Run a sort on each per-day file that sorts by time. Concatenate the files you need together. A sketch of the bucketing step follows.
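
A minimal PowerShell sketch of that bucketing step, assuming every data line starts with a yyyy-mm-dd date (paths are placeholders); each per-day file can then be sorted with the sort tool above and concatenated:

$writers = @{}
Get-ChildItem 'C:\logs\*.log' | ForEach-Object {
    $reader = New-Object System.IO.StreamReader $_.FullName
    while (($line = $reader.ReadLine()) -ne $null) {
        if ($line.StartsWith('#')) { continue }            # skip W3C directives
        $day = $line.Substring(0, 10) -replace '-', ''     # "2011-01-01" -> "20110101"
        if (-not $writers.ContainsKey($day)) {
            $writers[$day] = New-Object System.IO.StreamWriter "C:\logs\weblog-$day.log"
        }
        $writers[$day].WriteLine($line)
    }
    $reader.Close()
}
$writers.Values | ForEach-Object { $_.Close() }

Keeping one open StreamWriter per day avoids reopening the output file for every line, which matters at these volumes.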