Windows – How does one learn to read the Windows Server event viewer and tell which events are normal and which are signs of potential problems?

eventviewer windows windows-event-log

I have been managing Windows Server 2003 machines at work, but I am a software developer. (Please don't say 'hire a sysadmin'; the point of this question is my own learning.)

How do server admins learn what to look for in event viewer? Sometimes there are strange entries that I don't understand, and many entries show up so consistently that I simply ignore them because they are always there.

Is there some resource somewhere that can train me on what is normal behavior for a Windows Server event viewer log and what things may spell disaster?

Or maybe there is some third-party tool that will decipher them and make recommendations? I would prefer the learning route, though.

Best Answer

The event logs are a clearing-house for any messages or errors raised by the OS, its components, and any software installed on the system. So we can't fully cover all of their potential contents: there is practically no limit to what they could contain, and each entry requires individual treatment.

One way to analyze event logs is:

  1. Filter out the informational alerts so that you're just seeing warnings and errors.
  2. Research each one in turn and attempt to resolve each one as you go. Google is a perfectly legitimate way of accomplishing this. If you're able to resolve an error so that it doesn't recur, great: case closed for that one. On to the next.
  3. If you can't resolve an error, try to determine whether it's benign or a genuine problem. If it's a genuine problem, escalate it. If it isn't, add it to your 'known error' records (or mental 'ignore this' pool) and move on to the next error.
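The three-step triage above can be sketched in Python. This is a minimal illustration, not a real tool: it assumes the log entries have already been exported from event viewer and parsed into dicts with hypothetical keys `type`, `source` and `event_id`, and the `KNOWN_BENIGN` set is a hypothetical stand-in for your 'known error' records. All source names and IDs below are placeholders.

```python
from collections import Counter

# Hypothetical 'known error' records: (source, event_id) pairs you have
# researched and judged benign. Placeholder values, not recommendations.
KNOWN_BENIGN = {
    ("ExampleSource", 1000),
}


def triage(entries, known_benign=KNOWN_BENIGN):
    """Return {(source, event_id): count} for warnings/errors still to research.

    Assumes each entry is a dict with "type", "source" and "event_id" keys.
    """
    # Step 1: filter out informational (and audit) entries.
    problems = [e for e in entries if e["type"] in ("Warning", "Error")]
    # Group repeats so each distinct event is researched only once.
    counts = Counter((e["source"], e["event_id"]) for e in problems)
    # Step 3: skip anything already recorded as a known, benign error.
    return {key: n for key, n in counts.items() if key not in known_benign}


# Usage with made-up sample data.
sample_log = [
    {"type": "Information", "source": "SourceA", "event_id": 1},
    {"type": "Error", "source": "SourceB", "event_id": 11},
    {"type": "Error", "source": "SourceB", "event_id": 11},
    {"type": "Warning", "source": "ExampleSource", "event_id": 1000},
]
# Most frequent first, so the noisiest problems get researched first.
for (source, event_id), n in sorted(triage(sample_log).items(), key=lambda kv: -kv[1]):
    print(f"{source} {event_id}: {n} occurrence(s)")  # → SourceB 11: 2 occurrence(s)
```

Grouping by (source, event ID) reflects the fact that the same event tends to repeat many times, so researching one distinct event usually covers all of its occurrences.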

That's about all there is to it. Security Event Log auditing is a bit different, but the Application and System logs can usually be covered pretty well with the approach above.

You can set up monitoring/alerting packages to watch event logs and alert you. There are two typical approaches to this:

  1. Configure the tool to watch for specific entries and alert on them
  2. Configure the tool to ignore known benign entries and alert on everything else
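The difference between the two approaches can be sketched as two filters. This is a hypothetical illustration of the logic only, not the configuration format of any real monitoring package; it assumes entries are dicts with `type`, `source` and `event_id` keys, and `WATCH_LIST` / `IGNORE_LIST` contain placeholder values.

```python
# Hypothetical rule sets; the (source, event_id) pairs are placeholders.
WATCH_LIST = {("ExampleSource", 2000)}   # approach 1: alert only on these
IGNORE_LIST = {("ExampleSource", 1000)}  # approach 2: alert on everything else


def alerts_watch(entries, watch=WATCH_LIST):
    """Approach 1: alert only on entries that match a specific rule."""
    return [e for e in entries if (e["source"], e["event_id"]) in watch]


def alerts_ignore(entries, ignore=IGNORE_LIST):
    """Approach 2: alert on every warning/error not known to be benign."""
    return [
        e for e in entries
        if e["type"] in ("Warning", "Error")
        and (e["source"], e["event_id"]) not in ignore
    ]


# Usage with made-up sample data.
sample = [
    {"type": "Error", "source": "ExampleSource", "event_id": 2000},
    {"type": "Warning", "source": "ExampleSource", "event_id": 1000},
    {"type": "Error", "source": "OtherSource", "event_id": 42},
]
print(len(alerts_watch(sample)), len(alerts_ignore(sample)))  # → 1 2
```

In this sketch the watch-list filter stays quiet but silently misses the unanticipated `OtherSource` error, while the ignore-list filter catches it at the cost of more alerts until the ignore list is tuned.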

Each approach has its strengths. One key thing to remember, though, is that a monitoring tool is only as useful as it's configured to be, and there's no 'magic bullet' that'll give you a good blend of 'quiet enough' and 'guaranteed to alert you every time there's a genuine problem'. Unfortunately, that requires continuous balancing.
