How do *you* track and document routine maintenance?


What software or system do you folks on Server Fault use to remind you to do routine maintenance? How do you keep checklists and logs of the various items you are supposed to check? Do you have an internal process document? Do you have cron email you a weekly reminder to check the system logs?

Also, do you work on a team to do system maintenance? If so, how do you coordinate who does which maintenance tasks?

If you use a bug/issue tracking system to manage tasks, do you have a cron job create the recurring ones?

Best Answer

I'm currently using Request Tracker (http://www.bestpractical.com/rt). All maintenance events get an associated ticket in the "systems" queue. Notes on problems encountered, who did what work and when, and any necessary approvals are all recorded in the ticket.

At the moment our recurring tasks (quarterly patching, etc.) are created manually, but they could be automated easily enough with a cron job that sends email to RT.
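A minimal sketch of the cron job + email approach. It assumes the RT queue accepts new tickets by email (RT is commonly set up to receive mail through `rt-mailgate`), that the queue address is the hypothetical `systems@rt.example.com`, and that an SMTP server listens on `localhost`; the task list is illustrative only. Without `--send` the script just prints the subjects it would send.

```python
#!/usr/bin/env python3
"""Open recurring maintenance tickets by emailing an RT queue.

Hypothetical crontab entry (06:00 on the 1st of Jan/Apr/Jul/Oct):
    0 6 1 1,4,7,10 * /usr/local/bin/recurring_tickets.py --send
"""
import datetime
import smtplib
import sys
from email.message import EmailMessage
from typing import List

# Hypothetical addresses: replace with your RT queue and sender.
RT_QUEUE_ADDRESS = "systems@rt.example.com"
SENDER = "cron@example.com"

# Illustrative recurring tasks, not taken from a real checklist.
QUARTERLY_TASKS = [
    "Apply OS security patches",
    "Review backup restore test results",
]


def quarter_label(day: datetime.date) -> str:
    """Return a label such as '2024-Q2' for the given date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def build_ticket_email(task: str, day: datetime.date) -> EmailMessage:
    """Build one email; the subject becomes the new ticket's subject."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = RT_QUEUE_ADDRESS
    msg["Subject"] = f"[{quarter_label(day)}] {task}"
    msg.set_content(
        f"Recurring maintenance task: {task}\n"
        "Record notes, who did the work, and approvals in this ticket.\n"
    )
    return msg


def main(argv: List[str]) -> None:
    today = datetime.date.today()
    messages = [build_ticket_email(task, today) for task in QUARTERLY_TASKS]
    if "--send" not in argv:
        # Dry run: show what would be sent without touching SMTP.
        for msg in messages:
            print(msg["Subject"])
        return
    with smtplib.SMTP("localhost") as smtp:
        for msg in messages:
            smtp.send_message(msg)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Sending one email per task keeps each task in its own ticket, so notes and approvals stay attached to the work they describe.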

Coordinating who does what is relatively easy for us because there are only two people in our admin group. As we scale up, the plan is to create a master ticket for each maintenance event and use child tickets, assigned to the responsible people, to delegate the work.
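The master/child plan might be modelled roughly as below. This is a hypothetical in-memory sketch, not RT's API (in RT itself the relationship would presumably be held by its parent/child ticket links); the `Ticket` class, the `delegate()` function, and the host and admin names are made up for illustration, and round-robin assignment is just one possible policy.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import List, Optional


@dataclass
class Ticket:
    """Hypothetical in-memory stand-in for a tracker ticket."""

    subject: str
    owner: Optional[str] = None
    status: str = "new"
    notes: List[str] = field(default_factory=list)
    children: List["Ticket"] = field(default_factory=list)

    def resolved(self) -> bool:
        """A master ticket is resolved only once every child is resolved."""
        if self.children:
            return all(child.resolved() for child in self.children)
        return self.status == "resolved"


def delegate(event: str, tasks: List[str], admins: List[str]) -> Ticket:
    """Open a master ticket with one child per task, assigned round-robin."""
    if not admins:
        raise ValueError("need at least one admin to delegate to")
    master = Ticket(subject=event)
    for task, admin in zip(tasks, cycle(admins)):
        master.children.append(Ticket(subject=task, owner=admin))
    return master


if __name__ == "__main__":
    event = delegate(
        "2024-Q3 patching",
        ["Patch web01", "Patch db01", "Patch mail01"],
        ["alice", "bob"],
    )
    for child in event.children:
        print(f"{child.subject}: {child.owner}")
```

Running it prints `Patch web01: alice`, `Patch db01: bob`, and `Patch mail01: alice`. Treating the master ticket as resolved only when every child is resolved gives one place to check whether a maintenance event is finished.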


Daily stuff (log checks, etc.) is another matter: I have all of that farmed out to automated processes:

  • InterMapper keeps an eye on the servers' overall status (SNMP queries looking for high load, low disk space, etc.), functionality of our web interfaces, and sundry other things that could indicate trouble.
  • Syslog-NG collects logs from our hosts & feeds them through a bunch of scripts that check for obvious badness. I cast my eye over the logs occasionally to sanity-check the scripts, but it's not regularly scheduled.
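A log-checking script of the kind described might look like the sketch below. It assumes syslog-ng hands it messages on stdin (syslog-ng's `program()` destination can pipe messages to an external program) or that it is pointed at log files; the patterns are hypothetical examples of "obvious badness", and the `report()` placeholder only prints to stderr rather than raising a real alert.

```python
#!/usr/bin/env python3
"""Flag obviously bad syslog lines (a sketch; patterns are illustrative).

Usage: logcheck.py FILE...   ('-' reads from stdin, e.g. when syslog-ng
feeds messages to this script through a program() destination)
"""
import re
import sys
from typing import Iterable, Iterator, List

# Hypothetical "obvious badness" patterns; tune these for your own hosts.
BAD_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"Out of memory"),
    re.compile(r"segfault"),
    re.compile(r"authentication failure", re.IGNORECASE),
]


def flag_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that matches at least one bad pattern."""
    for line in lines:
        if any(pattern.search(line) for pattern in BAD_PATTERNS):
            yield line.rstrip("\n")


def report(hits: Iterable[str]) -> None:
    """Placeholder alert: a real setup might email or open a ticket instead."""
    for hit in hits:
        print(f"ALERT: {hit}", file=sys.stderr, flush=True)


def main(argv: List[str]) -> None:
    if not argv:
        # No input given: show usage instead of blocking on stdin.
        print(__doc__, file=sys.stderr)
        return
    for path in argv:
        if path == "-":
            report(flag_lines(sys.stdin))
        else:
            with open(path, encoding="utf-8", errors="replace") as fh:
                report(flag_lines(fh))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the patterns in one list makes the occasional manual sanity check easier: when a look through the raw logs turns up something the scripts missed, it becomes one more pattern.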