We use WhatsUp Gold (WUG) to monitor all of our web servers. On our Linux servers (and, to much the same degree, our FreeBSD servers) I'm having a bit of an issue with the memory monitors. WUG grabs the data from the servers over SNMP. The memory counter that the SNMP daemon returns is the combined value (used, cached, buffers). Right now one of my servers looks like this:
[admin@stgwww snmp]$ free -m
             total       used       free     shared    buffers     cached
Mem:          7872       1656       6216          0        143       1107
-/+ buffers/cache:        404       7467
Swap:         4867          0       4867
The value being returned via SNMP to WUG is 1656. From what I understand, cached RAM is essentially free RAM, with the added benefit that it hangs on to the data that previously occupied it in case that data is needed again. So for our purposes, which is knowing how much RAM is actively in use, the value we're getting back is misleading. If we go by what WUG graphs, we're led to believe that more RAM is in use, and less is available, than is really the case.
So what's the best way to go about monitoring this? WUG allows me to write SSH scripts, which can SSH into the server every 5 minutes or so, execute a script and return the value (as long as it's a single numeric value). With this I've written a script that pulls the "404" number from the example above and divides it by the total, giving me a percent-used value that I return to WUG and graph on a chart that scales from 0 to 100. But this seems like way too much of a hack.
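For illustration, the calculation described above could be sketched roughly like this in Python. This is not the actual script, just a minimal sketch under some assumptions: it reads /proc/meminfo (Linux only, so it would not work as-is on the FreeBSD servers) instead of parsing the output of free, and the function names parse_meminfo and percent_used are made up for the example. It treats MemFree + Buffers + Cached as available memory, which mirrors the "-/+ buffers/cache" line.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def percent_used(text):
    """Percent of RAM in use, not counting buffers and page cache as used."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    # Assumption: buffers and cache are reclaimable, so count them as available.
    available = info["MemFree"] + info["Buffers"] + info["Cached"]
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    # Print a single whole number, since the monitor expects one numeric value.
    with open("/proc/meminfo") as f:
        print(round(percent_used(f.read())))
```

With the figures from the free -m output above, this kind of calculation comes out at roughly 5 percent, compared with about 21 percent if the raw "used" value of 1656 is divided by the total.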
Am I better off monitoring the free+buffers+cached value? Is there a better way to do this in WUG? Thoughts?
Best Answer
Go and take a look at linuxatemyram.com. WUG is telling you what Linux reports as used, which includes buffers and cache. What you have decided to monitor (used minus buffers and cache, as a fraction of total) seems reasonable to me, especially for a graph, as it requires no knowledge of the system specifics.