Right now I am using these values:
# y = c * p / 100
# y: nagios value
# c: number of cores
# p: wanted load percent
# 4 cores
# time       5 minutes   10 minutes   15 minutes
# warning:   90%         70%          50%
# critical:  100%        80%          60%
command[check_load]=/usr/local/nagios/libexec/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4
But these values were picked almost at random.
Does anyone have some tested values?
Best Answer
Linux load is actually simple. Each of the load average numbers is the sum of the average loads of all the cores, i.e.
load avg = avg load of core 1 + avg load of core 2 + ... + avg load of core n
where
0 < avg load < infinity
So if the load is 1 on a 4-core server, it means, for example, that each core is 25% used or that one core is under 100% load. A load of 4 means all 4 cores are under 100% load. A load greater than 4 means the server needs more cores.
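As a rough illustration of the rule of thumb above, here is a minimal Python sketch that divides a load average by the core count and labels the result. The function names (`per_core_load`, `describe`, `current_status`) are made up for this example and are not part of Nagios or check_load.

```python
import os


def per_core_load(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def describe(load_avg: float, cores: int) -> str:
    """Label a load average using the rule of thumb from the answer."""
    ratio = per_core_load(load_avg, cores)
    if ratio < 1:
        return "spare capacity"
    if ratio == 1:
        return "fully loaded"
    return "needs more cores"


def current_status() -> str:
    """Describe the 1-minute load of the local machine (Unix only)."""
    one_min, _five_min, _fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return describe(one_min, cores)
```

For the 4-core server in the question, `per_core_load(1.0, 4)` gives 0.25, i.e. the machine is on average 25% loaded.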
check_load now has the -r (--percpu) option, which divides the load averages by the number of CPUs. This means that when it is used, you can think of your server as having just one core and write the percent fractions directly, without thinking about the number of cores. With -r, the warning and critical values fall in the interval 0 <= load avg <= 1, i.e. you don't have to modify your warning and critical values from server to server.
The OP has 5, 10, 15 minutes for the intervals. That is wrong. It is 1, 5, 15.
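To make the difference concrete, the following Python sketch builds the check_load arguments both ways: scaled by the core count using the question's formula y = c * p / 100, and as plain fractions for use with -r. The helper name `check_load_args` is hypothetical, and the percentages are simply the ones from the question, not tested recommendations.

```python
def check_load_args(warn_pct, crit_pct, cores=None):
    """Build check_load arguments from percentages for the 1, 5 and 15 minute averages.

    cores=None  -> per-CPU fractions, to be used together with -r
    cores=N     -> absolute thresholds, y = c * p / 100
    """
    scale = 1 if cores is None else cores

    def fmt(percents):
        # Two decimals is an arbitrary choice for this sketch.
        return ",".join(f"{scale * p / 100:.2f}" for p in percents)

    flag = "-r " if cores is None else ""
    return f"{flag}-w {fmt(warn_pct)} -c {fmt(crit_pct)}"


warn = [90, 70, 50]    # warning percentages from the question
crit = [100, 80, 60]   # critical percentages from the question

print(check_load_args(warn, crit, cores=4))  # thresholds tied to a 4-core server
print(check_load_args(warn, crit))           # same thresholds on any server, with -r
```

The first call reproduces the values from the question's command line; the second yields thresholds that can be copied unchanged between servers with different core counts.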