The cpu_aidle graph is mysteriously flat at roughly 12 percent on all machines reporting on a friend's cluster. Given how spiky every other CPU-related metric is, this seems unusual. Can anyone shed light on what that number means?
What does cpu_aidle mean in Ganglia reports?
ganglia
Related Solutions
My server with the web frontend and gmetad has a gmetad.conf with one gridname, plus one data_source entry for each cluster. Each data_source draws from one or more systems in the cluster:
gridname "The Grid"
data_source "Infrastructure" ihost1 ihost2 ...
data_source "Compute Nodes" chost1 chost2 ...
data_source "Workstations" work1 work2 ...
Each host registers itself into a particular cluster, and onto a cluster-specific multicast address, in its gmond.conf:
cluster {
  name = "Infrastructure"
}
udp_send_channel {
  mcast_join = 239.2.11.72
  port = 8649
}
udp_recv_channel {
  mcast_join = 239.2.11.72
  port = 8649
  bind = 239.2.11.72
}
tcp_accept_channel {
  port = 8649
}
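To check which hosts have actually registered into which cluster, one option is to read the XML that gmond serves on its tcp_accept_channel (port 8649 in the config above). The following is only a rough sketch: the function names are made up for illustration, and it assumes the dump uses CLUSTER and HOST elements with NAME attributes, as gmond's XML output normally does.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read everything gmond writes to a TCP client until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each cluster name to the sorted list of host names reported in it."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for cluster in root.iter("CLUSTER"):
        names = [h.get("NAME") for h in cluster.iter("HOST")]
        result[cluster.get("NAME")] = sorted(names)
    return result
```

For example, hosts_by_cluster(fetch_gmond_xml("ihost1")) should list the hosts that ihost1 has heard from on its multicast group, which makes it easy to spot a host that joined the wrong cluster.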
I ran into this problem with Ganglia installed on Ubuntu. According to the documentation, it sounds like gmond lost metadata and doesn't know what to do with the metric data. Since you're setting up Ganglia in unicast mode, you need to instruct gmond to send metadata periodically by changing send_metadata_interval to a non-zero value:
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}
Give it a try!
Read more:
http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_release_notes
3.1 collectors will request a gmond to resend its metric description information if needed and if using multicast. If you are using unicast there is no way to do that yet, so if you restart your collector it will be left with partial or no data from the cluster that is being collected through it until all gmond in that cluster are restarted. To work around this problem when using unicast, set send_metadata_interval to a reasonable value so that all gmond resend their metadata periodically to the collector in case it gets lost.
http://sourceforge.net/apps/trac/ganglia/wiki/FAQ
In recent versions of gmond (3.1.x), a new global variable called send_metadata_interval was added in gmond.conf, with a default setting of 0. The purpose was to reduce network traffic. In 3.1, metric data is sent separately from metadata (e.g. metadata contains the detailed description, grouping, and other possible settings). A value of zero means that gmond will send metadata when it starts, and at no other time (which is consistent with older versions of ganglia). If you plan on using unicast mode, please set send_metadata_interval to something other than 0. 30-60 seconds has been found to work reliably in most cases. Setting this variable to a non-zero value will make the gmond processes periodically announce their metrics and the graphs will reappear on the host-view page.
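The advice above can be turned into a quick sanity check on a gmond.conf. This is a rough sketch, not a real parser for the config format: the function names are hypothetical, it assumes a udp_send_channel with a host entry means unicast, and it treats a missing send_metadata_interval as the default of 0 described above.

```python
import re


def strip_comments(text):
    """Drop /* ... */ and # comments (a heuristic, not a full config parser)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.sub(r"#.*", "", text)


def metadata_interval(conf_text):
    """Return send_metadata_interval in seconds, or 0 (the default) if unset."""
    match = re.search(r"^\s*send_metadata_interval\s*=\s*(\d+)",
                      strip_comments(conf_text), flags=re.M)
    return int(match.group(1)) if match else 0


def uses_unicast(conf_text):
    """Assumption: a udp_send_channel block containing 'host = ...' is unicast."""
    body = strip_comments(conf_text)
    for block in re.findall(r"udp_send_channel\s*\{(.*?)\}", body, flags=re.S):
        if re.search(r"^\s*host\s*=", block, flags=re.M):
            return True
    return False


def needs_metadata_interval(conf_text):
    """True when the config is unicast but metadata is only sent at startup."""
    return uses_unicast(conf_text) and metadata_interval(conf_text) == 0
```

Passing the contents of a gmond.conf to needs_metadata_interval flags the case the FAQ warns about: a unicast sender that never re-announces its metadata.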
Best Answer
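According to the Ganglia README, aidle is "Percent of time since boot idle CPU", and it says this is only available for Linux. Because it is an average over the whole time since boot, it moves very slowly, which would explain why the graph looks flat next to the other, spiky CPU metrics.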
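To see roughly where such a number comes from, the same figure can be approximated by hand from the cumulative counters on the first line of /proc/stat on Linux. This is a sketch under assumptions, not Ganglia's actual code: the function name is made up, and the choice of which counters go into the total (user through steal) is a guess at what a "since boot" percentage would include.

```python
def aidle_percent(stat_line):
    """Percent of CPU time since boot spent idle, from the aggregate 'cpu' line.

    The line looks like: cpu user nice system idle iowait irq softirq steal ...
    with every counter cumulative since boot.
    """
    fields = stat_line.split()
    if len(fields) < 5 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Assumption: total = user..steal; guest counters are skipped because the
    # kernel already includes them in user and nice.
    ticks = [int(value) for value in fields[1:9]]
    idle = ticks[3]
    return 100.0 * idle / sum(ticks)
```

On a Linux host, calling aidle_percent(open("/proc/stat").readline()) gives a value that barely changes from minute to minute once the machine has been up for a while. A steady reading near 12 percent would then suggest the machines have been busy for most of their uptime.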