What does cpu_aidle mean in Ganglia reports?

ganglia

The cpu_aidle graph is mysteriously flat at roughly 12 percent on every machine reporting in a friend's cluster. Given how spiky every other CPU-related graph is, this seems unusual. Can anyone shed light on what that number means?

Best Answer

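According to the Ganglia README, aidle is the "Percent of time since boot idle CPU", and the metric is only available on Linux. Because it is averaged over the machine's entire uptime, it changes very slowly, which is why the graph looks flat while the other CPU metrics are spiky.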
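As a rough illustration of what "percent of time since boot idle" means, the sketch below computes such a figure from the aggregate cpu line of Linux's /proc/stat. It assumes, without reference to Ganglia's source, that cpu_aidle is derived in a similar way. The function name cpu_aidle_percent and the choice to sum the first eight counters are assumptions made for this example.

```python
def cpu_aidle_percent(proc_stat_text):
    """Return idle time as a percentage of all CPU time since boot.

    proc_stat_text is the full text of /proc/stat. This is an illustrative
    approximation, not Ganglia's actual implementation.
    """
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled "cpu"; per-core lines are "cpu0", "cpu1", ...
        if fields and fields[0] == "cpu":
            # Counters: user nice system idle iowait irq softirq steal [guest guest_nice].
            # Guest time is already included in user/nice, so only the first eight are summed.
            counters = [int(x) for x in fields[1:9]]
            total = sum(counters)
            idle = counters[3]
            return 100.0 * idle / total if total else 0.0
    raise ValueError("no aggregate 'cpu' line found")
```

On a Linux host, calling cpu_aidle_percent(open("/proc/stat").read()) twice a few minutes apart should give nearly identical values on a machine with a long uptime, because a few minutes of activity barely shifts counters that have been accumulating since boot.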

Related Solutions

Setting up Ganglia for multiple clusters

On the server that runs the web frontend and gmetad, gmetad.conf has a single gridname plus one data_source entry per cluster. Each data_source polls one or more hosts in that cluster:

gridname "The Grid"
data_source "Infrastructure" ihost1 ihost2 ...
data_source "Compute Nodes" chost1 chost2 ...
data_source "Workstations" work1 work2 ...

Each host registers itself into a particular cluster, and onto a cluster-specific multicast address in its gmond.conf:

cluster {
  name = "Infrastructure"
}
udp_send_channel {
  mcast_join = 239.2.11.72
  port = 8649
}
udp_recv_channel {
  mcast_join = 239.2.11.72
  port = 8649
  bind = 239.2.11.72
}
tcp_accept_channel {
  port = 8649
}

Ganglia – How to Fix Graph Update Issues

I ran into this problem with Ganglia installed on Ubuntu. According to the documentation, it sounds like gmond lost its metadata and doesn't know what to do with the metric data. Since you're setting up Ganglia in unicast mode, you need to tell gmond to resend metadata periodically by setting send_metadata_interval to a non-zero value:

globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs */
}

Give it a try!

http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_release_notes

3.1 collectors will request a gmond to resend its metric description information if needed and if using multicast. If you are using unicast there is no way to do that yet, so if you restart, your collector will be left with partial or no data from the cluster that is being collected through it until all gmond in that cluster are restarted. To work around this problem if using unicast, set send_metadata_interval to a reasonable value so that all gmond resend their metadata periodically to the collector in case it gets lost.

http://sourceforge.net/apps/trac/ganglia/wiki/FAQ

In recent versions of gmond (3.1.x), a new global variable called send_metadata_interval was added to gmond.conf, with a default setting of 0. The purpose was to reduce network traffic. In 3.1, metric data is sent separately from metadata; the metadata contains the detailed description, grouping, and other possible settings. A value of zero means that gmond will send metadata when it starts and at no other time (which is consistent with older versions of Ganglia).

If you plan on using unicast mode, please set send_metadata_interval to something other than 0. 30-60 seconds has been found to work reliably in most cases. Setting this variable to a non-zero value will make the gmond processes periodically announce their metrics and the graphs will reappear on the host-view page.
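To audit a set of unicast hosts for this setting, a small script can read each gmond.conf and report the configured interval. The sketch below uses a simple regular expression rather than a real parser for the gmond.conf syntax. The function name send_metadata_interval_from is hypothetical, and a missing setting is treated as 0, following the default stated in the FAQ text above.

```python
import re


def send_metadata_interval_from(conf_text):
    """Return the send_metadata_interval found in gmond.conf text (0 if absent)."""
    # Drop /* ... */ and # comments so commented-out settings are ignored.
    text = re.sub(r"/\*.*?\*/", "", conf_text, flags=re.S)
    text = re.sub(r"#.*", "", text)
    match = re.search(r"\bsend_metadata_interval\s*=\s*(\d+)", text)
    return int(match.group(1)) if match else 0
```

A result of 0 on a unicast host suggests that its graphs may stop updating after the collector restarts.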
