Collectd and Graphite import data every 5 minutes rather than every minute


I'm a bit new to graphite, so bear with me on this. I'm looking into alternatives for a large and fairly unwieldy cacti installation, so I've been playing with graphite. We pull a lot of data via SNMP, so I've also downloaded, compiled and installed collectd to pipe SNMP data into graphite.

I've set up a simple query within collectd to just grab the current eth0 in/out counters. I'm looking to capture at a minute's resolution for a week, followed by 5 minutes thereafter, so my storage-schemas.conf looks like this:

 pattern = ^carbon\.
 retentions = 60:90d

 pattern = .*
 retentions = 60s:1w, 5m:1y
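
As a sanity check on what those retention strings should produce, here is a minimal Python sketch that turns a retentions value into (seconds per point, number of points) pairs. It follows my understanding of how Whisper reads the `precision:duration` syntax (a bare number in the second field is a point count, and a year is 365 days); the function names are made up for illustration and are not part of Carbon or Whisper.

```python
import re

# Seconds per unit suffix; a year is assumed to be 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def to_seconds(token):
    """Convert '60', '60s', '5m', '1w' or '1y' to seconds (bare numbers are seconds)."""
    match = re.fullmatch(r"(\d+)([smhdwy]?)", token.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {token!r}")
    value, unit = match.groups()
    return int(value) * UNIT_SECONDS[unit or "s"]

def parse_retentions(spec):
    """Return a list of (seconds_per_point, points) for a retentions string."""
    archives = []
    for part in spec.split(","):
        precision, duration = (field.strip() for field in part.split(":"))
        step = to_seconds(precision)
        # A bare number in the second field is a point count, not a duration.
        points = int(duration) if duration.isdigit() else to_seconds(duration) // step
        archives.append((step, points))
    return archives

print(parse_retentions("60s:1w, 5m:1y"))
```

If the first pair does not come out at a 60-second step, the schema is not the one being applied to the metric.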

Similarly, in collectd.conf I have set the following:

<Plugin snmp>
   <Data "std_traffic">
       Type "if_octets"
       Table true
       Instance "IF-MIB::ifDescr"
       Values "IF-MIB::ifInOctets" "IF-MIB::ifOutOctets"
   </Data>

   <Host "lonsbrndlb01">
       Address "lonsbrndlb01"
       Version 2
       Community "public"
       Collect "std_traffic"
       Interval 60
   </Host>
</Plugin>

This almost works perfectly. The keys appear in graphite, and data comes in.

The only problem is that the data is a counter, and not a per-minute rate. I can get around this in graphite by using the derivative function, which supposedly turns counters into per-minute rates. However, doing this, I see this graph:

It seems fairly evident from this that the data is only arriving every 5 minutes, not every 60 seconds as I specified. Why is this? I thought I'd set the right values in both collectd and graphite, so I think I'm missing something somewhere.


Some more data on this, as it might be useful.

The formulas I'm using are just derivative(lonsbrndlb01.snmp.if_octets-eth0.tx) and derivative(lonsbrndlb01.snmp.if_octets-eth0.rx), although I've now switched to using nonNegativeDerivative because of counter rollovers. I've also updated the image below to give a sense of scale.
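
For anyone unfamiliar with what these functions do to a counter, here is a simplified Python sketch of the idea. It is not Graphite's implementation: a negative delta simply becomes a gap, whereas Graphite's nonNegativeDerivative can also take a maximum counter value to handle wraps, which is not modelled here. The sample counter values are invented.

```python
def non_negative_derivative(values):
    """Per-step deltas of a counter series; None where no valid delta exists.

    Simplified: a negative delta (counter wrap or reset) yields None, and a
    None sample breaks the chain so the next point has nothing to diff against.
    """
    deltas = []
    previous = None
    for value in values:
        if value is None or previous is None or value < previous:
            deltas.append(None)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas

# Hypothetical raw octet counter that resets between the third and fourth sample.
counter = [1000, 1600, 2500, 300, 900]
print(non_negative_derivative(counter))  # [None, 600, 900, None, 600]
```

Note that the result is a delta per stored step, so with a 60-second step each value is octets per minute rather than octets per second.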

Running on the rx.wsp file gives a header of:

Meta data:
  aggregation method: average
  max retention: 31536000
  xFilesFactor: 0.5

Archive 0 info:
  offset: 40
  seconds per point: 60
  points: 10080
  retention: 604800
  size: 120960

Archive 1 info:
  offset: 121000
  seconds per point: 300
  points: 105120
  retention: 31536000
  size: 1261440

followed by about 2.4M of data.
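
The header is at least internally consistent with a 60-second first archive. A rough Python sketch of the arithmetic, assuming the Whisper on-disk layout is 16 bytes of metadata, 12 bytes per archive header and 12 bytes per stored point (these sizes are my assumption about the format, not something shown in the output above):

```python
HEADER_METADATA_BYTES = 16  # assumed: aggregation, max retention, xFilesFactor, archive count
ARCHIVE_INFO_BYTES = 12     # assumed: offset, seconds per point, points
POINT_BYTES = 12            # assumed: 4-byte timestamp plus 8-byte double

def layout(archives):
    """Return (offset, points, size) for each (seconds_per_point, retention) pair."""
    offset = HEADER_METADATA_BYTES + ARCHIVE_INFO_BYTES * len(archives)
    rows = []
    for step, retention in archives:
        points = retention // step
        size = points * POINT_BYTES
        rows.append((offset, points, size))
        offset += size
    return rows

# Seconds per point and retention copied from the two archives in the header.
for offset, points, size in layout([(60, 604800), (300, 31536000)]):
    print(f"offset={offset} points={points} size={size}")
```

Under those assumptions the computed offsets, point counts and sizes match the header, so the file itself was created with the intended 60s:1w, 5m:1y schema.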

Data from the graph by appending &format=json is:

[{"target": "nonNegativeDerivative(lonsbrndlb01.snmp.if_octets-eth0.rx)", "datapoints": [[null, 1342597800], [26346975.0, 1342597860], [35197821.0, 1342597920], [138121.0, 1342597980], [108605.0, 1342598040], [690712.0, 1342598100], [27213713.0, 1342598160], [876898.0, 1342598220], [463897.0, 1342598280], [137499.0, 1342598340], [96980.0, 1342598400], [26237641.0, 1342598460], [35094898.0, 1342598520], [112569.0, 1342598580], [274897.0, 1342598640], [139174.0, 1342598700], [806881.0, 1342598760], [26206311.0, 1342598820], [112298.0, 1342598880], [781205.0, 1342598940], [606872.0, 1342599000], [5184462.0, 1342599060], [61946135.0, 1342599120], [4126005.0, 1342599180], [115908.0, 1342599240], [714159.0, 1342599300], [195738.0, 1342599360], [26261781.0, 1342599420], [100503.0, 1342599480], [751322.0, 1342599540], [930865.0, 1342599600], [230666.0, 1342599660], [59636.0, 1342599720], [62575579.0, 1342599780], [104950.0, 1342599840], [1208886.0, 1342599900], [379369.0, 1342599960], [785827.0, 1342600020], [26215475.0, 1342600080], [221604.0, 1342600140], [351866.0, 1342600200], [231163.0, 1342600260], [211398.0, 1342600320], [70770807.0, 1342600380], [429324.0, 1342600440], [1937893.0, 1342600500], [1476961.0, 1342600560], [72383.0, 1342600620], [371513.0, 1342600680], [29186024.0, 1342600740], [1924055.0, 1342600800], [280068.0, 1342600860], [341216.0, 1342600920], [36643885.0, 1342600980], [26708952.0, 1342601040], [259828.0, 1342601100], [488406.0, 1342601160], [230698.0, 1342601220], [766407.0, 1342601280], [28252848.0, 1342601340]]}, {"target": "nonNegativeDerivative(lonsbrndlb01.snmp.if_octets-eth0.tx)", "datapoints": [[null, 1342597800], [26007032.0, 1342597860], [34808859.0, 1342597920], [100498.0, 1342597980], [91818.0, 1342598040], [649666.0, 1342598100], [26566941.0, 1342598160], [895897.0, 1342598220], [478867.0, 1342598280], [100242.0, 1342598340], [81130.0, 1342598400], [25908859.0, 1342598460], [34659481.0, 1342598520], [75295.0, 1342598580], [285061.0, 
1342598640], [103644.0, 1342598700], [824177.0, 1342598760], [25884962.0, 1342598820], [93420.0, 1342598880], [799160.0, 1342598940], [582373.0, 1342599000], [5024696.0, 1342599060], [61269813.0, 1342599120], [3336907.0, 1342599180], [436657.0, 1342599240], [696692.0, 1342599300], [182144.0, 1342599360], [25947578.0, 1342599420], [79011.0, 1342599480], [733857.0, 1342599540], [1015395.0, 1342599600], [184960.0, 1342599660], [48026.0, 1342599720], [61462810.0, 1342599780], [89187.0, 1342599840], [1195360.0, 1342599900], [386772.0, 1342599960], [744445.0, 1342600020], [25913548.0, 1342600080], [201978.0, 1342600140], [344650.0, 1342600200], [199421.0, 1342600260], [208959.0, 1342600320], [69924581.0, 1342600380], [381593.0, 1342600440], [1610764.0, 1342600500], [1484192.0, 1342600560], [41585.0, 1342600620], [373375.0, 1342600680], [28478208.0, 1342600740], [1893711.0, 1342600800], [253921.0, 1342600860], [354558.0, 1342600920], [36199040.0, 1342600980], [26395675.0, 1342601040], [239238.0, 1342601100], [477775.0, 1342601160], [212554.0, 1342601220], [752374.0, 1342601280], [27890202.0, 1342601340]]}]

It may be peaky data, but there's no way this box is peaking at 60MBit traffic every few minutes.
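
For scale, a small Python sketch that converts one of these per-step deltas into a line rate. It assumes the stored values are raw octet counters and that each datapoint covers one 60-second step, so a delta is octets per minute; the helper name is made up for illustration.

```python
def octets_to_mbit_per_s(delta_octets, step_seconds=60):
    """Convert an octet delta over one step into megabits per second."""
    return delta_octets * 8 / step_seconds / 1_000_000

# Three rx values taken from the JSON output above.
for delta in (62575579.0, 26346975.0, 138121.0):
    print(f"{delta:>12.0f} octets/step -> {octets_to_mbit_per_s(delta):6.2f} Mbit/s")
```

Under that assumption the 62575579.0 sample works out to roughly 8.3 Mbit/s sustained over the minute.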

Best Answer

If you use the command on the appropriate whisper file, what does it show? From the graph, the spikes don't look like they land exactly every 5 minutes. Is it at all possible that you're just getting spiky network traffic? Also, for counters, it's always a good idea to use nonNegativeDerivative instead of derivative, since the non-negative version accounts for rollover.
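
One way to check the spacing without eyeballing the graph is to measure the gaps between spikes in the `&format=json` output. A minimal Python sketch using the first 17 rx datapoints from the question; the 10,000,000 threshold is an arbitrary cut-off picked for illustration.

```python
SPIKE_THRESHOLD = 10_000_000  # hypothetical cut-off separating spikes from the baseline

# [value, timestamp] pairs, as returned by the render API with format=json.
datapoints = [
    [None, 1342597800], [26346975.0, 1342597860], [35197821.0, 1342597920],
    [138121.0, 1342597980], [108605.0, 1342598040], [690712.0, 1342598100],
    [27213713.0, 1342598160], [876898.0, 1342598220], [463897.0, 1342598280],
    [137499.0, 1342598340], [96980.0, 1342598400], [26237641.0, 1342598460],
    [35094898.0, 1342598520], [112569.0, 1342598580], [274897.0, 1342598640],
    [139174.0, 1342598700], [806881.0, 1342598760], [26206311.0, 1342598820],
]

# Timestamps of every datapoint above the threshold, skipping null values.
spike_times = [ts for value, ts in datapoints
               if value is not None and value > SPIKE_THRESHOLD]

# Seconds between consecutive spikes.
gaps = [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]
print(gaps)  # [60, 240, 300, 60, 300]
```

Every minute has its own distinct value, and the spikes drift around the 5-minute mark instead of landing on a fixed boundary, which is what you would expect if the points really are stored at 60-second resolution.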