Will running RRDtool or Cacti slow down the servers performance

cactimuninrrdtool

I'm consider running some performance profile tools like Cacti for my server, but I'm concerned that running RRDtool or Cacti slow down my servers performance? Is this a valid concern for me? Or rather, how can mitigate this? Can I run the profiling tools on a different server or something?

I suppose another tool to mention here would be Munin, and that leads me to slightly a large questions, is there any huge advantage of using Munin over Cacti? Information is a little sparse online.

Best Answer

Do you know the average load in your system ? If it's running of its limits you may have a bunch of problems. (Such as pid runout , oom-kill events etc). But if it's not too overloaded you won't have problems. Even 2% extra load it's not extremely tragic.

After all, you can reduce the "precision" of the measurements / plots and thus reduce the overhead. But, still is better to have the "server" part of the monitoring framework in a seperate server and have the clients distributed to any of your machines.

Related Solutions

Cacti rrdtool graph with no values, NaN in .rrd file

Aside from upgrading to the latest 0.8.8, have you confirmed that you (or whatever is attempting to store the rrd information in the .rrd file) can? Are permissions for the file/directory set properly to allow for this?

In reply to your comment that file/directory ownership/permissions may be the issue, I will share what I have found to be useful for my Cacti installations that run on OpenBSD (aka, your placement and syntax may vary with other systems):

chown -R www:www /var/www/htdocs/*
echo 'web directories reset to www:www ownership'
chmod -R 777 /var/www/htdocs/cacti/plugins/*
chmod -R 777 /var/www/htdocs/cacti/log/*
chmod -R 777 /var/www/htdocs/cacti/rra/*
echo 'cacti plugin, log, and rra directories set to full r,w,x'

I use weathermaps heavily in cacti and often find myself having to reset permissions over and over throughout and this has save me the headache of thinking about it constantly.

777 may be too open for these sections security-wise, and if so, I would gladly appreciate a better setting to be mentioned that still provides usability of the application.

Linux – Can someone explain the “use-cases” for the default munin graphs

Disk IOs per device (IOs/second)

With traditional hard drives this is a very important number. I/O operation is a read or write operation to disk. With rotational spindles you can get around from dozens to perhaps 200 IOPS per second, depending on the disk speed and its usage pattern.

This is not all to it: modern operating systems do have I/O schedulers which try to merge several I/O requests as one and make things faster that way. Also the RAID controllers and so on do perform some smart I/O request reordering.

Disk latency per device (Average IO wait)

How long it took from performing the I/O request to an individual disk to actually receive the data from there. If this hovers around couple of milliseconds, you are OK, if it's dozens of ms, then you are starting to see your disk subsystem sweating, if it's hundreds of more ms, you are in big trouble, or at least have a very, very slow system.

IO Service Time

How your disk subsystem (possibly containing lots of disks) is performing overall.

IOStat (blocks/second read/written)

How many disk blocks were read/written per second. Look for spikes and also the average. If average starts to near the maximum throughput of your disk subsystem, it's time to plan for performance upgrade. Actually, plan that way before that point.

Available entropy (bytes)

Some applications do want to get "true" random data. Kernel gathers that 'true' randomness from several sources, such as keyboard and mouse activity, a random number generator found in many motherboards, or even from video/music files (video-entropyd and audio-entropyd can do that).

If your system runs out of entropy, the applications wanting that data stall until they get their data. Personally in the past I've seen this happening with Cyrus IMAP daemon and its POP3 service; it generated a long random string before each login, and on a busy server that consumed the entropy pool very quickly.

One way to get rid of that problem is to switch the applications to use only semi-random data (/dev/urandom), but that's not among this topic anymore.

VMStat (running/I/O sleep processes)

Not thought about this one before, but I would think that this tells you about per-process I/O statistics, or mainly if they are running some I/O or not, and if that I/O is blocking I/O activity or not.

Disk throughput per device (bytes/second read/written)

This is purely bytes read/written per second, and more often this is more human-readable form than blocks, which may vary. Block size may differ because of the disks used, file system (and its settings) used, and so on. Sometimes the block size might be 512 bytes, other times 4096 bytes, sometimes something else.

inode table usage

With file systems having dynamic inodes (such as XFS), nothing. With file systems having static inodes maps (such as ext3), everything. If you have combination of static inodes, a huge file system and huge number of directories and small files, you might encounter a situation where you cannot create more files on that partition, even though in theory there would be lots of free space left. No free inodes == bad.

Best Answer

Related Solutions

Cacti rrdtool graph with no values, NaN in .rrd file

Linux – Can someone explain the “use-cases” for the default munin graphs

Related Topic