Sensu alternative (?) where alarm thresholds defined on server (not monitored client)

monitoring rabbitmq

Question/TLDR;

Is there a Sensu alternative (i.e. an operating system monitoring agent/server based on RabbitMQ) that defines its alarm thresholds on the central monitoring server rather than on the monitored client server (as Sensu and Nagios do)?

RabbitMQ is required so no Zabbix et al, I'm afraid.

Background:

I have large environments (Windows and RHEL) where I can't install orchestration tools (Puppet et al.), and the number of installed services should therefore be kept to a minimum.

I'm researching if I could develop a single agent that collects system information, relays logs (to Logstash) and reports on resource consumption.
It would push all these values into RabbitMQ. Logstash could then subscribe to the logs, a monitoring service could subscribe to the resource metrics (and create alarms from them), a CMDB system could subscribe to the system information, etc.
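A minimal sketch of how such an agent might label its messages so that each consumer can subscribe to only its own stream. It assumes a RabbitMQ topic exchange; the kind names (`logs`, `metrics`, `sysinfo`), the JSON layout and the `build_message` helper are all hypothetical, and the actual publish call (which needs an AMQP client library) is left out.

```python
import json
import socket
import time


def build_message(kind, payload, host=None):
    """Return (routing_key, body) for one agent report.

    kind is a hypothetical stream name: "logs", "metrics" or "sysinfo".
    """
    host = host or socket.gethostname()
    # Routing key "<kind>.<host>" lets consumers bind by stream on a topic exchange.
    routing_key = f"{kind}.{host}"
    body = json.dumps(
        {"host": host, "time": int(time.time()), "kind": kind, "data": payload}
    )
    return routing_key, body
```

With keys of this shape, Logstash could bind a queue with `logs.#`, the monitoring service with `metrics.#` and the CMDB with `sysinfo.#`. The `#` wildcard matches zero or more dot-separated words, so fully qualified hostnames would still match.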

However, I would want to just receive the information about resource consumption and create the alarms on the monitoring server and not have to change the thresholds on each server to change the alarm threshold.

I can't be the only person to find an agent like that useful…

Clarification:

If a server under Sensu monitoring is running out of disk, the Sensu agent checks the disk space, compares it against the CRITICAL threshold that's defined on that server and if the threshold is passed, a CRITICAL alarm is sent through RabbitMQ to the central monitoring server.
To change the threshold without Puppet or somesuch, logging in to the server is required (right?)

The way I'd like this to work is that when a monitoring agent checks its disk space, it just sends the amount of available disk (or used disk and total etc) through RabbitMQ to the central server which then compares that value against the threshold defined on the central server and, if necessary, sends an alarm.

If the threshold needs to be changed, it's changed on the central server. Values from multiple servers could also be compared to create an alarm.
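To make the desired behaviour concrete, here is a rough sketch of the server-side comparison. It assumes the agent sends raw values as JSON with `host`, `metric` and `value` fields; that message format, the metric name `disk_used_pct`, the host `db01` and the threshold numbers are all made up for illustration and are not part of Sensu or any existing tool.

```python
import json

# Hypothetical thresholds that live only on the central server.
DEFAULT_THRESHOLDS = {
    "disk_used_pct": {"warning": 80.0, "critical": 90.0},
}

# Hypothetical per-host overrides, also kept centrally.
HOST_OVERRIDES = {
    "db01": {"disk_used_pct": {"warning": 70.0, "critical": 85.0}},
}


def evaluate(message_body):
    """Return (host, metric, state) for one raw metric message."""
    msg = json.loads(message_body)
    host, metric, value = msg["host"], msg["metric"], float(msg["value"])
    # A per-host override wins over the default for that metric.
    limits = HOST_OVERRIDES.get(host, {}).get(metric) or DEFAULT_THRESHOLDS.get(metric)
    if limits is None:
        state = "UNKNOWN"
    elif value >= limits["critical"]:
        state = "CRITICAL"
    elif value >= limits["warning"]:
        state = "WARNING"
    else:
        state = "OK"
    return host, metric, state
```

Changing a threshold then means editing one table on the central server, and the monitored hosts never need to know what the limits are.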

This is kinda my main issue with Sensu, although I understand the decision to go with Nagios compatibility.

It would also be preferable if no central server -> monitored server traffic would be required. I suppose a kludge could be made where the central server sends the thresholds to the agent which then runs them as "local". The network for the environment makes this exceptionally tricky.

Thanks for any ideas anyone might have!

Best Answer

Sticking to open source, I'd use the following components (if you do indeed need to send metrics via RabbitMQ):

use collectd on the client side to send metrics into RabbitMQ with its AMQP plugin
consume the messages from RabbitMQ using graphite-amqp-tools and send them into Graphite

Now that you have the metrics in Graphite, you can query it for your resource consumption. In my $WORK's environment, we have checks which query Graphite, with the alerting thresholds set on the Nagios server. And since Graphite has an HTTP interface for querying (it can return graphs, JSON, CSV and plain-text results), you could build or use anything as long as it can query Graphite.
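As a rough illustration of such a check, the sketch below queries Graphite's render API with `format=json` and compares the newest datapoint against thresholds passed in on the monitoring server. The function names are my own, and this is not the check we actually run; it only shows the shape of the idea.

```python
import json
import urllib.parse
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def fetch_series(base_url, target, window="-5min"):
    """Fetch one target from Graphite's render API as parsed JSON."""
    query = urllib.parse.urlencode(
        {"target": target, "from": window, "format": "json"}
    )
    with urllib.request.urlopen(f"{base_url}/render?{query}", timeout=10) as resp:
        return json.load(resp)


def latest_value(series):
    """Return the newest non-null value from the first series that has one."""
    for entry in series:
        # Each datapoint is a [value, timestamp] pair; value may be null.
        for value, _timestamp in reversed(entry["datapoints"]):
            if value is not None:
                return value
    return None


def check(series, warning, critical):
    """Map the latest value to a Nagios state using server-side thresholds."""
    value = latest_value(series)
    if value is None:
        return UNKNOWN
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK
```

A Nagios command could call `fetch_series("http://graphite.example.com", "<metric path>")`, pass the result to `check(...)` and exit with the returned code. The URL here is a placeholder, and the metric path depends on how collectd names things in your setup.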

Related Solutions

Can OpenNMS show me all currently exceeded thresholds

It seems other people have had the same itch to scratch: OpenNMS Enhancement request, and subsequent blog post about it. I haven't installed this, so can't comment on how easy it is to add to the current 1.8 branch.

Linux: logwatch(8) is too noisy. How to control the noise level

Overall, the available documentation for Logwatch lacks adequate explanation and is often far too vague. I pieced together some useful examples, and have reduced the Logwatch noise by over 95%.

Here's what I have found.

Keep in mind that you can find some Logwatch documentation at /usr/share/doc/logwatch-*/HOWTO-Customize-LogWatch, and it contains a few useful examples.

On RHEL/CentOS/SL, the default logwatch configuration is under /usr/share/logwatch/default.conf/logwatch.conf

These settings can be overridden by placing your local configuration under /etc/logwatch/conf/logwatch.conf. Place the following in that file to tell logwatch to completely ignore services like 'httpd' and the daily disk usage checks:
```
# Don't spam about the following Services
Service = "-http"
Service = "-zz-disk_space"
```
Sometimes I don't want to completely disable logwatch for a specific service; I just want to fine-tune the results to make them less noisy. /usr/share/logwatch/default.conf/services/*.conf contains the default configuration for the services. These parameters can be overridden by placing your local configuration under /etc/logwatch/conf/services/$SERVICE.conf. Unfortunately, logwatch's ability here is limited, and many of the logwatch executables are full of undocumented Perl. Your choices are to replace the executable with something else, or to try to override some settings using /etc/logwatch/conf/services.

For example, I have a security scanner which runs scans across the network. As the tests run, the security scanner generates many error messages in the application logs. I would like logwatch to ignore errors from my security scanners, but still notify me of attacks from other hosts. This is covered in more detail at Logwatch: Ignore certain IPs for SSH & PAM checks?. To do this, I place the following under /etc/logwatch/conf/services/sshd.conf:
```
# Ignore these hosts
*Remove = 192.168.100.1
*Remove = X.Y.123.123
# Ignore these usernames
*Remove = testuser
# Ignore other noise. Note that we need to escape the ()
*Remove = "pam_succeed_if\(sshd:auth\): error retrieving information about user netscan.*"
```
logwatch also allows you to strip out output from the logwatch emails by placing regular expressions in /etc/logwatch/conf/ignore.conf. HOWTO-Customize-LogWatch says:

ignore.conf: This file specifies regular expressions that, when matched by the output of logwatch, will suppress the matching line, regardless of which service is being executed.

However, I haven't had much luck with this. My requirements need a conditional statement, something like 'If the only security warnings are due to my security scanner, then don't print the output. But if there are security warnings from both my security scanner and some bad guys, then print the useful parts: the header which says "Failed logins from:" and the IPs of the bad hosts, but not the IPs of the scanners.'
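The conditional I have in mind can be sketched in a few lines. This is not something logwatch's ignore.conf does; it is a hypothetical post-processing step, and the function name and the indentation of the output are assumptions for illustration.

```python
# Scanner addresses to hide; the first one is the same host ignored in sshd.conf.
SCANNER_IPS = {"192.168.100.1"}


def filter_failed_logins(header, hosts, scanners=SCANNER_IPS):
    """Drop scanner IPs; return no lines at all if nothing else is left."""
    remaining = [h for h in hosts if h not in scanners]
    if not remaining:
        # Only the scanners failed to log in, so suppress the header too.
        return []
    return [header] + ["    " + h for h in remaining]
```

For example, `filter_failed_logins("Failed logins from:", ["192.168.100.1", "203.0.113.7"])` keeps the header and the second address only, while a list containing only the scanner returns nothing.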
Another option is to nip it at the source (as suggested by @user48838). These messages are generated by some application, and Logwatch is happily passing the results on to you. In these cases, you can modify the application to log less.

This isn't always desirable, because sometimes you want the full logs to be sent somewhere (to a Central syslog server, central IDS server, Splunk, Nagios, etc.), but you don't want logwatch to email you about this from every server, every day.
