Linux – apcupsd slave client keeps losing and restoring communications with UPS master


On a VMware ESXi 5.0.0 host (vSphere Hypervisor, the free version) I have three virtual servers, all running CentOS 6 Linux.
All are configured to run the apcupsd daemon (http://www.apcupsd.org/) for controlling APC UPSes.

One of the servers (the master) is connected to an APC CS 350 UPS with a USB cable.
Its apcupsd is configured with the network information server (NETSERVER) listening on port 3551.

The two other (also virtualized) servers have apcupsd configured to retrieve the UPS status from master.

It works, but I see lots of warnings from apcupsd on the two slaves. In a terminal window I see entries such as:

Broadcast message from root@slavehostname (Thu Nov 1 19:55:10 2012):

Warning communications lost with UPS masterhostname

Broadcast message from root@slavehostname (Thu Nov 1 19:55:47 2012):

Communications restored with UPS masterhostname

On that same day I saw about 200 pairs of lost/restored messages. They are much more frequent during the day than at night.
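To check whether the drops really cluster in daytime hours, a small helper can tally the slave's events file (EVENTSFILE, /var/log/apcupsd.events in the configurations below) by hour. This is only a sketch: commlost_by_hour is a hypothetical name, and it assumes each line starts with a YYYY-MM-DD HH:MM:SS timestamp and that lost-communication events contain the text "Communications with UPS lost". Look at a few lines of your own file and adjust the pattern if the wording differs.

```shell
# Hypothetical helper: count "communications lost" events per hour of day.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS" (hour in columns 12-13)
# and that lost events contain "Communications with UPS lost".
commlost_by_hour() {
    file=${1:-/var/log/apcupsd.events}   # EVENTSFILE from apcupsd.conf
    grep 'Communications with UPS lost' "$file" |
        cut -c12-13 |    # keep only the HH part of the timestamp
        sort |
        uniq -c          # prints "<count> <hour>" for each hour seen
}
```

Running commlost_by_hour on a slave prints one count per hour of the day, which can then be compared with the hours when the master, the ESXi host or the network is busiest.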

I don't get any warnings on the master.

These servers have plenty of memory and CPU available, and there is practically no swapping,
so I don't think they are starved of resources. Generally they do not do very much work.

These are the master's configuration settings (leaving out the EEPROM settings):

UPSCABLE usb
UPSTYPE usb
DEVICE
POLLTIME 10
LOCKFILE /var/lock
SCRIPTDIR /etc/apcupsd
PWRFAILDIR /etc/apcupsd
NOLOGINDIR /etc
ONBATTERYDELAY 6
BATTERYLEVEL 5
MINUTES 3
TIMEOUT 0
ANNOY 300
ANNOYDELAY 60
NOLOGON disable
KILLDELAY 0
NETSERVER on
NISIP 0.0.0.0
NISPORT 3551
EVENTSFILE /var/log/apcupsd.events
EVENTSFILEMAX 10
UPSCLASS standalone
UPSMODE disable
STATTIME 0
STATFILE /var/log/apcupsd.status
LOGSTATS off
DATATIME 0

And these are the slave settings:

UPSCABLE ether
UPSTYPE net
DEVICE 192.168.0.59:3551
POLLTIME 10
LOCKFILE /var/lock
SCRIPTDIR /etc/apcupsd
PWRFAILDIR /etc/apcupsd
NOLOGINDIR /etc
ONBATTERYDELAY 12
BATTERYLEVEL 10
MINUTES 7
TIMEOUT 0
ANNOY 300
ANNOYDELAY 60
NOLOGON disable
KILLDELAY 0
NETSERVER on
NISIP 0.0.0.0
NISPORT 3551
EVENTSFILE /var/log/apcupsd.events
EVENTSFILEMAX 10
UPSCLASS standalone
UPSMODE disable
STATTIME 20
STATFILE /var/log/apcupsd.status
LOGSTATS off
DATATIME 0

How should I proceed from here, and how do I debug this? Could I have configured my servers in a way that causes this?
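One way to start debugging is to poll the master's network information server independently of the slave daemon and record exactly when requests fail. The helper below is hypothetical (probe_nis is not part of apcupsd). It assumes the apcupsd 3.14-era form apcaccess status HOST:PORT (newer releases use apcaccess -h HOST:PORT status instead) and that apcaccess exits non-zero when it cannot reach the server.

```shell
# Hypothetical helper: poll the master's NIS server and log failed polls.
# Assumes the apcupsd 3.14-era syntax "apcaccess status HOST:PORT" and that
# apcaccess exits non-zero when the server cannot be reached.
probe_nis() {
    target=$1          # master's NIS address, e.g. 192.168.0.59:3551
    count=${2:-360}    # number of polls (360 x 10 s is about an hour)
    interval=${3:-10}  # seconds between polls, same as POLLTIME 10
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! apcaccess status "$target" >/dev/null 2>&1; then
            failures=$((failures + 1))
            echo "$(date '+%F %T') poll to $target failed"
        fi
        i=$((i + 1))
        # sleep between polls, but not after the last one
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    echo "failures: $failures/$count"
}
```

For example, probe_nis 192.168.0.59:3551 360 10 > nis-probe.log on one of the slaves runs for about an hour. If the logged failures line up with the broadcast messages, that points towards the network path or the master's NIS; if the probe never fails while the slave still complains, the slave side is the more likely suspect.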

Best Answer

This doesn't fix the underlying problem, but it helps clean up the console a bit:

The script that outputs these messages is called apccontrol, and on my Ubuntu 12.04.02 LTS boxen it lives in /etc/apcupsd. It uses wall for all the messages.

It also calls other scripts in that directory, if they exist, for secondary handling, such as emailing root every time there's a comms failure. You can turn that off by moving or editing the relevant script.

Also, if such a script exits with status code 99, apccontrol will not run its default action, so you won't get spam on your wall.

I've used this to send all the comms-loss alerts to syslog instead of wall, so they no longer clutter the terminals I'm trying to work in. I can also put POLLTIME back to the default of 60, so my slave box will still notice if the UPS kicks in.
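A minimal sketch of the hook approach described above. It assumes apccontrol runs an executable named after the event (for example commfailure or commok) from its script directory, passes the UPS name as that script's first argument, and skips its default wall action when the script exits with status 99; check your own apccontrol before relying on the argument order. write_syslog_hook is a hypothetical helper that writes such a hook using logger(1). It replaces any existing script of the same name, such as one that emails root.

```shell
# Hypothetical helper: write an apccontrol hook that logs to syslog and
# exits 99 so apccontrol skips its default wall broadcast.
# Assumes apccontrol passes the UPS name as the hook's first argument.
# Note: this overwrites any existing hook of the same name in $dir.
write_syslog_hook() {
    dir=$1     # SCRIPTDIR from apcupsd.conf, e.g. /etc/apcupsd
    event=$2   # event name, e.g. commfailure or commok
    text=$3    # message prefix to log
    cat > "$dir/$event" <<EOF
#!/bin/sh
# $event hook: log to syslog instead of wall
logger -t apcupsd "$text: UPS \$1"
# 99 tells apccontrol not to run its default action
exit 99
EOF
    chmod 755 "$dir/$event"
}

# Example, as root:
#   write_syslog_hook /etc/apcupsd commfailure "Communications lost"
#   write_syslog_hook /etc/apcupsd commok "Communications restored"
```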

Related Solutions

Linux – Apache server keeps crashing regularly

I'm not saying this is what's happening, but based on my own experience as a CentOS admin, the most likely cause is runaway Apache/PHP processes taking down the server. I've seen this numerous times on CentOS 5. It's frustrating because there's usually no trace of what happened in the log files: the machine just grinds to a halt as physical memory and swap are used up by Apache/PHP processes. You would think Linux memory management or some daemon would jump in and say "hey, stop", but it doesn't. It'll let Apache grind your system to a halt.

Having said that, to see what's happening you'll need something that can monitor and log resource usage. I like to use a program called atop. Atop is a lot like the top program but it also takes a snapshot of resource usage at defined intervals. It's pretty simple to install.

wget http://www.atcomputing.nl/Tools/atop/packages/atop-1.23.tar.gz 
tar -zxvf atop-1.23.tar.gz
cd atop-1.23 && make install

Open /etc/atop/atop.daily in a text editor and change INTERVAL=600 to INTERVAL=60, so a snapshot is taken every 60 seconds instead of every 10 minutes.
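If you'd rather not open an editor, the same INTERVAL change can be made with sed. This is a sketch: set_atop_interval is a hypothetical helper, and it assumes the file contains a line beginning with INTERVAL= as described above. The original file is kept with a .bak suffix.

```shell
# Hypothetical helper: change the atop snapshot interval without an editor.
# Assumes a line starting with INTERVAL= (INTERVAL=600 by default, in seconds);
# the original file is kept with a .bak suffix.
set_atop_interval() {
    file=$1      # e.g. /etc/atop/atop.daily
    seconds=$2   # new interval, e.g. 60
    sed -i.bak "s/^INTERVAL=[0-9]*/INTERVAL=$seconds/" "$file"
}

# Example, as root:
#   set_atop_interval /etc/atop/atop.daily 60
```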

Run /etc/atop/atop.daily from a command prompt to start it. Wait a few minutes, then run atop -r /var/log/atop/atop_20091118, substituting the current date (YYYYMMDD) of course.
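To avoid typing the date by hand, the log path can be built with date(1). A small sketch, assuming the default /var/log/atop/atop_YYYYMMDD naming shown above; atop_log_for is a hypothetical helper name.

```shell
# Hypothetical helper: build the atop log path for a given day.
# Assumes the default naming /var/log/atop/atop_YYYYMMDD.
atop_log_for() {
    day=${1:-$(date +%Y%m%d)}   # YYYYMMDD; defaults to today
    echo "/var/log/atop/atop_$day"
}

# Examples:
#   atop -r "$(atop_log_for)"            # today's log
#   atop -r "$(atop_log_for 20091118)"   # a specific day
```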

Press t to go forward in time and T to go back. Next time your server crashes, do this and check the MEM free and SWP free lines; if you have memory problems, these will be shown in red. Also look for numerous httpd lines under CMD: if Apache/PHP is your problem, there'll be a bunch of them.

If this is the case, I recommend looking at your MaxClients setting in httpd.conf. If it is set too high, Apache will gladly eat all of your memory and crash the machine. Apache/PHP can easily use 40-50 MB per process, so multiplying 40 MB by MaxClients gives a rough idea of how much memory Apache can potentially use. MaxClients usually defaults to 150 on CentOS, so by default Apache can potentially use about 6 GB of memory (150 x 40 MB = 6000 MB), not counting the memory your system needs for itself and other processes. Try setting it to a more realistic value based on the amount of memory you have, such as 40 if you have 2 GB, and see if that helps. Also, if you have KeepAlive On, set KeepAliveTimeout to a low number like 2 or 3.
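The sizing rule above is simple integer arithmetic: divide the memory you can spare for Apache by the size of one Apache/PHP process. A sketch, where max_clients_for is a hypothetical helper and the per-process figure is an estimate you should replace with the resident size you actually observe for your httpd processes (for example in atop).

```shell
# Hypothetical helper: estimate MaxClients from available memory.
# All sizes are in MB; per_proc_mb is an estimate of one Apache/PHP process.
max_clients_for() {
    total_mb=$1      # physical RAM
    reserved_mb=$2   # memory kept back for the OS and other services
    per_proc_mb=$3   # resident size of one httpd process
    echo $(( (total_mb - reserved_mb) / per_proc_mb ))   # integer division
}

# The worst case from the text: 150 processes x 40 MB = 6000 MB (about 6 GB).
```

max_clients_for 2048 0 50 prints 40, matching the 2 GB example. Reserving 512 MB for the OS and other services at 40 MB per process, max_clients_for 2048 512 40 prints 38.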

In my opinion, CentOS's Apache/PHP build is a real POS that should never have seen the light of day. It's buggy and crash-prone. If you run a serious site, I highly recommend compiling your own Apache/PHP, or even using one of the newer high-performance web servers like lighttpd or nginx with PHP over FastCGI.

FreeNAS NUT Slave and not master

The original poster may have found an answer by now, but for the sake of others who visit this thread: NUT master/slave is merely a matter of configuration. Install NUT as appropriate for your OS, then configure one, and only one, machine as the master. The others are configured as slaves and given the IP address of the master.

Check out this link: http://www.networkupstools.org/
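Once a slave is configured, a quick way to confirm it can see the master is to query the master's upsd with upsc. A sketch, assuming the UPS is defined on the master under a name such as myups (a placeholder), that upsd listens on its default port 3493, and that upsc exits non-zero when it cannot fetch the value; nut_master_status is a hypothetical helper.

```shell
# Hypothetical helper: check from a NUT slave that the master's upsd answers.
# Assumes the UPS name from the master's ups.conf and upsd's default port 3493.
nut_master_status() {
    ups=$1    # UPS name on the master, e.g. myups (placeholder)
    host=$2   # master address, optionally host:port
    if status=$(upsc "$ups@$host" ups.status 2>/dev/null); then
        echo "$ups@$host: $status"   # e.g. OL (on line) or OB (on battery)
    else
        echo "$ups@$host: unreachable" >&2
        return 1
    fi
}
```

For example, nut_master_status myups 192.168.0.10 (placeholder address) should print something like myups@192.168.0.10: OL while the UPS is on mains power.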
