Ubuntu 11.04 server hanging due to landscape-sysinfo CPU overconsumption

cpu-usage, landscape, Ubuntu

I'm running something of a bare-bones server (based on Ubuntu 11.04) on an Amazon EC2 micro instance, whose purpose is simply to coordinate the activities of a few webservers. The machine ran well for a few weeks, but now is hanging frequently with its CPU redlined at 100%.

I logged into the machine over SSH and ran top, which showed that the landscape-sysinfo process was the perpetrator consuming all of the system resources. A pstree revealed where it was situated:

init─┬─atd
     ├─cron
     ├─dhclient3
     ├─dovecot─┬─2*[dovecot-auth]
     │         ├─3*[imap-login]
     │         └─3*[pop3-login]
     ├─6*[getty]
     ├─master─┬─pickup
     │        └─qmgr
     ├─mountall
     ├─mysqld───11*[{mysqld}]
     ├─rsyslogd───3*[{rsyslogd}]
     ├─sshd─┬─sshd───sshd───bash
     │      ├─sshd───sshd───bash───top
     │      ├─sshd───sshd───bash───pstree
     │      └─sshd───sh───run-parts───50-landscape-sy───landscape-sys+
     ├─udevd───2*[udevd]
     ├─upstart-socket-
     ├─upstart-udev-br
     └─vsftpd

The offending process is listed here as the last child of sshd. If I manually kill landscape-sysinfo, the machine returns to normal – until the process spontaneously respawns, usually a few moments later. (I can "vouch for" the other sshd processes in the above tree. They were legitimate.)
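
As an aside, pstree truncates the command name to "landscape-sys+"; something along these lines shows the full command line and parent PID of the offender (the bracketed grep pattern is only there so grep doesn't match itself):

    # List PID, parent PID, and full arguments of anything landscape-related.
    ps -eo pid,ppid,args | grep '[l]andscape'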

I have no idea why landscape-sysinfo keeps spawning itself at random, and even less idea why it's a child of sshd.

I'm obviously none too thrilled about having SSH processes running on my machine that I can't account for. Initially I feared a breach/trojan/backdoor, so I ran chkrootkit and rkhunter, but both came up clean.

Does anybody have any idea what could be causing this process to run wild? Any thoughts on how to stop it from respawning?

Best Answer

I worked out the actual cause of the problem a while back, and figured I should document it here for the sake of others who hit similar issues. The root cause turned out to be more convoluted than I initially expected.

In short, run-parts was working fine all along; its apparent misbehavior was just a symptom of a different problem. The failure chain looked something like this:

1) On an entirely different machine, lsyncd (a file-syncing utility built on top of rsync) had gone haywire for reasons that don't matter here. What does matter is that lsyncd was trying to sync files to this micro instance (the one exhibiting the problems) over SSH.

2) Because lsyncd was opening dozens of simultaneous SSH connections, each login was triggering Ubuntu's dynamic MOTD, which run-parts assembles by executing the scripts in /etc/update-motd.d. One of those scripts, 50-landscape-sysinfo, runs landscape-sysinfo to gather system statistics for the login banner. That explains what landscape-sysinfo is and why it shows up as a child of sshd. run-parts looked like the culprit, but the real problem was that the machine was being bombarded with SSH connections (a couple of mitigations are sketched just after this list).

3) Exacerbating the issue was that this is a micro-instance on EC2, and I've since discovered that Amazon severely throttles micro-instances whose CPU consumption steadily rides above a certain threshold. (For an excellent explanation of the details, please see Greg's Ramblings. Many thanks to Greg for that article!)
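
For anyone wanting to break the chain at step 2, here is a sketch of two mitigations, assuming the stock Ubuntu 11.04 layout (the path and the example MaxStartups value are assumptions to adapt, not a prescription): stop the MOTD fragment from running on every login, and cap how many SSH connections can pile up at once.

    # The MOTD fragment that run-parts executes on each login (stock path on
    # Ubuntu 11.04). Clearing the execute bit stops landscape-sysinfo from being
    # invoked at login, without uninstalling anything.
    sudo chmod -x /etc/update-motd.d/50-landscape-sysinfo

    # Optionally cap simultaneous unauthenticated SSH connections so a burst of
    # logins can't fan out into dozens of MOTD runs. MaxStartups is a standard
    # sshd_config directive; 5 is just an example value. (Check for an existing
    # MaxStartups line before appending.)
    echo 'MaxStartups 5' | sudo tee -a /etc/ssh/sshd_config
    sudo service ssh restart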

Thus, the machine ran slowly for a few moments while it was being bombarded with SSH connections, and then became unusably slow once the throttling kicked in.
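
Side note: that hypervisor-level throttling is visible from inside the instance as CPU "steal" time, so it's easy to confirm with standard tools; nothing EC2-specific is assumed here.

    # The "st" column is the share of time the hypervisor withheld the vCPU;
    # it spikes when EC2 throttles a micro instance.
    vmstat 1 5

    # top reports the same figure as "%st" on the Cpu(s) line.
    top -bn1 | grep '^Cpu'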

Mystery solved!