Linux – RHEL 5/CentOS 5 – sshd becomes unresponsive

centos ldap linux redhat ssh

I have a number of CentOS 5.x and RHEL 5.x systems whose SSH daemons become unresponsive. This prevents remote logins.

The typical error from the connecting side is:

$ ssh db1
db1 :  ssh_exchange_identification: Connection closed by remote host

Examining /var/log/messages after a forced reboot shows the following leading up to the restart:

Dec 10 10:45:51 db1 sshd[14593]: fatal: Privilege separation user sshd does not exist
Dec 10 10:46:02 db1 sshd[14595]: fatal: Privilege separation user sshd does not exist
Dec 10 10:46:54 db1 sshd[14711]: fatal: Privilege separation user sshd does not exist
Dec 10 10:47:38 db1 sshd[14730]: fatal: Privilege separation user sshd does not exist
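To see how often the failure occurs and whether it clusters at particular times, the matching log lines can be bucketed by hour. This is only a rough sketch: count_privsep_failures is a made-up helper name, and it assumes the traditional syslog timestamp layout shown above (month, day, HH:MM:SS) and a chronologically ordered log file.

```shell
#!/bin/sh
# count_privsep_failures is a hypothetical helper, not part of any package.
# It prints one line per hour: "<count> <month> <day> <HH>:00".
# Assumes syslog lines start with "Mon DD HH:MM:SS" and are in time order,
# so identical hour buckets are adjacent and uniq -c can count them.
count_privsep_failures() {
    log="${1:-/var/log/messages}"
    grep 'fatal: Privilege separation user sshd does not exist' "$log" |
        awk '{ print $1, $2, substr($3, 1, 2) ":00" }' |
        uniq -c
}
```

Running count_privsep_failures with no argument reads /var/log/messages; a rotated log can be passed as the first argument instead.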

These systems use LDAP authentication, and nsswitch.conf is configured to consult local "files" first.

[root@db1 ~]# cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#

passwd:     files ldap
shadow:     files ldap
group:      files ldap

hosts:      files dns

The privilege-separated SSH user (sshd) exists in the local password file.

[root@db1 ~]# grep ssh /etc/passwd
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
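Grepping /etc/passwd only shows that the entry is in the file. A lookup through getent goes through the sources listed on the "passwd:" line of nsswitch.conf, which is closer to what sshd does when it resolves the user. The sketch below wraps that check; check_nss_user is a hypothetical helper name used here for illustration.

```shell
#!/bin/sh
# check_nss_user is a hypothetical helper, not part of any package.
# It resolves a user name through NSS via getent, so the result reflects
# the "passwd:" sources configured in /etc/nsswitch.conf.
check_nss_user() {
    user="$1"
    if entry=$(getent passwd "$user"); then
        echo "OK: $entry"
    else
        echo "FAIL: NSS lookup for '$user' returned nothing" >&2
        return 1
    fi
}
```

For example, check_nss_user sshd could be run from a console session or cron while the problem is occurring, to see whether the lookup itself fails at that moment.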

Any ideas on what the root cause is? I did not see any Red Hat errata that cover this.

Best Answer

Debian bug #552431 sounds very similar.

Do the affected systems do more LDAP queries than non-affected systems? E.g., mail servers, LDAP-authenticated DB servers?

nss_ldap in EL5 is not well designed; it was replaced by nss-pam-ldapd in EL6. Do you have any EL6 machines, and do they show this problem?

If the problem is reproducible and you have the ability to experiment, I suggest trying sssd as a replacement for nss_ldap and nscd. SSSD is in the RHEL/CentOS repos and can be installed with yum. Note that sssd does not cache hosts the way nscd does. If you need to cache hostnames when using sssd, either use a caching DNS server (dnsmasq is very easy to set up for that) or run nscd configured to cache hosts only. SSSD does cache passwd and group information.
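If nscd is kept only for hostname caching, its configuration file would have the passwd and group caches switched off and the hosts cache switched on, using "enable-cache <service> <yes|no>" lines. The following sketch lists which caches are enabled so that can be checked quickly; nscd_enabled_caches is a made-up helper name, and the default path /etc/nscd.conf is an assumption about where the file lives.

```shell
#!/bin/sh
# nscd_enabled_caches is a hypothetical helper, not part of nscd.
# It prints the services whose cache is enabled in an nscd.conf-style file,
# i.e. lines of the form "enable-cache <service> yes".
# Commented-out lines are skipped because their first field is not
# exactly "enable-cache".
nscd_enabled_caches() {
    conf="${1:-/etc/nscd.conf}"
    awk '$1 == "enable-cache" && $3 == "yes" { print $2 }' "$conf"
}
```

On a host set up as described, the only service printed should be hosts.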