I have a number of CentOS 5.x and RHEL 5.x systems whose SSH daemons become unresponsive. This prevents remote logins.
The typical error from the connecting side is:
$ ssh db1
db1 : ssh_exchange_identification: Connection closed by remote host
Examining /var/log/messages after a forced reboot shows the following entries leading up to the restart:
Dec 10 10:45:51 db1 sshd[14593]: fatal: Privilege separation user sshd does not exist
Dec 10 10:46:02 db1 sshd[14595]: fatal: Privilege separation user sshd does not exist
Dec 10 10:46:54 db1 sshd[14711]: fatal: Privilege separation user sshd does not exist
Dec 10 10:47:38 db1 sshd[14730]: fatal: Privilege separation user sshd does not exist
These systems use LDAP authentication, and the nsswitch.conf file is configured to look at local "files" first.
[root@db1 ~]# cat /etc/nsswitch.conf
#
# /etc/nsswitch.conf
#
passwd: files ldap
shadow: files ldap
group: files ldap
hosts: files dns
The Privilege-separated SSH user exists in the local password file.
[root@db1 ~]# grep ssh /etc/passwd
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
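Since sshd looks up its privilege separation user through the NSS stack rather than by reading /etc/passwd directly, it may help to compare both views while the problem is occurring. The following is only a rough sketch: the function names (user_in_file, user_in_nss, check_user) are hypothetical helpers, not part of any package, and it assumes awk and getent are available on the host.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the local passwd file with the NSS view.

user_in_file() {
    # $1 = user name, $2 = passwd-format file (defaults to /etc/passwd)
    # Exits 0 if the first colon-separated field matches the user exactly.
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

user_in_nss() {
    # getent consults the sources listed for "passwd" in /etc/nsswitch.conf
    getent passwd "$1" >/dev/null 2>&1
}

check_user() {
    # $1 = user name, $2 = optional passwd-format file
    if user_in_file "$1" "$2"; then echo "file: found"; else echo "file: missing"; fi
    if user_in_nss "$1"; then echo "nss: found"; else echo "nss: missing"; fi
}
```

Running `check_user sshd` from a console session during an outage would show whether the entry is present in the file but failing to resolve through NSS, which would point at the lookup path rather than at the account itself.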
Any ideas on what the root cause is? I did not see any Red Hat errata that cover this.
Best Answer
Debian bug #552431 sounds very similar.
Do the affected systems do more LDAP queries than non-affected systems? E.g., mail servers, LDAP-authenticated DB servers?
nss_ldap in EL5 is not well designed; it was replaced by nss-pam-ldapd in EL6. Do you have any EL6 machines with or without this problem?
If the problem is reproducible and you have the ability to experiment, I suggest trying sssd to replace nss_ldap and nscd. SSSD is in the RHEL/CentOS repos and can be installed with yum. Note that sssd does not cache hosts like nscd does. If you need to cache hostnames when using sssd, use a caching DNS server (dnsmasq is very easy to set up for that) or run nscd to cache hosts only. SSSD does cache user, password, and group information.
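For the "nscd for hosts only" option, a fragment along these lines in /etc/nscd.conf should do it. This is a sketch that assumes the stock nscd.conf layout; the other tuning lines in the file (TTLs, table sizes) can be left at whatever your distribution ships.

```
# /etc/nscd.conf (fragment): let nscd cache host lookups only,
# leaving user and group caching to sssd
enable-cache    passwd    no
enable-cache    group     no
enable-cache    hosts     yes
```

Restart nscd after changing the file so the new settings take effect.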