CentOS 7 server on ESX randomly dropping connections

centos7 vmware-esx

We have a problem with a server on our ESX host. All other machines operate normally, but not this one. It is the only Linux server running on our ESX host (all the others run Windows) and the only one with this problem.

It was installed three weeks ago and worked normally until last Thursday. Since then it has been randomly dropping connections to specific hosts. For example, I am working in the web interface of the installed software with an SSH session open (for viewing the logs). Suddenly both my browser and my SSH connection drop with "Connection refused", and I cannot reconnect, although ping still works. For my colleague, everything works. Later I can connect again and my colleague cannot. It seems as if only 2-3 people can be connected to the server at the same time.
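To record exactly how a fresh connection fails while an outage is happening (refused immediately, timing out, or succeeding), a small probe can be run from an affected workstation and from a working one at the same time. This is a minimal Python sketch; `probe` and `monitor` are hypothetical helper names, and the host and port (for example SSH on port 22) are whatever you are testing. A "refused" result typically means some device at that IP answered with a TCP reset, while a "timeout" means nothing answered at all.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Open one fresh TCP connection and classify how the attempt ends.

    Returns "open", "refused" (something replied with a reset),
    "timeout" (no reply within `timeout` seconds) or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # the socket is closed again when the block exits
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"


def monitor(host: str, port: int, rounds: int = 12, interval: float = 5.0) -> None:
    """Print a timestamped probe result every `interval` seconds."""
    for _ in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, host, port, probe(host, port), flush=True)
        time.sleep(interval)
```

For example, `monitor("192.0.2.10", 22)` (substitute the server's real address) run on two client machines gives timestamps that can be lined up to see whether one client gets "refused" while the other gets "open" at the same moment.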

The server has a static IP address, and there is a static lease for it in our DNS (based on Microsoft Active Directory).

Configuration applied during product installation:

ulimit -n 8800

echo "* soft stack 32768" >> /etc/security/limits.conf
echo "* hard stack 32768" >> /etc/security/limits.conf
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf
echo "* soft nproc 16384" >> /etc/security/limits.conf
echo "* hard nproc 16384" >> /etc/security/limits.conf
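Settings in /etc/security/limits.conf are applied by PAM (pam_limits) when a new session starts, and `ulimit -n` only affects the current shell and the processes started from it, so a server process started some other way may not carry these values. One way to see what a running process actually got is to read `/proc/<pid>/limits`. The sketch below assumes a Linux `/proc` filesystem and compares that file with the values written to limits.conf above; the helper names are hypothetical, and `<pid>` would be the PID of the WebSphere server process. Note that limits.conf gives `stack` in KB while `/proc` reports it in bytes.

```python
import re
from typing import Dict, Optional, Tuple

# (soft, hard); None stands for "unlimited".
Limit = Tuple[Optional[int], Optional[int]]

# A data row of /proc/<pid>/limits: name, soft, hard, optional units column.
ROW = re.compile(r"^(\S.*?)\s{2,}(\S+)\s+(\S+)(?:\s+\S+)?\s*$")

# Values from the limits.conf lines; "stack" is in KB there but in bytes in /proc.
EXPECTED = {
    "Max open files": 65536,
    "Max processes": 16384,
    "Max stack size": 32768 * 1024,
}


def _value(field: str) -> Optional[int]:
    return None if field == "unlimited" else int(field)


def parse_limits(text: str) -> Dict[str, Limit]:
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits: Dict[str, Limit] = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        match = ROW.match(line)
        if match:
            limits[match.group(1)] = (_value(match.group(2)), _value(match.group(3)))
    return limits


def read_limits(pid: str = "self") -> Dict[str, Limit]:
    """Read the effective limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_limits(fh.read())


def mismatches(limits: Dict[str, Limit],
               expected: Dict[str, int] = EXPECTED) -> Dict[str, Limit]:
    """Return each expected limit whose soft or hard value differs from the target."""
    return {
        name: limits.get(name, (None, None))
        for name, want in expected.items()
        if limits.get(name) != (want, want)
    }
```

An empty result from `mismatches(read_limits(pid))` would mean the process really runs with the configured soft and hard limits; any entry shows the values it got instead.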

The firewall was turned off (service firewalld stop), but this did not change anything. I see nothing in the messages log file (/var/log/messages).

Installed software:

  • CentOS 7
  • IBM Business Process Server Advanced 8.5.6 (Based on IBM WebSphere)
  • IBM DB2 Express

I am a developer with basic networking and Linux knowledge, but I am running out of ideas. Are there any logs you would suggest I check? How can I debug this system?

Best Answer

Well, an existing connection cannot be dropped with "Connection refused"; it is more likely "Connection reset". What happens to a new connection you try to establish during the outage: does it time out, or is it refused immediately? In any case, this behavior looks to me like an IP address conflict with some other network device.
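One way to test the IP-conflict theory is to ARP the server's address from another Linux machine on the same subnet during an outage, for example with iputils `arping -b -I <interface> -c 5 <server-ip>` (`-b` keeps every request broadcast, so a second device claiming the address still gets asked), and check whether replies come back from more than one MAC address. The Python sketch below parses that output; it assumes the iputils reply format (`Unicast reply from <ip> [<mac>] ...`), and the helper names are hypothetical. The listed prefixes are the commonly cited VMware OUIs, which can help tell the VM's own MAC apart from another device.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Assumed iputils arping reply format, e.g.
#   Unicast reply from 192.0.2.10 [00:50:56:AA:BB:CC]  0.712ms
REPLY = re.compile(r"reply from (\d{1,3}(?:\.\d{1,3}){3}) \[([0-9A-Fa-f:]{17})\]")

# Commonly cited VMware OUIs (first three octets of a VMware-assigned MAC).
VMWARE_OUIS = ("00:50:56", "00:0C:29", "00:05:69", "00:1C:14")


def macs_per_ip(arping_output: str) -> Dict[str, Set[str]]:
    """Collect every distinct MAC address that answered for each IP."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in arping_output.splitlines():
        match = REPLY.search(line)
        if match:
            seen[match.group(1)].add(match.group(2).upper())
    return dict(seen)


def conflicts(arping_output: str) -> Dict[str, Set[str]]:
    """Return only the IPs that were claimed by more than one MAC."""
    return {ip: macs for ip, macs in macs_per_ip(arping_output).items() if len(macs) > 1}


def is_vmware_mac(mac: str) -> bool:
    """True if the MAC starts with one of the VMware prefixes above."""
    return mac.upper()[:8] in VMWARE_OUIS
```

If `conflicts` reports the server's IP with both a VMware MAC and some other MAC, another device is answering for that address, which would be consistent with different clients reaching different machines at different times.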
