Linux – How to configure a Red Hat/CentOS/Amazon Linux server for 1 million open TCP connections

amazon-ec2 linux redhat tcp

I've been reading the following article:

And I'm wondering if there's anything else I need to know about tuning Linux to handle 1 million TCP connections?
So far I've narrowed it down to the following:

  • Configuring the kernel to support 1 million connections system-wide (sysctl.conf)
  • Configuring a 1 million open-file/connection limit for the specific user (/etc/security/limits.conf)
  • Configuring TCP stack memory settings (sysctl.conf?); a rough sketch of these follows this list
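
For reference, here's roughly what I have so far. The numbers and the appuser account name are placeholders, not tuned values, and the comments reflect my understanding rather than anything battle-tested:

    # /etc/sysctl.conf sketch -- system-wide limits (illustrative values)
    fs.file-max = 1048576                      # max open file handles, system-wide
    fs.nr_open = 1048576                       # per-process fd ceiling; the hard nofile limit can't exceed this
    net.ipv4.tcp_mem = 786432 1048576 1572864  # global TCP memory in 4 KB pages; the kernel auto-sizes this at boot
    net.ipv4.tcp_rmem = 4096 4096 16777216     # per-socket receive buffer (bytes): min default max
    net.ipv4.tcp_wmem = 4096 4096 16777216     # per-socket send buffer (bytes): min default max

    # /etc/security/limits.conf sketch -- fd limit for the account running the server
    appuser  soft  nofile  1048576
    appuser  hard  nofile  1048576

The small default socket buffers are deliberate, the idea being that a million mostly idle sockets shouldn't eat all the RAM.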

Is there anything else I need to configure? (This is for an EC2 Large 64-bit server.)

-edit-

It's not Apache; it's a libev-based, custom-coded C server, FYI. It'll scale to 1 million just fine; it's the kernel that's my worry 🙂

Best Answer

You have most of the tunables configured that I would have set (and had to set). One thing I found when we were scaling like this is that there is always something specific to your environment that no one else has mentioned. To catch it, you need to make sure you are watching and alerting on:

  • errors via syslog
  • errors your program sees, such as socket() failures
  • network buffer availability (via SNMP or a netstat cron job)
  • kernel table limits (again via SNMP or cron jobs that parse /proc files; a rough sketch follows this list)
  • frequent monitoring (very lightweight polls done every 1-10 ms; we use OpenNMS, which does this really easily, because OpenNMS is awesome).
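
As a sketch of what I mean by the /proc parsing crons: something along these lines, run every minute, does the job. The 90% threshold and the syslog facility here are placeholders; wire it into whatever alerting you already have (SNMP, OpenNMS, etc.):

    #!/bin/sh
    # check_limits.sh -- rough cron-driven check; run e.g. as: * * * * * /usr/local/bin/check_limits.sh

    # /proc/sys/fs/file-nr holds: allocated fds, free fds, system-wide max
    read ALLOC FREE MAX < /proc/sys/fs/file-nr
    if [ "$ALLOC" -gt $((MAX * 90 / 100)) ]; then
        logger -p daemon.warning "file handles at ${ALLOC}/${MAX}"
    fi

    # /proc/net/sockstat has the TCP counters: inuse, orphan, tw (TIME_WAIT), alloc, mem (pages)
    grep '^TCP:' /proc/net/sockstat | logger -p daemon.info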

One other thing you might run into is issues with the HZ value. On our FreeBSD systems we increased it. I was investigating another question on Linux and ran into a case where the socket queues are cleaned up in relation to the HZ value:

TIME_WAIT connections not being cleaned up after timeout period expires
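
If you want to see whether you are hitting that, the TIME_WAIT count is cheap to watch from a shell (this assumes iproute2's ss is installed; the /proc line works everywhere):

    # sockets currently in TIME_WAIT (minus one for the header line)
    ss -tan state time-wait | wc -l

    # or read the kernel's own counter, the "tw" field
    grep '^TCP:' /proc/net/sockstat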

Regarding the comment: I don't think FreeBSD specifically will be any better at this; both need massive amounts of tuning to work. We are using FreeBSD because those boxes connect directly to the internet and OpenBGPD is currently the best open-source BGP implementation available.