Linux – Debugging IO Limitation

io linux networking

I have a Fedora box with some severe IO limitations which I have no idea how to debug.

The server has an Areca Technology Corp. ARC-1130 12-Port PCI-X to SATA RAID Controller with twelve 7200 RPM 1.5 TB disks and a Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller.

uname -a output: 2.6.32.11-99.fc12.x86_64 #1 SMP Mon Apr 5 19:59:38 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

The server is a file server running Nginx with the stub status module enabled, so I can see the current number of connections. The problem presents itself when I have a high number of simultaneous connections in a writing state, usually around 350. At this very moment it's at 590, and the server is almost unusable and stuck at 230 Mbit/s.

If I run top and hit 1 to see per-core CPU usage, all 4 cores show around 99% I/O wait. If I run iotop, the nginx workers are the only processes producing any read load, currently at around 25 MB/s. I have each of the workers bound to its own core.
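
As a rough cross-check of what top reports, the iowait share can be computed from two readings of the aggregate cpu line in /proc/stat. This is only a sketch and not necessarily how top derives its figure; the function name iowait_percent is made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# iowait_percent LINE1 LINE2
# Print the integer percentage of CPU time spent in iowait between two
# samples of the aggregate "cpu" line from /proc/stat.
# (Hypothetical helper; assumes iowait is the 5th counter, i.e. awk field 6.)
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print 0; exit }
            printf "%d\n", (w[2] - w[1]) * 100 / dt
        }
    '
}

# Example use (commented out so the block has no side effects):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_percent "$a" "$b"
```

The same calculation works per core by sampling the cpu0, cpu1, ... lines instead of the aggregate line.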

Initially I figured it was just the disks being bugged, but I've run fsck and smartmontools checks and found no errors. I also ran an iozone test; you can see the result here: http://www.pastie.org/951667.txt?key=fimcvljulnuqy2dcdxa

Additionally, when the number of connections is low I have no problem getting a good speed. If I wget over the local network it easily hits 60 MB/s.

Just now I tried putting a file in /dev/shm, symlinking to it from the public dir, and fetching it with wget over the local network. I only got 50 KB/s.

Also, if I try to cp /dev/shm/test /root/test, it quickly copies around 740 MB and then slows down heavily, again with iotop reporting 99% iowait.

I'm not really sure how to go about figuring out what the problem is. It could be a natural disk limitation, but then the file from /dev/shm ought to transfer quickly, so it seems there's a network limit. Yet the network is fine when there aren't many connections. Perhaps it's a TCP stack problem, but I really have no idea how to check that.

Any suggestions on how to proceed with debugging would be very welcome. If additional information is required then let me know and I'll try to get it.

Thanks.

Best Answer

iotop is nice for seeing which processes are creating I/O, but I'd use sar for more specific numbers. sar -d 10 6, for example, gives you six 10-second samples across a one-minute period, which tells you much more about your disk performance and whether you actually have bottlenecks there. Bear in mind that quite small await/svctm values can have a significant impact on performance: I've seen svctm values as low as 20 ms render a database server unusable, since that's 20 ms for every I/O operation the DB was trying to do.
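
To pick the slow devices out of sar -d output, something like the following could work. It is a sketch under assumptions: slow_devices is a hypothetical helper, it expects sar -d text on stdin with a header row containing DEV and await columns, and the 20 ms threshold in the usage line is only an example value.

```shell
# slow_devices THRESHOLD_MS
# Read `sar -d` text on stdin and print "device await" for every sample
# whose await exceeds the threshold. Columns are located by header name,
# because their position depends on the sysstat version and on whether
# the timestamp carries a separate AM/PM field.
slow_devices() {
    awk -v limit="$1" '
        {
            header = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "DEV") { dev_col = i; header = 1 }
            }
            if (header) {
                # Re-detect on every header row (the Average: section
                # has a different timestamp layout).
                await_col = 0
                for (i = 1; i <= NF; i++) if ($i == "await") await_col = i
                next
            }
        }
        dev_col && await_col && NF >= await_col && ($await_col + 0) > (limit + 0) {
            print $dev_col, $await_col
        }
    '
}

# Example use (commented out so the block has no side effects):
#   LC_ALL=C sar -d 10 6 | slow_devices 20
```

Setting LC_ALL=C keeps the decimal separator as a dot, which the numeric comparison relies on.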

Beyond that, setting up sar's sa1 (in /etc/cron.d/sysstat) to collect more frequently than the default of every ten minutes, and getting a full dump of the stats it gathers during busy periods (sar -A -s 09:00:00 -e 10:00:00), will give you detail on network performance as well, and make it easy to correlate CPU, disk, network and memory behaviour to look for dodgy numbers.

(And yes, network can show up as iowait)

Related Solutions

Linux Server Performance – What Limits Maximum Number of Connections?

I finally found the setting that was really limiting the number of connections: net.ipv4.netfilter.ip_conntrack_max. This was set to 11,776, and whatever I set it to is the number of requests I can serve in my test before having to wait tcp_fin_timeout seconds for more connections to become available. The conntrack table is what the kernel uses to track the state of connections, so once it's full, the kernel starts dropping packets and printing this in the log:

Jun  2 20:39:14 XXXX-XXX kernel: ip_conntrack: table full, dropping packet.
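
A quick way to see how close the table is to that limit is to compare the current entry count with the maximum. The helper below is a hypothetical sketch; the usage line assumes the count and limit are exposed as ip_conntrack_count and ip_conntrack_max under /proc/sys/net/ipv4/netfilter (newer kernels expose nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter instead).

```shell
# conntrack_usage COUNT MAX
# Print how full the conntrack table is, as an integer percentage.
# (Hypothetical helper name; it only does the arithmetic.)
conntrack_usage() {
    count=$1
    max=$2
    if [ "$max" -le 0 ]; then
        echo "max must be positive" >&2
        return 1
    fi
    echo $(( count * 100 / max ))
}

# Example use (commented out so the block has no side effects):
#   dir=/proc/sys/net/ipv4/netfilter
#   conntrack_usage "$(cat $dir/ip_conntrack_count)" "$(cat $dir/ip_conntrack_max)"
```

Watching this figure during a load test should show it approaching 100 shortly before the "table full" messages appear.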

The next step was getting the kernel to recycle all those connections in the TIME_WAIT state rather than dropping packets. I could get that to happen either by turning on tcp_tw_recycle or by increasing ip_conntrack_max to be larger than the number of local ports made available for connections by ip_local_port_range. I guess once the kernel is out of local ports it starts recycling connections. This uses more memory for tracking connections, but it seems like a better solution than turning on tcp_tw_recycle, since the docs imply that it is dangerous.

With this configuration I can run ab all day and never run out of connections:

net.ipv4.netfilter.ip_conntrack_max = 32768
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_orphan_retries = 1
net.ipv4.tcp_fin_timeout = 25
net.ipv4.tcp_max_orphans = 8192
net.ipv4.ip_local_port_range = 32768    61000
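
The relationship between the port range and the conntrack limit in this configuration can be checked with a little arithmetic. This is a sketch, with local_port_count as a made-up helper name; it treats the range as inclusive at both ends.

```shell
# local_port_count "LOW HIGH"
# Print the number of ports in an ip_local_port_range value.
# (Hypothetical helper; the argument is the two numbers as one string,
# separated by spaces or tabs, as sysctl prints them.)
local_port_count() {
    # Word-split the "low high" pair into $1 and $2.
    set -- $1
    echo $(( $2 - $1 + 1 ))
}

# Example use (commented out so the block has no side effects):
#   local_port_count "$(cat /proc/sys/net/ipv4/ip_local_port_range)"
```

With the values above, local_port_count "32768 61000" gives 28233, which is below the ip_conntrack_max of 32768, so the local ports run out before the conntrack table fills.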

The tcp_max_orphans setting didn't have any effect on my tests and I don't know why. I would think it would close the connections in TIME_WAIT state once there were 8192 of them but it doesn't do that for me.

Linux ulimit – Practical Maximum Open File Descriptors for High Volume Systems

These limits came from a time when multiple "normal" users (not apps) would share the server, and we needed ways to protect them from using too many resources.

They are very low for high-performance servers, and we generally set them to a very high number (24k or so). If you need higher numbers, you also need to change the sysctl file-max option (generally limited to 40k on Ubuntu and 70k on RHEL).

Setting ulimit:

# ulimit -n 99999

Sysctl max files:

# sysctl -w fs.file-max=100000

Also, and very importantly, you may need to check whether your application has a memory or file descriptor leak. Use lsof to list everything it has open and check whether those descriptors are valid. Don't try to change your system to work around application bugs.
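
Before raising limits, it can help to compare how many descriptors a process actually holds with its soft limit. This is a minimal sketch, assuming a Linux /proc layout; fd_usage is a hypothetical helper, and the optional second argument (an alternative proc root) exists only so it can be tried against a fake directory.

```shell
# fd_usage PID [PROC_ROOT]
# Print "<open descriptors> <soft limit>" for a process.
# (Hypothetical helper; PROC_ROOT defaults to /proc.)
fd_usage() {
    pid=$1
    root=${2:-/proc}
    # One directory entry per open descriptor.
    open=$(ls "$root/$pid/fd" 2>/dev/null | awk 'END { print NR }')
    # The soft limit is the 4th field of the "Max open files" row.
    soft=$(awk '/^Max open files/ { print $4 }' "$root/$pid/limits" 2>/dev/null)
    echo "$open ${soft:-unknown}"
}

# Example use (commented out so the block has no side effects):
#   fd_usage "$(pgrep -o nginx)"
```

Inspecting a process owned by another user generally needs root. A count that keeps climbing toward the soft limit under steady load points to a leak and not to a limit that is too low.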
