CentOS – Why does the Samba server serve data so slowly even when the data is cached?


I have a CentOS 5.6 server with the base Samba install. It has an 8-drive RAID 5 array attached to an Areca 1880; hdparm tests give me back ~450 MB/s, and I can confirm close to that with a dd to /dev/null. The server is on the network through a 10GbE Myricom card, and the clients are all 1GbE Broadcom/Intel/etc., on the same switch.

The problem seems to be fairly consistent throughout: single-stream file copy performance is poor when the files have not been cached into memory on the server. I am talking about 9-12 MB files. If I copy a 1 GB file, performance is through the roof, usually 90%+ utilization on the 1GbE clients. When I copy 9-12 MB files, say 100 of them, I get roughly 40-50% utilization; once they are cached on the server it will do MAYBE 55-65% on the 1GbE client NIC, but never more. Why?!?

I can copy multiple cached sequential file sequences at once and get to maybe 85%+ usage on the client NIC, so why can't a single file sequence hit 95%+? All the clients are set up to offload TX/RX packet processing to the adapter. Unfortunately they aren't using jumbo frames, but I have tested with jumbo frames turned on and still saw the same performance.

Best Answer

Large, sequential access is the best-case scenario, and from what you say, your system saturates the pipe in that scenario.

Many smaller files, compared with one big file, incur metadata and possibly locking overhead. My guess is that with more than one stream you get several of them transmitting data, and on a multi-core system they can do so in parallel, increasing bandwidth. This behaviour seems to indicate that you are bound by single-thread performance for many small files. If you can test with an SSD, you'll know whether the performance problem comes from disk I/O (locking, modifying access times, metadata) or whether you are CPU bound (in that case you should also see a Samba process eating a whole core in the top display while transferring data, which would be a good indicator).
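If watching top by eye is too coarse, the CPU-bound check can be scripted. Below is a minimal Python sketch, assuming a Linux server with /proc mounted; the function names are made up for illustration, and the field positions follow the documented /proc/<pid>/stat layout (utime and stime are fields 14 and 15, counted in clock ticks).

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so cut at the LAST ')' before splitting on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def core_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """Percent of a single core used between two samples."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_pid(pid, interval_s=1.0):
    """Sample one process twice (Linux only) and return its single-core usage."""
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return core_percent(before, after, interval_s, os.sysconf("SC_CLK_TCK"))
```

Run sample_pid() against the PID of the smbd process serving the client while the small-file copy is in progress. A value that stays close to 100 would point towards a single-core limit, while a much lower value suggests the time is going elsewhere (disk, network round trips).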

You can also play with SystemTap and see exactly where the kernel is spending its time in the "large file" and "many small files" scenarios.
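SystemTap gives the kernel-side picture. A much coarser check, and not a substitute for it, is to time the two scenarios from a client and see how much extra time each small file costs. The Python sketch below assumes the share is mounted on the client and the paths are readable; the function names and the overhead model (total time = streaming time + a fixed cost per file) are illustrative assumptions, not something Samba reports.

```python
import time


def timed_read(paths, chunk_size=1024 * 1024):
    """Read every file in paths sequentially; return (total_bytes, seconds)."""
    total = 0
    start = time.time()
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk_size)
                if not block:
                    break
                total += len(block)
    return total, time.time() - start


def per_file_overhead_s(n_files, small_bytes, small_s, large_bytes, large_s):
    """Estimate the extra seconds spent per small file.

    Assumption: the large-file run represents the pure streaming rate,
    and anything beyond that in the small-file run is per-file cost.
    """
    streaming_rate = large_bytes / float(large_s)    # bytes per second
    pure_transfer_s = small_bytes / streaming_rate   # time with no per-file cost
    return (small_s - pure_transfer_s) / n_files
```

Call timed_read() once with the single large file and once with the list of small files, then feed both results into per_file_overhead_s(). Repeat runs may be served from the client's own cache, so use files the client has not read recently.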