Samba stuck at maximum of 1024 open files

Tags: filesystems, network-share, samba, ulimit

I'm running an Ubuntu 10.04 (lucid) Samba file server. A Windows 7 client opens a large number of files while copying thousands of tiny files at once. It receives the error "Too many open files", at which point waiting a few seconds and clicking "Try again" resumes the copy.

I've found a number of references that say to increase the number of open files available to Samba to solve the problem. I think that's a great idea, and am trying desperately to do so… but no matter what I do, it refuses to ever open more than 1024 files, and the copy issue will not go away!

Here is what I've tried:

I've set ulimit -n 25000.

I've also set /etc/security/limits.conf to:

* soft nofiles 25000
* hard nofiles 65000
root soft nofiles 25000
root hard nofiles 65000

I've ensured there is nothing in /etc/security/limits.d that overrides any of this.

I've checked that sysctl fs.file-max = 199468 which is more than adequate.

I cannot find any apparmor profiles that might be interfering with samba.

I have added a limit nofile 25000 65000 stanza to /etc/init/smbd.conf

I've set max open files = 50000 in smb.conf and confirmed that it is taking effect via the samba log files:

[2011/10/28 01:30:16,  0] smbd/open.c:151(fd_open)
  Too many open files, unable to open more!  smbd's max open files = 50000
[2011/10/28 01:30:18,  0] lib/sysquotas.c:426(sys_get_quota)
  sys_path_to_bdev() failed for path [.]!
[2011/10/28 01:30:18,  0] lib/sysquotas.c:426(sys_get_quota)
  sys_path_to_bdev() failed for path [.]!
[2011/10/28 01:30:18,  0] smbd/open.c:151(fd_open)
  Too many open files, unable to open more!  smbd's max open files = 50000
[2011/10/28 01:30:19,  0] smbd/open.c:151(fd_open)
  Too many open files, unable to open more!  smbd's max open files = 50000
[2011/10/28 01:30:20,  0] smbd/open.c:151(fd_open)
  Too many open files, unable to open more!  smbd's max open files = 50000

I've confirmed that the issue happens when around 1000 files are open by running lsof | wc -l against the disk to get an approximate count. No matter what I change, the "Try again" button always appears at around 1000 open files and the copy gets interrupted. As soon as the count drops back below 1000, clicking "Try again" resumes the copy.
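A per-process count can be easier to interpret than a system-wide lsof total, because lsof also lists entries such as memory-mapped files and working directories. The following is a minimal sketch, assuming a Linux /proc filesystem; count_fds is a hypothetical helper name, and reading another user's /proc/<pid>/fd generally requires root.

```shell
#!/bin/sh
# Count the open file descriptors of one process by listing /proc/<pid>/fd.
# count_fds is a hypothetical helper name, not part of Samba or lsof.
count_fds() {
    # Each entry under /proc/<pid>/fd is one open descriptor.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print "<pid> <count>" for every running smbd process, if there are any.
for smbd_pid in $(pgrep -x smbd 2>/dev/null); do
    printf '%s %s\n' "$smbd_pid" "$(count_fds "$smbd_pid")"
done
```

Comparing this count with the process's own limit shows which smbd child is actually hitting the ceiling.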

Obviously this is a bug in Windows 7 or in Samba. I don't care which; all I care about is fixing it. Why won't Samba open more than 1000 or so files, as I am asking it to in so many ways? Is there some other limit I need to change?

Edit: symcbean had a good suggestion. Here are the results from inserting ulimit -a > /tmp/samba-ulimits into the pre-start script section of /etc/init/smbd.conf:

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        10240
coredump(blocks)     0
memory(kbytes)       unlimited
locked memory(kbytes) 64
process              15969
nofiles              25000
vmemory(kbytes)      unlimited
locks                unlimited

Also, I am running version 2:3.4.7~dfsg-1ubuntu3 of samba.

Best Answer

Ok I have solved my issue, and in doing so come to a better understanding of how the ulimits work, at least in Ubuntu. There were a number of issues and I think I have sorted them all out.

First problem, and a silly one: the item name in /etc/security/limits.conf is nofile, not nofiles.
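A misspelled item name like this is easy to miss by eye, so a quick scan of the file can help. This is a rough sketch, not an official tool: check_limits_items is a hypothetical helper, and the list of item names is an assumption based on limits.conf(5) that may be incomplete for your distribution.

```shell
#!/bin/sh
# Report limits.conf lines whose <item> field is not in a list of known names.
# check_limits_items is a hypothetical helper; the name list is an assumption
# based on limits.conf(5) and may be incomplete on some distributions.
check_limits_items() {
    awk '
        BEGIN {
            bad = 0
            n = split("core data fsize memlock nofile rss stack cpu nproc as maxlogins maxsyslogins priority locks sigpending msgqueue nice rtprio", names, " ")
            for (i = 1; i <= n; i++) known[names[i]] = 1
        }
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF >= 4 && !($3 in known) {
            # Fields are: <domain> <type> <item> <value>
            printf "%s:%d: unknown item \"%s\"\n", FILENAME, FNR, $3
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

if [ -r /etc/security/limits.conf ]; then
    check_limits_items /etc/security/limits.conf || echo "check the item names listed above"
fi
```

Run against the limits.conf from the question, this would flag all four nofiles lines.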

Another more significant oversight: While I had ensured pam_limits.so was included in /etc/pam.d/common-session, I didn't notice that there was also /etc/pam.d/common-session-noninteractive. The latter file was the one that samba was using.
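To see at a glance which PAM service files load pam_limits.so, something like the following can be used. It is a sketch with a hypothetical helper name (pam_limits_status), and it only inspects each file directly: a file reported as "missing" may still get the module through an @include line.

```shell
#!/bin/sh
# List PAM service files in a directory and whether each one references
# pam_limits.so on an uncommented line.
# pam_limits_status is a hypothetical helper name.
pam_limits_status() {
    dir="${1:-/etc/pam.d}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # The first non-blank character must not be "#".
        if grep -Eq '^[[:space:]]*[^#[:space:]].*pam_limits\.so' "$f"; then
            echo "enabled  $f"
        else
            echo "missing  $f"
        fi
    done
}

pam_limits_status /etc/pam.d
```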

Fixing that appears to have fixed Samba, which can now open as many file descriptors as it likes, and Windows copies complete successfully. Also note: Samba does indeed use the appropriate user's ulimit, not the ulimits the smbd process started with, nor root's ulimit. /etc/security/limits.conf is the place to set this, once you have properly configured either (both?) /etc/pam.d/common-session-noninteractive and /etc/pam.d/samba to use pam_limits.so.

As for the other issue, where my user was stuck at 1024 hard/1024 soft limits, that was a combination of a few problems. First and foremost, despite having /etc/pam.d/sshd, the ssh daemon does not use PAM unless you modify /etc/ssh/sshd_config to have "UsePAM yes". The default is "no", and without PAM, pam_limits.so (which is responsible for applying limits.conf) does not even come into play.
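A quick way to check what the config file says is sketched below. use_pam_setting is a hypothetical helper; it reports the first uncommented UsePAM line and falls back to "no" when the keyword is absent, and it does not handle Include directives or the keyword=value form.

```shell
#!/bin/sh
# Print the UsePAM value from an sshd_config-style file.
# use_pam_setting is a hypothetical helper; it takes the first uncommented
# occurrence of the keyword and prints "no" if the keyword is absent.
use_pam_setting() {
    awk '
        tolower($1) == "usepam" && !found { value = tolower($2); found = 1 }
        END { print (found ? value : "no") }
    ' "$1"
}

if [ -r /etc/ssh/sshd_config ]; then
    echo "UsePAM: $(use_pam_setting /etc/ssh/sshd_config)"
fi
```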

Instead, the default ulimits for non-PAM logins seem to inherit from pid 1 (typically "init"). You can check those default pid 1 limits with cat /proc/1/limits. Unfortunately, as far as I can tell, those limits are set as defaults in the kernel. There does not seem to be any way to modify them short of recompiling the kernel, or convincing the non-PAM application to use PAM.

I also just want to offer the advice that cat /proc/<anypid>/limits is a great way to debug the limits of any specific process you might be having trouble with. I wish I had discovered that sooner.
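For the open-files limit specifically, the relevant line can be pulled out of /proc directly. This is a small sketch assuming the usual Linux layout of /proc/<pid>/limits; nofile_limits is a hypothetical helper name.

```shell
#!/bin/sh
# Print the soft and hard "Max open files" limits of a process from /proc.
# nofile_limits is a hypothetical helper name.
nofile_limits() {
    # The line has the form: "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Limits of the current shell, of pid 1, and of each running smbd process.
echo "self: $(nofile_limits $$)"
echo "pid1: $(nofile_limits 1)"
for smbd_pid in $(pgrep -x smbd 2>/dev/null); do
    echo "smbd $smbd_pid: $(nofile_limits "$smbd_pid")"
done
```

If the smbd lines still show 1024 after a configuration change, the new limits have not reached the daemon yet.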