Linux Routing – Tuning IP Routing Parameters: secret_interval and tcp_mem


We had a little failover problem with one of our HAProxy VMs today. When we dug into it, we found this:

Jan 26 07:41:45 haproxy2 kernel: [226818.070059] __ratelimit: 10 callbacks suppressed
Jan 26 07:41:45 haproxy2 kernel: [226818.070064] Out of socket memory
Jan 26 07:41:47 haproxy2 kernel: [226819.560048] Out of socket memory
Jan 26 07:41:49 haproxy2 kernel: [226822.030044] Out of socket memory

According to this link, that message apparently has to do with low default settings for net.ipv4.tcp_mem. So we increased them to 4x their defaults (this is Ubuntu Server; not sure if the Linux flavor matters):

current values are:    45984   61312   91968
new values are:       183936  245248  367872
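For reference, a sketch of how those values can be read and applied through the standard sysctl interface (the exact numbers are ours and may not suit other machines):

$ sysctl net.ipv4.tcp_mem                                  # current values, counted in pages
net.ipv4.tcp_mem = 45984 61312 91968
$ sudo sysctl -w net.ipv4.tcp_mem="183936 245248 367872"   # apply the 4x values at runtime
# to persist across reboots, add the same line to /etc/sysctl.conf:
# net.ipv4.tcp_mem = 183936 245248 367872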

After that, we started seeing a bizarre error message:

Jan 26 08:18:49 haproxy1 kernel: [ 2291.579726] Route hash chain too long!
Jan 26 08:18:49 haproxy1 kernel: [ 2291.579732] Adjust your secret_interval!

Shh.. it's a secret!!

This apparently has to do with /proc/sys/net/ipv4/route/secret_interval, which defaults to 600 (seconds) and controls periodic flushing of the route cache:

The secret_interval instructs the kernel how often to blow away ALL route
hash entries regardless of how new/old they are. In our environment this is
generally bad. The CPU will be busy rebuilding thousands of entries per
second every time the cache is cleared. However we set this to run once a
day to keep memory leaks at bay (though we've never had one).

While we are happy to reduce this, it seems odd to recommend dropping the entire route cache at regular intervals, rather than simply pushing old values out of the route cache faster.

After some investigation, we found /proc/sys/net/ipv4/route/gc_elasticity which seems to be a better option for keeping the route table size in check:

gc_elasticity can best be described as the average bucket depth the kernel
will accept before it starts expiring route hash entries. This will help
maintain the upper limit of active routes.

We adjusted gc_elasticity from 8 to 4, in the hope that the route cache would prune itself more aggressively. Relying on secret_interval does not feel correct to us. But there are a bunch of settings and it's unclear which are really the right way to go here (current values in parentheses; see the commands after the list):

  • /proc/sys/net/ipv4/route/gc_elasticity (8)
  • /proc/sys/net/ipv4/route/gc_interval (60)
  • /proc/sys/net/ipv4/route/gc_min_interval (0)
  • /proc/sys/net/ipv4/route/gc_timeout (300)
  • /proc/sys/net/ipv4/route/secret_interval (600)
  • /proc/sys/net/ipv4/route/gc_thresh (?)
  • rhash_entries (kernel parameter, default unknown?)
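For the record, here is roughly how we inspected the current values and applied the gc_elasticity change; treat it as a sketch rather than a recommendation, since we're still unsure these are the right knobs:

$ grep . /proc/sys/net/ipv4/route/{gc_elasticity,gc_interval,gc_min_interval,gc_timeout,secret_interval}
/proc/sys/net/ipv4/route/gc_elasticity:8
/proc/sys/net/ipv4/route/gc_interval:60
/proc/sys/net/ipv4/route/gc_min_interval:0
/proc/sys/net/ipv4/route/gc_timeout:300
/proc/sys/net/ipv4/route/secret_interval:600
$ sudo sysctl -w net.ipv4.route.gc_elasticity=4     # prune route hash buckets more aggressively
# or, equivalently:
$ echo 4 | sudo tee /proc/sys/net/ipv4/route/gc_elasticity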

We don't want to make Linux routing performance worse, so we're kind of afraid to mess with some of these settings.

Can anyone advise which routing parameters are best to tune, for a high traffic HAProxy instance?

Best Answer

I have never encountered this issue. However, you should probably increase your hash table width in order to reduce its depth. Using dmesg, you can see how many entries you currently have:

$ dmesg | grep '^IP route'
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)

You can change this value with the kernel boot command line parameter rhash_entries. First try it by hand, then add it to your lilo.conf or grub.conf.

For example: kernel vmlinux rhash_entries=131072
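A sketch of what that looks like in practice; the kernel image name, root device, and the GRUB 2 workflow on recent Ubuntu are assumptions, not part of the answer:

# GRUB legacy (grub.conf / menu.lst): append the parameter to the existing kernel line
kernel /vmlinuz-2.6.32-generic root=/dev/sda1 ro rhash_entries=131072

# GRUB 2 (e.g. recent Ubuntu): append to GRUB_CMDLINE_LINUX in /etc/default/grub,
# keeping any parameters already there, then regenerate the config
GRUB_CMDLINE_LINUX="rhash_entries=131072"
$ sudo update-grub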

It is possible that you have a very limited hash table because you have assigned little memory to your HAProxy VM (the route hash size is adjusted depending on total RAM).

Concerning tcp_mem, be careful. Your initial settings make me think you were running with 1 GB of RAM, 1/3 of which could be allocated to TCP sockets. Now you've allowed 367872 pages * 4096 bytes per page ≈ 1.5 GB of RAM for TCP sockets. You should be very careful not to run out of memory. A rule of thumb is to allocate 1/3 of the memory to HAProxy, another 1/3 to the TCP stack, and the last 1/3 to the rest of the system.
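To spell out the arithmetic (tcp_mem is counted in pages; the 4096-byte page size is the usual x86 value and is an assumption here):

$ getconf PAGESIZE
4096
$ echo $((367872 * 4096))     # new tcp_mem "max" converted to bytes
1506803712                    # ~1.5 GB, on a box that appears to have ~1 GB of RAM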

I suspect that your "out of socket memory" message comes from the default settings in tcp_rmem and tcp_wmem. By default you have 64 kB allocated on output for each socket and 87 kB on input. Since a proxied connection uses two sockets (one to the client and one to the server), that means roughly 300 kB per connection, just for socket buffers. Add to that 16 or 32 kB for HAProxy, and you see that with 1 GB of RAM you'll only support about 3000 connections.
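A back-of-the-envelope version of that calculation, using the numbers above (kB taken as 1024 bytes; the ~32 kB HAProxy overhead is the answer's own estimate):

$ echo $(( (64 + 87) * 2 ))          # per proxied connection: two sockets (client side + server side)
302
$ echo $(( 1048576 / (302 + 32) ))   # ~1 GB of RAM (in kB) divided by ~334 kB per connection
3139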

By changing the default settings of tcp_rmem and tcp_wmem (the middle parameter), you can go a lot lower on memory. I get good results with values as low as 4096 for the write buffer, and 7300 or 16060 in tcp_rmem (5 or 11 TCP segments). You can change those settings without restarting; however, they will only apply to new connections.
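As a sketch, the sysctl form of those values might look like this; only the middle (default) field is being lowered, and the min/max fields shown here are typical stock defaults used for illustration, so check your own current triplets first:

$ sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem            # check your current "min default max" triplets
$ sudo sysctl -w net.ipv4.tcp_wmem="4096 4096 4194304"   # 4 kB default write buffer
$ sudo sysctl -w net.ipv4.tcp_rmem="4096 16060 6291456"  # ~16 kB (11 segments) default read buffer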

If you prefer not to touch your sysctls too much, the latest HAProxy, 1.4-dev8, allows you to tweak those parameters from the global configuration, and per side (client or server).
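If I remember the option names correctly, that looks roughly like the global-section sketch below; the exact parameter names and values are an assumption on my part, so verify them against the 1.4 documentation:

global
    # force smaller kernel socket buffers per side, instead of changing sysctls system-wide
    tune.sndbuf.client 4096
    tune.sndbuf.server 4096
    tune.rcvbuf.client 16060
    tune.rcvbuf.server 16060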

Hope this helps!