OpenVPN – Very low TCP OpenVPN throughput (100Mbit port, low CPU utilization)


I am experiencing extremely slow OpenVPN transfer rates between two servers. For this question, I'll call the servers Server A and Server B.

Both Server A and Server B are running CentOS 6.6. Both are located in datacenters with a 100Mbit line and data transfers between the two servers outside of OpenVPN run close to ~88Mbps.

However, when I attempt to transfer any files over the OpenVPN connection I've established between Server A and Server B, I get throughput right around 6.5Mbps.

Test results from iperf:

[  4] local 10.0.0.1 port 5001 connected with 10.0.0.2 port 49184
[  4]  0.0-10.0 sec  7.38 MBytes  6.19 Mbits/sec
[  4]  0.0-10.5 sec  7.75 MBytes  6.21 Mbits/sec
[  5] local 10.0.0.1 port 5001 connected with 10.0.0.2 port 49185
[  5]  0.0-10.0 sec  7.40 MBytes  6.21 Mbits/sec
[  5]  0.0-10.4 sec  7.75 MBytes  6.26 Mbits/sec
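
(For reference, those are server-side numbers from a plain TCP test with stock iperf 2; the invocation was something along these lines:)

# on Server A (10.0.0.1), the OpenVPN server
iperf -s

# on Server B, aimed at the tunnel IP
iperf -c 10.0.0.1 -t 10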

Aside from these OpenVPN iperf tests, both servers are otherwise idle, with essentially zero load.

Server A is assigned the IP 10.0.0.1 and it is the OpenVPN server. Server B is assigned the IP 10.0.0.2 and it is the OpenVPN client.

The OpenVPN configuration for Server A is as follows:

port 1194
proto tcp-server
dev tun0
ifconfig 10.0.0.1 10.0.0.2
secret static.key
comp-lzo
verb 3

The OpenVPN configuration for Server B is as follows:

port 1194
proto tcp-client
dev tun0
remote 204.11.60.69
ifconfig 10.0.0.2 10.0.0.1
secret static.key
comp-lzo
verb 3

What I've noticed:

1. My first thought was that I was bottlenecking the CPU on the server. OpenVPN is single-threaded, and both of these servers run Intel Xeon L5520 processors, which aren't the fastest. However, I ran top during one of the iperf tests and pressed 1 to view CPU utilization by core, and found that load was very low on every core:

top - 14:32:51 up 13:56,  2 users,  load average: 0.22, 0.08, 0.06
Tasks: 257 total,   1 running, 256 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.4%us,  1.4%sy,  0.0%ni, 94.8%id,  0.3%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.3%st
Cpu3  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    946768k total,   633640k used,   313128k free,    68168k buffers
Swap:  4192188k total,        0k used,  4192188k free,   361572k cached

2. Ping times increase considerably over the OpenVPN tunnel while iperf is running. When iperf is not running, ping times over the tunnel are a consistent 60ms (normal for this path). But when iperf is running and pushing heavy traffic, ping times become erratic. You can see below how the ping times are stable until the fourth ping, at which point I started the iperf test:

PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=60.1 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=60.1 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=60.2 ms
** iperf test begins **
64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=146 ms
64 bytes from 10.0.0.2: icmp_seq=5 ttl=64 time=114 ms
64 bytes from 10.0.0.2: icmp_seq=6 ttl=64 time=85.6 ms
64 bytes from 10.0.0.2: icmp_seq=7 ttl=64 time=176 ms
64 bytes from 10.0.0.2: icmp_seq=8 ttl=64 time=204 ms
64 bytes from 10.0.0.2: icmp_seq=9 ttl=64 time=231 ms
64 bytes from 10.0.0.2: icmp_seq=10 ttl=64 time=197 ms
64 bytes from 10.0.0.2: icmp_seq=11 ttl=64 time=233 ms
64 bytes from 10.0.0.2: icmp_seq=12 ttl=64 time=152 ms
64 bytes from 10.0.0.2: icmp_seq=13 ttl=64 time=216 ms

3. As mentioned above, I ran iperf outside of the OpenVPN tunnel and the throughput was normal, ~88Mbps consistently (the exact command is shown below this list).
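
For completeness, that outside-the-tunnel run was simply the same iperf client command pointed at Server A's public address instead of the tunnel IP:

iperf -c 204.11.60.69 -t 10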

What I've tried:

1. I thought compression might be fouling things up, so I turned off compression by removing comp-lzo from both configs and restarting OpenVPN. No improvement.

2. Even though I previously found that the CPU utilization was low, I thought the default cipher might be a little too intensive for the system to keep up with. So I added cipher RC2-40-CBC (a very lightweight cipher) to both configs and restarted OpenVPN. No improvement.

3. I read on various forums about how tweaking the fragment, mssfix and tun-mtu options might help with performance. I played with a few variations as described in this article (one is sketched just below this list), but again, no improvement.
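
For what it's worth, one of the variations I tried looked something like the following. The values are purely illustrative, not a recommendation, and if I'm reading the OpenVPN man page right, fragment requires UDP transport (OpenVPN refuses it with proto tcp) and mssfix mainly makes sense over UDP as well, so over a TCP tunnel the knob that actually applies may just be tun-mtu:

# illustrative values only; fragment is omitted because it
# requires UDP transport
tun-mtu 1500
mssfix 1450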

Any ideas on what could be causing such poor OpenVPN performance?

Best Answer

After a lot of Googling and configuration file tweaks, I found the solution. I'm now getting sustained speeds of 60Mbps and bursts of up to 80Mbps. That's a bit slower than the transfer rates I get outside the VPN, but I think this is as good as it will get.

The first step was to set sndbuf 0 and rcvbuf 0 in the OpenVPN configuration for both the server and the client.

I made that change after seeing the suggestion in a public forum post (an English translation of a Russian original), which I'll quote here:

It's July 2004. Typical home internet speed in developed countries is 256-1024 Kbit/s, and 56 Kbit/s in less developed countries. Linux 2.6.7 was released not long ago, and 2.6.8, in which TCP Window Size Scaling is enabled by default, is still a month away. OpenVPN has been in active development for three years already, and version 2.0 is almost out. One of the developers decides to add some code for the socket buffer, I think to unify buffer sizes between OSes. On Windows, something goes wrong with the adapters' MTU if custom buffer sizes are set, so finally it turned into the following code:

#ifndef WIN32
o->rcvbuf = 65536;
o->sndbuf = 65536;
#endif

If you have used OpenVPN, you know that it can work over TCP or UDP. If you set a custom TCP socket buffer value as low as 64 KB, the TCP Window Size Scaling algorithm can't adjust the window size to more than 64 KB. What does that mean? It means that if you're connecting to another VPN site over a long fat link, e.g. USA to Russia with a ping of about 100 ms, you can't get more than 5.12 Mbit/s with the default OpenVPN buffer settings. You need at least a 640 KB buffer to get 50 Mbit/s over that link. UDP would work faster because it doesn't have a window size, but it still won't be very fast.

As you may already guess, the latest OpenVPN release still uses a 64 KB socket buffer size. How should we fix this issue? The best way is to prevent OpenVPN from setting custom buffer sizes. Add the following to both the server and client config files:

sndbuf 0
rcvbuf 0

The author goes on to describe how to push buffer size adjustments to the client if you are not in control of the client config yourself.
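
Sanity-checking that against my own link: TCP throughput is bounded by window size divided by round-trip time, so a 64 KB (65,536-byte) window over my 60 ms path caps out around 65536 / 0.060 ≈ 1.09 MB/s, or roughly 8.7 Mbit/s. That is right in the neighborhood of the ~6.2 Mbit/s iperf was measuring, once protocol overhead is taken into account.

If you only control the server, the pushed directives the author describes would look roughly like this (393216 bytes is an illustrative figure, not a value from my configs):

sndbuf 393216
rcvbuf 393216
push "sndbuf 393216"
push "rcvbuf 393216"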

After I made those changes, my throughput rate bumped up to 20Mbps. I then saw that CPU utilization was a little high on a single core, so I removed comp-lzo (compression) from the configuration on both the client and server. Eureka! Transfer speeds jumped up to 60Mbps sustained and 80Mbps burst.
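
For reference, the final working server config ended up looking roughly like this (the client side mirrors it, with proto tcp-client, the remote line, and the ifconfig addresses swapped, as in the original configs above):

port 1194
proto tcp-server
dev tun0
ifconfig 10.0.0.1 10.0.0.2
secret static.key
sndbuf 0
rcvbuf 0
verb 3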

I hope this helps someone else resolve their own issues with OpenVPN slowness!
