Why is LVS dropping packets

centos5 lvs xen

I am currently trying to get down to the core of a problem where my
LVS-director seems to drop a packet coming from a client from time to
time. We have this problem on our production systems and can reproduce
the problem on staging.

I posted this problem on the lvs-users-mailing-list and got no response so far.

Our setup:

We are using ipvsadm with Linux CentOS5 x86_64 in a PV XEN-DomU.

Current Version details:

Kernel: 2.6.18-348.1.1.el5xen
ipvsadm: 1.24-13.el5

LVS-Setup:

We use IPVS in DR-mode, for managing the running connections we use
lvs-kiss.

ipvsadm is running in a heartbeat-v1-cluster (two virtual nodes), master
and backup are running constantly on both nodes.

For the LVS services we use logical IPs that are set up by heartbeat
(active/passive cluster mode).

The real-servers are physical Linux-machines.

Network-Setup:

The VM acting as director is running as XEN-PV-DomU on a Dom0 using bridged networks.

Networks "in play":

abn-network (staging network): used to connect the client to the director, used by the real-servers to send the answers to the clients (direct routing approach), and used for the ipvsadm master/slave multicast traffic.
lvs-network: a dedicated VLAN that connects the director and the real-servers.
DR ARP problem: solved by suppressing ARP answers on the real-servers for the service IP.
The service-IP is configured as logical IP on the lvs-interface on the real-servers.
In this setup ip_forwarding is not needed anywhere (neither on
director, nor on real-server).
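
The post does not say how the ARP answers were suppressed. One common way on Linux real servers is the arp_ignore and arp_announce sysctls. The fragment below is only a sketch under that assumption, and is not necessarily what is deployed here:

```
# /etc/sysctl.conf on each real server (assumed approach, not taken from the post)
# Answer ARP requests only if the target IP is configured on the interface
# the request arrived on, so the service IP on the lvs interface is not
# announced on the abn network.
net.ipv4.conf.all.arp_ignore = 1
# Always pick the best local source address for outgoing ARP requests.
net.ipv4.conf.all.arp_announce = 2
```

Settings like these would be loaded with sysctl -p.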

VM details:

1 GB RAM, 2 vCPUs, system-load almost 0, memory 73M free, 224M buffers, 536M cache, no swap.

top shows almost always 100% idle, 0% us/sy/ni/wa/hi/si/st.

Configuration details:

ipvsadm -Ln for the service in question shows:

TCP  x.y.183.217:12405 wrr persistent 7200
 -> 192.168.83.234:12405   Route   1000   0          0
 -> 192.168.83.235:12405   Route   1000   0          0

x.y: the first two octets are from our internal class B range.
We use 192.168.83.x as the lvs-network for staging.

Persistent ipvsadm-configuration:
/etc/sysconfig/ipvsadm: --set 20 20 20

Cluster-configuration:
/etc/ha.d/haresources: $primary_directorname lvs-kiss x.y.183.217

lvs-kiss-configuration-snippet for the service above:

<VirtualServer idm-abn:12405>
  ServiceType       tcp
  Scheduler         wrr
  DynamicScheduler    0
  Persistance         7200
  QueueSize           2
  Fuzz              0.1
  <RealServer rs1-lvs:12405>
    PacketForwardingMethod  gatewaying
    Test ping -c 1 -nq -W 1 rs1-lvs >/dev/null
    RunOnFailure   "/sbin/ipvsadm -d -t idm-abn:12405 -r rs1-lvs"
    RunOnRecovery   "/sbin/ipvsadm -a -t idm-abn:12405 -r rs1-lvs"
  </RealServer>
  <RealServer rs2-lvs:12405>
    PacketForwardingMethod  gatewaying
    Test ping -c 1 -nq -W 1 rs2-lvs >/dev/null
    RunOnFailure   "/sbin/ipvsadm -d -t idm-abn:12405 -r rs2-lvs"
    RunOnRecovery   "/sbin/ipvsadm -a -t idm-abn:12405 -r rs2-lvs"
  </RealServer>
</VirtualServer>

idm-abn, rs1 and rs2 resolve via /etc/hosts.

About the service:

This is a soa-web-service.

How we reproduce the error:

From a client we run constant calls to the web-service at an interval of one call every three seconds.
From time to time the director answers the client with a connection reset.

Interesting: this happens on every (n × 100 + 1)th attempt, and the curious part is the "+ 1".
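
A loop along the following lines can record which attempt numbers fail, which makes a pattern such as n × 100 + 1 easy to confirm. This is a minimal sketch: probe is a hypothetical helper name, and the command it runs is whatever client call reproduces the problem.

```shell
#!/bin/bash
# probe COUNT INTERVAL CMD...: run CMD COUNT times, sleeping INTERVAL seconds
# after each attempt, and print the number of every failed attempt together
# with that number modulo 100.
probe() {
    local count=$1 interval=$2 i
    shift 2
    for ((i = 1; i <= count; i++)); do
        if ! "$@" >/dev/null 2>&1; then
            echo "attempt $i failed (mod 100 = $((i % 100)))"
        fi
        sleep "$interval"
    done
}
```

For example, probe 1000 3 curl -sf --max-time 5 http://idm-abn:12405/ would make 1000 calls three seconds apart. The URL is a placeholder, and a real call to the SOAP service would need the proper request.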

What we did to trace down the problem:

Checked /proc/sys/net/ipv4/vs: all values are set to default, so drop_packet is NOT in place (=0)
tcpdump on the client, on the frontend/abn and backend/lvs interfaces of the director, and on the lvs and abn interfaces of the real-servers

In this tcpdump we could see a request from the client, answered by a
connection-reset by the director.
The packet was NOT forwarded via LVS.
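
To narrow such a capture down to the resets themselves, a pcap filter on the RST flag can help. The sketch below only builds the filter string; rst_filter is a hypothetical helper, and the interface name in the comment is a placeholder.

```shell
#!/bin/bash
# rst_filter PORT: print a pcap filter that matches TCP packets on PORT
# carrying the RST flag.
rst_filter() {
    printf 'tcp port %s and tcp[tcpflags] & tcp-rst != 0' "$1"
}

# Example (needs root; eth0 is a placeholder for the abn interface):
#   tcpdump -ni eth0 "$(rst_filter 12405)"
```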

I welcome any ideas on how to track this problem down further.
If any information is unclear or missing, please ask.

Best Answer

Do you have any stateful iptables rules on the LVS-DR director? I can see you are using port 12405, so you may have rules like these:

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 12405 -j ACCEPT

In LVS-DR the real servers reply to the clients directly (not via the director), so the director never sees the return traffic. It therefore does not add these connections to its connection tracking table, and the clients' FIN packets are not matched by the ESTABLISHED,RELATED rule. Since you only allow NEW (SYN) packets on port 12405, the FIN packets will be blocked. On an LVS-DR director you have to use a stateless rule for load-balanced services:

iptables -A INPUT -m tcp -p tcp --dport 12405 -j ACCEPT
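
To check whether the director actually carries such rules, the saved ruleset can be filtered for state-matching entries. This is a minimal sketch: stateful_rules is a hypothetical helper name, and it expects iptables-save output on stdin.

```shell
#!/bin/bash
# stateful_rules: read iptables-save output on stdin and print only the
# rules that use the state or conntrack match.
stateful_rules() {
    grep -E -- '-m (state|conntrack) '
}

# Example (needs root):
#   iptables-save | stateful_rules
```

If this prints nothing for the INPUT chain, stateful filtering is unlikely to be the cause.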

Related Solutions

Nat – LVS – source IP

You might reconsider your Windows configuration. I have used direct routing with LVS successfully in Windows. As per the documentation a member of my team wrote:

First install the Windows Loopback Adapter. Start > hdwwiz.exe

Click Next, then "Install the hardware that I manually select from a list (Advanced)".
Scroll down and click "Network Adapters".

Choose Microsoft, then Microsoft Loopback Adapter.

Finish the wizard.

Go to Control Panel\Network and Internet\Network Connections. Rename
the adapters to their descriptive names. Right click on the loopback adapter and manually assign it the LVS VIP.
Go to Start > cmd.exe (right click and choose run as administrator)

Run these Commands.


netsh interface ipv4 set interface "Name of Adapter that holds the real host IP" weakhostreceive=enabled
netsh interface ipv4 set interface "loopback" weakhostreceive=enabled
netsh interface ipv4 set interface "loopback" weakhostsend=enabled

This was a Windows 2008 server, which was configured initially using this Web site for guidance.

As far as logging goes, often the only solution is to log at a point where the client's real IP is still visible.

With Web traffic, the X-Forwarded-For header (commonly exposed to applications as the HTTP_X_FORWARDED_FOR environment variable) could be used. The point is that beyond a certain hop the network layer cannot be relied on for this information. In that case, you have to move further up the stack for potential solutions.

How to find the real IP to which IPVS is routing a virtual IP

This is quite a typical problem with IPVS/ldirectord. You have several options to try to resolve it:

Check the logs on the destination servers for the request (not very accurate, but most of the time it will fit the bill).
If your destination server replies with headers (like Apache or an FTP server), add a custom header such as X-Served-by, or some hint in the welcome banner.
Activate logging in ldirectord. The log is a bit obscure, but I'm pretty sure it will be useful. Just add logfile="/var/log/ldirectord.log" to the global options in your ldirectord.cf.
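
Another option is to read the IPVS connection table on the director, which ipvsadm -L -n -c prints. The sketch below assumes the usual column layout of that listing (pro, expire, state, source, virtual, destination); backends_for is a hypothetical helper name.

```shell
#!/bin/bash
# backends_for VIP:PORT: read `ipvsadm -L -n -c` output on stdin and print
# "client -> real server" for every connection entry whose virtual address
# (fifth column) equals VIP:PORT.
backends_for() {
    awk -v vip="$1" '$5 == vip { print $4 " -> " $6 }'
}

# Example (needs root; the address is a placeholder):
#   ipvsadm -L -n -c | backends_for 10.0.0.100:80
```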