Linux – How to fix a bad arp entry

arp, cluster, linux, switch

I'm just guessing that arp is my problem…

I have a Linux DRBD server cluster set up and, due to some power issues, had to unplug the switch that connects the two servers. As a result, both servers became primary and took the same IP address for several seconds. (This caused a split-brain condition, but that's another issue.)

My problem is that now some servers seem to be able to see the shared IP address of the cluster, and some cannot. I am wondering if this could be a situation where some switches/ports are sending the traffic to one server, and some to the other?

And if this IS the problem, how can I resolve it?

and… is this done at the switch, or on the server?

Best Answer

If it's really an ARP issue, the problem will be confined to the network device doing the routing (since that's what ARP is for: mapping L3 addresses (IP) to L2 addresses (MAC)), or possibly to the ARP cache of a server sitting in the same IP subnet. It won't involve a switch unless it's an L3 switch.

To address the problem on a Cisco router, you can run the following command to clear the ARP cache and allow it to rebuild (the exact form varies by platform; on IOS it is usually clear arp-cache):

clear arp

To remove the bad ARP entry from a server that may be caching bad information (so, not the server that can't be reached, but the server that can't do the reaching), you can manually delete the bogus entry from its ARP cache, where <ip-address> is the IP of the server that can't be reached. Note that the same syntax appears to be valid on both Linux and Windows:

arp -d <ip-address>
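Before deleting an entry, it can help to see which MAC the host has actually cached. The following is a minimal sketch, assuming a Linux host with iproute2, where ip neigh show prints lines of the form "IP dev IFACE lladdr MAC STATE". The function name cached_mac, the interface eth0, and the address 192.0.2.10 are placeholders and are not part of the original answer.

```shell
# Print the MAC address cached for the IP given as $1.
# Reads `ip neigh show`-style lines on stdin, so it can also be run
# against output saved from another host.
cached_mac() {
    awk -v ip="$1" '
        $1 == ip {
            # The MAC is the field that follows the "lladdr" keyword.
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") print $(i + 1)
        }'
}

# Usage on a live host (deleting an entry requires root):
#   ip neigh show | cached_mac 192.0.2.10
#   ip neigh del 192.0.2.10 dev eth0    # iproute2 counterpart of arp -d
```

If the printed MAC belongs to the wrong cluster node, that host is a candidate for the delete step above.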

You can also send a gratuitous ARP from the server that can't be reached, which causes other hosts on the same IP subnet to update their ARP caches (I have this in my notes, but I admit I haven't used it in a long time; I can't remember whether it lets you skip the steps above or just speeds up the other hosts re-adding an ARP entry after you run the commands above):

arping -q -A -c 1 -I eth0 <ip-address>
arping -q -U -c 1 -I eth0 <ip-address>
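The two commands above could be wrapped in a small helper, for example to run from a failover script on the node that currently owns the shared IP. This is only a sketch, assuming the iputils version of arping (where -U sends unsolicited ARP requests and -A sends the same announcement as ARP replies). The function name announce_ip and the DRY_RUN variable are made up for illustration.

```shell
# Send gratuitous ARPs for IP $1 on interface $2 (default eth0).
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
# Actually sending the packets normally requires root.
announce_ip() {
    ip_addr=$1
    iface=${2:-eth0}
    run=${DRY_RUN:+echo}    # "echo" when DRY_RUN is set, empty otherwise

    $run arping -q -U -c 1 -I "$iface" "$ip_addr"
    $run arping -q -A -c 1 -I "$iface" "$ip_addr"
}

# Usage:
#   DRY_RUN=1 announce_ip 192.0.2.10 eth0    # show what would be sent
#   announce_ip 192.0.2.10 eth0              # send the announcements
```

Sending both variants is a belt-and-braces choice, since some hosts may react to one packet type and not the other.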

All of the above is for an ARP issue, but you specifically mention a switch in your question. If it's a switch that only uses L3 for management, then the data flow problems would have to be problems with the MAC address table, not the ARP cache. In that case, you could run the following on the switch to purge the dynamically learned entries:

clear mac-address-table dynamic

Related Solutions

Bad ARP Cache static entries

Try to identify whether the MAC address belongs to any equipment on your network. You can look up the vendor from the first three octets of the address (the OUI).
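As a small illustration of the vendor lookup, the helper below extracts the vendor prefix from a MAC address so it can be pasted into an OUI search. It is a sketch only; the name oui_prefix is hypothetical, and it assumes the address is written with colons (Linux style) or hyphens (Windows style).

```shell
# Print the first three octets (the OUI) of MAC address $1,
# upper-cased and colon-separated.
oui_prefix() {
    printf '%s\n' "$1" | tr 'a-f-' 'A-F:' | cut -d: -f1-3
}

# Usage:
#   oui_prefix 00:1a:2b:3c:4d:5e
#   oui_prefix 00-1a-2b-3c-4d-5e
```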

Do you have a wireless access point on your network, and is it secured? Look for the MAC addresses of any connected devices; you can normally view these from the access point.

Do you have any virtual machines running on any of the workstations or servers?

Is it always the same MAC address that appears in the arp table?

What about the ARP caches on your network switch and router (if they let you view them)? Do the IP/MAC pairs correspond with the table on your server at the time you have an issue? Does power-cycling the switch or router make any difference?
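One way to answer the "is it always the same MAC" question is to collect neighbour tables from several hosts and look for an IP that shows up with more than one MAC. The sketch below assumes the input is in Linux ip neigh show format ("IP dev IFACE lladdr MAC STATE"). The function name conflicting_macs and the file name in the usage note are placeholders.

```shell
# Read `ip neigh show`-style lines on stdin (possibly concatenated from
# several hosts) and print each IP that maps to more than one MAC.
conflicting_macs() {
    awk '
    {
        mac = ""
        for (i = 2; i < NF; i++)
            if ($i == "lladdr") mac = tolower($(i + 1))
        if (mac == "") next              # FAILED/INCOMPLETE entries have no MAC

        key = $1 SUBSEP mac
        if (!(key in seen)) {            # count each IP/MAC pair only once
            seen[key] = 1
            count[$1]++
            macs[$1] = macs[$1] " " mac
        }
    }
    END {
        for (ip in count)
            if (count[ip] > 1) print ip ":" macs[ip]
    }'
}

# Usage: save `ip neigh show` from each host into one file, then
#   conflicting_macs < all-hosts-neigh.txt
```

An IP reported here with two MACs is being resolved differently by different hosts, which matches the symptom of some machines reaching an address while others cannot.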

Linux Bash – How to Sort du -h Output by Size

As of GNU coreutils 7.5, released in August 2009, sort accepts a -h option, which understands numeric suffixes of the kind produced by du -h:

du -hs * | sort -h

If you are using a sort that does not support -h, you can install GNU Coreutils. E.g. on an older Mac OS X:

brew install coreutils
du -hs * | gsort -h

From the sort manual:

-h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)
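Where neither sort -h nor GNU coreutils is available, a common fallback is to have du report plain kilobytes and sort those numerically. The sketch below assumes only a POSIX du and sort; the function name du_sorted_kb is made up for illustration, and the output shows sizes in KB, not human-readable units.

```shell
# Print the size in kilobytes of each argument, smallest first.
# Uses only du -sk and sort -n, so it does not need sort -h.
du_sorted_kb() {
    du -sk -- "$@" | sort -n
}

# Usage:
#   du_sorted_kb *
```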
