One of our servers is behaving strangely with networking on startup. We've been using configuration changes and ifup/ifdown to try to isolate the issue, with frustratingly inconsistent results. We have a configuration that is consistent with other servers where we are not having this issue. A significant difference is that this server is from a different hardware vendor, but I don't know why that would cause the problem we're having.
When we bring up the eth0 interface with sudo ifup eth0, the bonded interface bond0 and the VLAN interface bond0.3001 are brought up and appear correctly configured in ifconfig, but the default route is missing.
user@admin1:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0.3001
Repeated attempts to bring the interface down (using sudo ifdown bond0, which seems to be the easiest way to bring down the whole multi-layered config with a single command) and then back up don't seem to ever add the route correctly. However, if we edit the bond0.3001 entry in the interfaces file /etc/network/interfaces, it will frequently (but not always, argh) bring up the interfaces and the correct default route.
user@admin1:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 bond0.3001
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0.3001
We have been commenting in/out the "network" line below as the file edit, with somewhat consistent results: when we make the edit and bounce the interface stack, the default route is present pretty frequently. When we bounce the interface stack without making the edit, it seems to fail 100% of the time. Adding the default route manually with sudo route add default gw 10.1.1.1 bond0.3001 has worked every time we've tried, but we need the system to restart networking reliably.
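Since the manual route add works every time, a small parsing helper like the one below can be used to detect the failure and re-add the route from a boot script. This is just a sketch under the question's assumptions (gateway 10.1.1.1 on bond0.3001); `has_default_route` is a hypothetical helper name, not anything from ifupdown:

```shell
#!/bin/sh
# Returns 0 if the routing-table text on stdin contains a default
# route (destination 0.0.0.0 with the gateway flag G), 1 otherwise.
has_default_route() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { found = 1 } END { exit !found }'
}

# Demo against the failing table from the question (no default route):
printf '10.1.1.0 0.0.0.0 255.255.255.0 U 0 0 0 bond0.3001\n' \
    | has_default_route || echo "default route missing"
```

On the affected server this could run as, e.g., `route -n | has_default_route || route add default gw 10.1.1.1 bond0.3001` — a workaround, not a fix for the underlying ifup behavior.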
Here is our config:
user@admin1:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    bond-master bond0

auto eth1
iface eth1 inet manual
    bond-master bond0

# The primary network interface
# Aggregate bond
auto bond0
iface bond0 inet manual
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1

auto bond0.3001
iface bond0.3001 inet static
    address 10.1.1.14
    gateway 10.1.1.1
    netmask 255.255.255.0
    #network 10.1.1.0
    broadcast 10.1.1.255
    #post-up route add default gw 10.1.1.1 bond0.3001
    #post-down route del default gw 10.1.1.1 bond0.3001
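Given that the manual route add has been reliable, one mitigation worth trying (a sketch, not a guaranteed fix) is to enable the commented-out hooks in the bond0.3001 stanza, guarded so ifup doesn't abort if the route was already installed:

```
iface bond0.3001 inet static
    address 10.1.1.14
    gateway 10.1.1.1
    netmask 255.255.255.0
    broadcast 10.1.1.255
    # Add the default route only if ifup didn't already install it;
    # the trailing `|| true` keeps post-down failures non-fatal.
    post-up ip route show default | grep -q via || route add default gw 10.1.1.1 bond0.3001
    post-down route del default gw 10.1.1.1 bond0.3001 || true
```

ifupdown runs post-up/post-down commands through a shell, so the pipeline form should work; this papers over the missing route rather than explaining it.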
One additional datapoint: in running sudo strace -f sudo ifup eth0 to try to see what the differences are, it takes longer on a successful attempt (default route created) than on a failed attempt (default route missing). This may just be other downstream network services running that fail quickly when there's no default route.
EDIT: we're getting suspicious that kernel updates are the culprit; the update that seemed to resolve the issue for this server was 3.19.0-25 to 3.19.0-43. We've seen similar behavior on other servers after kernel updates. More testing on specific kernel versions would probably be necessary to track down the root cause.
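To make that kernel-version testing tractable, it may help to log the running kernel alongside each ifup attempt so successes and failures can be correlated later. A minimal sketch (`log_kernel` is a hypothetical helper):

```shell
#!/bin/sh
# Print a UTC timestamp and the running kernel version; call this
# before each ifup test run and append the result to a log file.
log_kernel() {
    printf '%s kernel=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -r)"
}

log_kernel
```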
Best Answer
I just hit this issue with a rack of Ubuntu 14.04.3 servers this week, all upgraded to linux-generic-lts-vivid and running the 3.19.0-43 kernel. After flailing for a few days, we solved the problem by downgrading the kernel to 3.16.0-57 (we had the same problem with the 3.19.0-2x kernel that ships with the 14.04.3 amd64 server iso).
As for symptoms, default routes were non-deterministically added during boot. We could always fix it by running ifdown bond0 && ifup p4p2 p5p2, but, like you, we needed networking to work reliably on boot.