Ifup default route missing

ifconfig linux-networking ubuntu-14.04

One of our servers is behaving strangely with networking on startup. We've been using configuration changes and ifup/ifdown to try to isolate the issue, with frustratingly inconsistent results. The configuration matches that of other servers where we are not having this issue. One significant difference is that this server is from a different hardware vendor, but I don't see why that would cause the problem we're having.

When we bring up the eth0 interface with sudo ifup eth0, the bonded interface bond0 and the VLAN interface bond0.3001 are brought up and appear correctly configured in ifconfig, but the default route is missing.

user@admin1:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0.3001

Repeated attempts to bring the interface down (using sudo ifdown bond0, which seems to be the easiest way to bring down the whole multi-layered config with a single command) and then back up never seem to add the route correctly. However, if we edit the bond0.3001 entry in the interfaces file /etc/network/interfaces before bouncing, it will frequently (but not always, argh) bring up the interfaces with the correct default route.

user@admin1:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 bond0.3001
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0.3001

We have been commenting in/out the "network" line below as the file edit, with somewhat consistent results; when we make the edit, and bounce the interface stack, the default route is present pretty frequently. When we bounce the interface stack without making the edit, it seems to fail 100% of the time. Adding the default route manually with sudo route add default gw 10.1.1.1 bond0.3001 has worked every time we've tried, but we need the system to restart networking reliably.
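
For anyone scripting around this while the root cause is unknown, here is a minimal sketch of a check for the missing default route. It only parses text in the route -n format shown above. The function names has_default_route and report are made up for illustration, and the sample tables are copied from this question.

```shell
#!/bin/sh
# Hypothetical helper (not part of ifupdown): succeeds if the routing table
# text on stdin, in `route -n` format, contains a default route, i.e. a row
# whose destination is 0.0.0.0 and whose flags include G (gateway).
has_default_route() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { found = 1 } END { exit !found }'
}

# Sample tables copied from the question.
missing='Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0.3001'

present='Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 bond0.3001
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 bond0.3001'

# Print a one-line verdict for a labelled routing table.
report() {
    if printf '%s\n' "$2" | has_default_route; then
        echo "$1: default route found"
    else
        echo "$1: no default route"
    fi
}

report "failed bounce" "$missing"
report "good bounce" "$present"
```

On a live box the same check could gate the manual fix, for example route -n | has_default_route || sudo route add default gw 10.1.1.1 bond0.3001. That is a workaround only and does not explain why ifup skips the route.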

Here is our config:

user@admin1:~$ cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    bond-master bond0
auto eth1
iface eth1 inet manual
    bond-master bond0

# The primary network interface
#Aggregate bond
auto bond0
iface bond0 inet manual
    bond-slaves none
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1

auto bond0.3001
iface bond0.3001 inet static
    address 10.1.1.14
    gateway 10.1.1.1
    netmask 255.255.255.0
    #network 10.1.1.0
    broadcast 10.1.1.255
    #post-up route add default gw 10.1.1.1 bond0.3001
    #post-down route del default gw 10.1.1.1 bond0.3001

One additional data point: when running sudo strace -f sudo ifup eth0 to look for differences, a successful attempt (default route created) takes longer than a failed attempt (default route missing). This may just be downstream network services that fail quickly when there's no default route.

EDIT: I'm getting suspicious that kernel updates are the culprit. The update that seemed to resolve the issue for this server was from 3.19.0-25 to 3.19.0-43. We've seen similar behavior on other servers after kernel updates. More testing on specific kernel versions would probably be necessary to track down the root cause.

Best Answer

I just hit this issue with a rack of Ubuntu 14.04.3 servers this week, all upgraded to linux-generic-lts-vivid and running the 3.19.0-43 kernel. After flailing for a few days, we solved the problem by downgrading the kernel to 3.16.0-57 (we had the same problem with the 3.19.0-2x kernel that ships with the 14.04.3 amd64 server iso).

apt-get remove linux-generic-lts-vivid -y
apt-get autoremove
apt-get install linux-generic-lts-utopic -y
rm -f /boot/*3.19*
update-grub
reboot
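
After the reboot it is worth confirming which kernel series actually booted, in case another installed kernel was picked. This is a small sketch, assuming release strings of the form 3.19.0-43-generic. The name kernel_series is invented here and is not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: reduce a kernel release string such as
# "3.19.0-43-generic" to its major.minor series ("3.19").
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Report which series the currently running kernel belongs to.
running=$(uname -r)
case "$(kernel_series "$running")" in
    3.19) echo "running $running: still on the 3.19 series" ;;
    3.16) echo "running $running: on the 3.16 series" ;;
    *)    echo "running $running: neither 3.19 nor 3.16" ;;
esac
```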

As for symptoms, default routes were added non-deterministically during boot. We could always fix it via ifdown bond0 && ifup p4p2 p5p2, but, like you, we needed networking to work reliably on boot.