AWS EC2 – Fixing Missing eth0 After Startup from Snapshot AMI

I recently created a new EC2 instance from a snapshot of our production EC2 instance.

The machine started up fine and I can SSH in, but I cannot reach it any other way. No web traffic, nothing.

On closer inspection of the machine, mainly the network stack, I see this:

/etc/udev/rules.d/70-persistent-net.rules

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="06:68:f3:22:91:f2", NAME="ens5"

ifconfig

ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.31.12.146  netmask 255.255.240.0  broadcast 172.31.15.255
        inet6 fe80::468:f3ff:fe22:91f2  prefixlen 64  scopeid 0x20<link>
        ether 06:68:f3:22:91:f2  txqueuelen 1000  (Ethernet)
        RX packets 492  bytes 81928 (80.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 474  bytes 76982 (75.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6  bytes 416 (416.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 416 (416.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Note the ens5 on the first line of the ifconfig output.

[/etc] # service network restart

Restarting network (via systemctl):  Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
                                                       [FAILED]

[/etc] # systemctl status network.service

● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2018-10-16 11:13:34 EDT; 1min 4s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2223 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)
   CGroup: /system.slice/network.service
           └─857 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--ens5.lease -pf /var/run/dhclient-ens5.pid -H ip-172-31-12-146 ens5

Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal network[2223]: RTNETLINK answers: File exists
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal network[2223]: RTNETLINK answers: File exists
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal network[2223]: RTNETLINK answers: File exists
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal network[2223]: RTNETLINK answers: File exists
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal network[2223]: RTNETLINK answers: File exists
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal network[2223]: RTNETLINK answers: File exists
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal systemd[1]: network.service: control process exited, code=exited status=1
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal systemd[1]: Failed to start LSB: Bring up/down networking.
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal systemd[1]: Unit network.service entered failed state.
Oct 16 11:13:34 ip-172-31-12-146.us-west-1.compute.internal systemd[1]: network.service failed.

It cannot find eth0, nor can it restart the network stack. I have tried rebooting the machine, shutting down and starting up, with no luck. What am I missing?

Best Answer

Did you change from an older instance type to T3 / M5 / C5? Those types run on different virtual hardware and expose the network interface under a different device name (ens5 in your output, rather than eth0).
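To check which driver sits behind each interface, something along these lines should work. It is only a sketch: the function name `list_net_drivers` is made up for illustration, and it assumes the usual sysfs layout where each physical interface has a `device/driver` symlink. On the newer instance types I would expect the driver to show up as `ena`, but verify that on your own machine.

```shell
#!/bin/sh
# Sketch: print each network interface with the kernel driver behind it.
# list_net_drivers is a hypothetical helper name, not a standard tool.
# The optional argument lets you point it at a different sysfs root.
list_net_drivers() {
    sys_net=${1:-/sys/class/net}
    for dev in "$sys_net"/*; do
        [ -e "$dev" ] || continue
        name=${dev##*/}
        # Physical devices expose a "driver" symlink; loopback and other
        # virtual entries do not.
        if [ -L "$dev/device/driver" ]; then
            driver=$(basename "$(readlink "$dev/device/driver")")
        else
            driver="(no driver link)"
        fi
        printf '%s\t%s\n' "$name" "$driver"
    done
}
```

Running `list_net_drivers` with no arguments reads `/sys/class/net` on the live system.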

One option is to reconfigure the network stack to use the new device name. That can be quite an undertaking unless you are a skilled Linux admin and know what you are doing.
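If you do go down the reconfiguration route, the core of it on a RHEL/CentOS-style system is usually renaming the interface config file. The following is a rough sketch, not a tested procedure for your box: it assumes the config lives in `/etc/sysconfig/network-scripts/ifcfg-eth0`, and `migrate_ifcfg` is a hypothetical helper name. Run it from the console or with a snapshot to fall back on, since a mistake here can cut off SSH.

```shell
#!/bin/sh
# Sketch: clone ifcfg-<old> into ifcfg-<new> with the device name updated.
# Assumes RHEL/CentOS-style network-scripts; migrate_ifcfg is illustrative.
migrate_ifcfg() {
    old_if=$1    # e.g. eth0
    new_if=$2    # e.g. ens5
    cfg_dir=${3:-/etc/sysconfig/network-scripts}

    src="$cfg_dir/ifcfg-$old_if"
    dst="$cfg_dir/ifcfg-$new_if"

    if [ ! -f "$src" ]; then
        echo "no $src to migrate" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists, leaving it alone" >&2
        return 1
    fi

    # Point DEVICE= at the new name and drop any HWADDR= line, which
    # would otherwise pin the config to the old instance's MAC address.
    sed -e "s/^DEVICE=.*/DEVICE=$new_if/" -e '/^HWADDR=/d' "$src" > "$dst"

    # Keep the old config as a .bak copy for rollback.
    mv "$src" "$src.bak"
}
```

For example, `migrate_ifcfg eth0 ens5` followed by a reboot. Anything else that names eth0 explicitly (firewall rules, static routes, application configs) would still need the same treatment.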

Or, easier, change the instance type back to the one the snapshot was taken from. That should restore the device names to what they used to be.

You can change the size, e.g. from large to medium, but keep the family: if it was T2, use T2 again.
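Changing the type can be done from the console, or with the AWS CLI roughly as sketched here. The instance has to be stopped first. The wrapper function `change_instance_type` and the `AWS_CMD` variable are my own additions for illustration, and the instance ID and type in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: stop an instance, change its type, and start it again.
# change_instance_type and AWS_CMD are hypothetical names; set
# AWS_CMD="echo aws" to print the commands instead of running them.
change_instance_type() {
    instance_id=$1
    new_type=$2
    aws_cmd=${AWS_CMD:-aws}

    # Each step only runs if the previous one succeeded.
    $aws_cmd ec2 stop-instances --instance-ids "$instance_id" &&
    $aws_cmd ec2 wait instance-stopped --instance-ids "$instance_id" &&
    $aws_cmd ec2 modify-instance-attribute --instance-id "$instance_id" \
        --instance-type "{\"Value\": \"$new_type\"}" &&
    $aws_cmd ec2 start-instances --instance-ids "$instance_id"
}
```

A dry run would look like `AWS_CMD="echo aws" change_instance_type i-0123456789abcdef0 t2.medium`; drop the `AWS_CMD` override to actually apply it.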

I also suggest restoring the snapshot to a fresh instance. The current one has probably tried to adapt to the new device names and may be in an inconsistent state, so it is better to start again from the Prod snapshot.

Hope that helps :)