David Moreau Simard

I ran into a weird problem while testing the brand new Ubuntu 14.04 Trusty in my OpenStack environment.
Same image, same flavors, same specifications - sometimes even the same nova-compute host. I would spawn 20 VMs and roughly 50% of them wouldn’t ping.

And it’s not like I tinkered with the image, either. It’s provided directly by Ubuntu at http://cloud-images.ubuntu.com/trusty/.

I started investigating…
The nova-api, neutron and nova-compute logs showed nothing relevant.
The failures were seemingly random.

Looking at console logs for the affected VMs, I would find the following:

cloud-init-nonet[4.54]: waiting 10 seconds for network device
cloud-init-nonet[14.57]: waiting 120 seconds for network device
cloud-init-nonet[134.57]: gave up waiting for a network device
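
With 20 VMs per round, checking each console log by hand gets tedious. Here is a minimal Python sketch for spotting the message above in a console log. It assumes you already have the log as text (for example from `nova console-log`) and that the line format matches the excerpt above. The function name `network_gave_up` is my own, not part of any tool.

```python
import re

# Matches lines such as:
#   cloud-init-nonet[134.57]: gave up waiting for a network device
# The bracketed number is assumed to be the uptime in seconds.
NONET_LINE = re.compile(
    r"cloud-init-nonet\[(?P<uptime>\d+(?:\.\d+)?)\]: (?P<message>.+)"
)


def network_gave_up(console_log):
    """Return True if cloud-init-nonet reported giving up on the network."""
    for line in console_log.splitlines():
        match = NONET_LINE.search(line)
        if match and match.group("message").startswith("gave up waiting"):
            return True
    return False
```

Running this over the console log of every VM in a round gives a quick list of the ones that need a closer look.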

Apparently, I’m not the only one with this problem either: https://ask.openstack.org/en/question/28297/cloud-init-nonet-waiting-and-fails/

I found two ways to make a VM with this problem ping:

  • Soft reboot it
  • Do an “ifdown eth0 && ifup eth0”

Digging deeper, I realized that Ubuntu 14.04 now has a /etc/network/interfaces.d folder.
The image provided by cloud-images ships a /etc/network/interfaces.d/eth0 file there by default, configured for DHCP.

When cloud-init boots an image in my environment, it configures the eth0 interface in the /etc/network/interfaces file with the IP provided by Neutron - resulting in, you guessed it, two definitions of eth0. Oh, and I don’t use DHCP at all.
Now, add some race conditions in there and you have a failure rate of 50-60%!
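
To make the conflict concrete, here is a rough Python sketch that looks for an interface defined more than once across a set of ifupdown configuration files. It only reads `iface <name> <family> <method>` stanza headers and does not follow `source` directives. The function name `find_duplicate_ifaces` is hypothetical, and the file contents you feed it are whatever you read from the image yourself.

```python
import re
from collections import defaultdict

# ifupdown stanza header: "iface <name> <family> <method>".
# Commented-out lines are skipped because they start with "#".
IFACE_STANZA = re.compile(r"^[ \t]*iface[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+)", re.MULTILINE)


def find_duplicate_ifaces(files):
    """Find interfaces defined more than once.

    `files` maps a path to that file's text. The result maps
    (interface, family) to a list of (path, method) pairs, and only
    contains interfaces that have more than one definition.
    """
    seen = defaultdict(list)
    for path, text in files.items():
        for name, family, method in IFACE_STANZA.findall(text):
            seen[(name, family)].append((path, method))
    return {key: places for key, places in seen.items() if len(places) > 1}
```

Given the text of /etc/network/interfaces with a static eth0 stanza and the text of /etc/network/interfaces.d/eth0 with a DHCP one, the result would contain a single ("eth0", "inet") entry listing both files.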

I confirmed my theory by removing the eth0 file from /etc/network/interfaces.d in the image I use. I then booted another round of VMs from this new image and they were all reachable.

I filed a bug with cloud-init so it can perhaps check whether the interface it wants to configure is already defined in the /etc/network/interfaces.d folder. Let’s see what happens.