I ran into a weird problem while testing the brand new Ubuntu 14.04 Trusty in my OpenStack environment.
Same image, same flavors, same specifications - sometimes even the same nova-compute host. I would spawn 20 VMs and roughly 50% of them wouldn’t ping.
And it’s not like I tinkered much with the image, either. It comes straight from Ubuntu at http://cloud-images.ubuntu.com/trusty/.
I started investigating…
The logs of nova-api, neutron and nova-compute showed nothing relevant.
Looking at console logs for the affected VMs, I would find the following:
cloud-init-nonet[4.54]: waiting 10 seconds for network device
cloud-init-nonet[14.57]: waiting 120 seconds for network device
cloud-init-nonet[134.57]: gave up waiting for a network device
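With 20 VMs per round, scanning each console log for that last line is quicker than reading them by eye. Here is a minimal sketch, assuming the console output has already been saved as text (for example with `nova console-log`). The helper name `gave_up_on_network` is hypothetical, and the pattern is based only on the three log lines above.

```python
import re

# Matches the cloud-init-nonet line that signals the VM stopped waiting
# for a network device. The pattern is derived from the log excerpt above;
# other cloud-init versions may word the message differently.
GAVE_UP = re.compile(
    r"cloud-init-nonet\[\s*[\d.]+\]: gave up waiting for a network device"
)


def gave_up_on_network(console_log: str) -> bool:
    """Return True if the console log contains the 'gave up' message."""
    return GAVE_UP.search(console_log) is not None
```

Running this over the saved console log of every VM in a round gives a quick count of how many hit the problem.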
Apparently, I’m not the only one with this problem either: https://ask.openstack.org/en/question/28297/cloud-init-nonet-waiting-and-fails/
I found two ways to make a VM with this problem ping:
- Soft reboot it
- Do an “ifdown eth0 && ifup eth0”
Digging deeper, I realized that Ubuntu 14.04 now has an /etc/network/interfaces.d folder.
The image provided by cloud-images ships with an /etc/network/interfaces.d/eth0 file, configured for DHCP by default.
When cloud-init runs at boot in my environment, it configures the eth0 interface in the /etc/network/interfaces file with the IP provided by Neutron - resulting in, you guessed it, two competing definitions of eth0.
Oh, also, I don’t use DHCP at all.
Now, add some race conditions in there and you have a failure rate of 50-60%!
I confirmed my theory by removing the eth0 file from /etc/network/interfaces.d in the image I use. I then booted another round of VMs with this new image and they were all reachable.
I filed a bug with cloud-init so it can perhaps check whether the interface it wants to configure is already defined in the /etc/network/interfaces.d folder. Let’s see what happens.
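To illustrate the kind of check I have in mind, here is a rough sketch that looks for an interface defined more than once across /etc/network/interfaces and /etc/network/interfaces.d. This is not how cloud-init does anything. The function names are hypothetical, and it only looks at `iface` lines, ignoring `source` directives and the rest of the ifupdown syntax.

```python
import re
from collections import defaultdict
from pathlib import Path

# An ifupdown stanza header looks like: iface <name> <family> <method>
# Commented-out lines start with '#' and therefore do not match.
IFACE_RE = re.compile(r"^\s*iface\s+(\S+)\s+(\S+)\s+(\S+)")


def iface_stanzas(text):
    """Return a (name, family, method) tuple for each iface line in text."""
    stanzas = []
    for line in text.splitlines():
        match = IFACE_RE.match(line)
        if match:
            stanzas.append(match.groups())
    return stanzas


def find_duplicates(files):
    """Given {path: content}, return {(name, family): [paths]} for every
    interface/family pair that is defined more than once."""
    seen = defaultdict(list)
    for path, text in files.items():
        for name, family, _method in iface_stanzas(text):
            seen[(name, family)].append(path)
    return {key: paths for key, paths in seen.items() if len(paths) > 1}


def load_config(root="/etc/network"):
    """Read the interfaces file and everything in interfaces.d under root."""
    root = Path(root)
    files = {}
    main = root / "interfaces"
    if main.is_file():
        files[str(main)] = main.read_text()
    drop_in = root / "interfaces.d"
    if drop_in.is_dir():
        for path in sorted(drop_in.iterdir()):
            if path.is_file():
                files[str(path)] = path.read_text()
    return files
```

Calling `find_duplicates(load_config())` on an affected VM should report eth0 with both file paths, assuming the two files look the way I described above.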