Default route not set on Ubuntu VM in Azure (via DHCP)

Akshay Moghe ・2 min read

Misadventures in bringing up the same Ubuntu image on different hypervisors

I recently stumbled upon a couple of rather nasty issues when bringing up a server on a variety of hypervisors (and cloud IaaS platforms). Now, admittedly, I was doing something "not normal", i.e. taking a VM that had worked on VMware ESX and trying to bring up a replica of it (using a dd copy of the disk) on other platforms. What struck me was that the VM would:

  1. boot up normally on VMware (obviously), networking configured via DHCP
  2. boot up normally on AWS, networking configured, instance was SSH'able
  3. boot up normally on Azure, but would not get any default routes, so no SSH!

After some poking around, I learnt a couple of lessons.

1. dhclient has multiple ways in which it will set a default route

Which one it uses depends on the response from the DHCP server:

  1. some DHCP servers return the Classless Static Routes option (option 121)
  2. some DHCP servers return the Router option (option 3)

After a lot of debugging (which is excruciatingly painful, since the only real "debug tool" available for these issues is the debug script in /etc/dhcp/dhclient-exit-hooks.d) I figured out that the other environments worked fine because their DHCP servers use the Router option, whereas on Azure the DHCP server sends back the Classless Static Routes option.

And dhclient prefers the Classless Static Routes option (see here): when that option is present in the response, the Router option is ignored.

On VMware and on AWS, the DHCP server uses the Router option, which is why the VM worked fine in those two environments.
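To make the difference between the two options concrete, here is a minimal sketch (not the script Ubuntu ships) of decoding a Classless Static Routes value and falling back to the Router option only when it is absent. I am assuming the option arrives as a space-separated list of decimal bytes: a prefix length, the significant octets of the destination, then four gateway octets. The function names and the sample addresses are made up for illustration.

```shell
#!/bin/sh
# Decode an RFC 3442 byte list into "destination/prefix via gateway" lines.
# Assumed input format: space-separated decimal bytes, e.g. "0 10 0 0 1".
decode_classless_routes() {
    set -- $1                      # split the byte list into positional params
    while [ $# -gt 0 ]; do
        prefix=$1; shift
        # Only ceil(prefix / 8) destination octets are transmitted.
        octets=$(( (prefix + 7) / 8 ))
        dest=""
        i=0
        while [ $i -lt 4 ]; do
            if [ $i -lt $octets ]; then
                [ $# -gt 0 ] || return 1   # truncated option
                dest="${dest}${dest:+.}$1"; shift
            else
                dest="${dest}${dest:+.}0"  # pad the omitted octets with zeros
            fi
            i=$((i + 1))
        done
        [ $# -ge 4 ] || return 1           # gateway needs four octets
        gw="$1.$2.$3.$4"; shift 4
        if [ "$prefix" -eq 0 ]; then
            echo "default via $gw"         # a /0 entry is the default route
        else
            echo "$dest/$prefix via $gw"
        fi
    done
}

# Hypothetical helper: classless routes win; the Router option is only
# consulted when no classless routes were sent.
effective_routes() {
    classless=$1
    routers=$2
    if [ -n "$classless" ]; then
        decode_classless_routes "$classless"
    else
        for router in $routers; do
            echo "default via $router"
        done
    fi
}

# Router option only (the VMware / AWS style of response)
effective_routes "" "192.168.0.1"
# Classless Static Routes present (the Azure style); the Router value is ignored
effective_routes "0 10 0 0 1 24 192 168 1 10 0 0 1" "10.0.0.1"
```

The point of the sketch is that in the second case the default route exists only inside the classless routes value, so whatever is responsible for decoding that option has to actually run.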

Okay, so that solves one mystery - but given that dhclient should know how to handle both options - why wasn't it working for my VM on Azure?

2. dhclient-exit-hook scripts are sourced

On further investigation, it turned out that one of my dhclient-exit-hook scripts was doing the "wrong thing" and terminating early due to a misplaced exit 0.

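It is not clearly mentioned anywhere I looked (neither in the man pages nor on the interwebz) that the dhclient-exit-hook scripts are actually sourced rather than executed. That matters because an exit in a sourced script terminates the shell that sourced it, not just that one script.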

It turned out that one of my custom scripts was calling exit, which kept the remaining scripts from running. Most notably, there is a default script on Ubuntu (called rfc3442-classless-routes) that handles the Classless Static Routes option correctly. However, since the scripts are sourced in lexicographical order and my script came earlier in that order, this script never ran, and so no default route was ever installed on Azure.
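The failure mode is easy to reproduce outside of dhclient. Below is a minimal sketch, assuming nothing more than a directory of scripts that get sourced in name order; the file names 10-custom and 20-later and the run_hooks function are stand-ins I made up, not the real hooks or the real driver.

```shell
#!/bin/sh
# Demo: an "exit" inside a sourced hook stops every hook that sorts after it.
demo_dir=$(mktemp -d)

# A hypothetical custom hook with a misplaced "exit 0"
printf 'echo "10-custom ran"\nexit 0\n' > "$demo_dir/10-custom"
# Stand-in for a later hook such as rfc3442-classless-routes
printf 'echo "20-later ran"\n' > "$demo_dir/20-later"

# Source every script in the directory in lexicographical order.
# The subshell ( ... ) keeps the stray "exit" from killing this demo itself.
run_hooks() {
    (
        for hook in "$1"/*; do
            . "$hook"
        done
    )
}

broken_output=$(run_hooks "$demo_dir")

# One possible fix: guard the work with a conditional and let the script
# fall off the end instead of calling exit.
printf 'if true; then\n    echo "10-custom ran"\nfi\n' > "$demo_dir/10-custom"
fixed_output=$(run_hooks "$demo_dir")

echo "with exit 0:"
echo "$broken_output"
echo "without exit:"
echo "$fixed_output"

rm -rf "$demo_dir"
```

With the exit 0 in place only the first hook reports in; once it is gone, both do. Had the hooks been executed as separate processes, the exit would have been harmless.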

One last thing of note: this post on StackExchange was immensely helpful in finding the root cause of my problems.
