I think it’s fair to say that we’ve all accidentally relied on DNS ordering at some point. You’re setting up a small homelab and, rather than set up caching, you just list your internal DNS server first, and it works! Local and external names get resolved just as you would expect. It doesn’t even feel wrong, because it’s so natural to assume that servers are queried in the order they’re listed.

But alas, in the process we’ve broken one of the cardinal rules of DNS: all the servers in our server list (/etc/resolv.conf) are supposed to serve the same content. Resolvers are not required to query servers in any particular order, and are in fact encouraged not to. It’s spelled out in RFC 1035 (section 7.2):

The resolver always starts with a list of server names to query (SLIST). This list will be all NS RRs which correspond to the nearest ancestor zone that the resolver knows about. To avoid startup problems, the resolver should have a set of default servers which it will ask should it have no current NS RRs which are appropriate. The resolver then adds to SLIST all of the known addresses for the name servers, and may start parallel requests to acquire the addresses of the servers when the resolver has the name, but no addresses, for the name servers.

To complete initialization of SLIST, the resolver attaches whatever history information it has to the each address in SLIST. This will usually consist of some sort of weighted averages for the response time of the address, and the batting average of the address (i.e., how often the address responded at all to the request). Note that this information should be kept on a per address basis, rather than on a per name server basis, because the response time and batting average of a particular server may vary considerably from address to address. Note also that this information is actually specific to a resolver address / server address pair, so a resolver with multiple addresses may wish to keep separate histories for each of its addresses. Part of this step must deal with addresses which have no such history; in this case an expected round trip time of 5-10 seconds should be the worst case, with lower estimates for the same local network, etc.

Even though a number of resolver implementations just go through the list until they get a result, it’s important that they are free to query any server with confidence.

With this knowledge a resolver can attempt to be smarter and, for example:

  • Query all servers and take the first response.
  • Remember which servers were the fastest and prefer them.
  • Determine which server is closest and use that.
  • Query the server that’s under the least load.
  • Note that servers are down and stop querying them for a while.
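
To make this concrete, here is a minimal Python sketch of a resolver that keeps the kind of per-address history RFC 1035 describes and ranks servers by it. The class name, the smoothing factor, and the scoring formula are all assumptions made for illustration; they are not taken from any real resolver.

```python
from dataclasses import dataclass
from typing import List, Optional

# Starting estimate for an address with no history. RFC 1035 suggests
# 5-10 seconds as the worst case; 5.0 is an arbitrary pick from that range.
DEFAULT_RTT = 5.0


@dataclass
class ServerHistory:
    """Per-address history (hypothetical structure, for illustration only)."""
    address: str
    avg_rtt: float = DEFAULT_RTT  # weighted average response time, in seconds
    sent: int = 0                 # queries sent to this address
    answered: int = 0             # queries that received any response

    @property
    def batting_average(self) -> float:
        # Fraction of queries answered; assume 1.0 until we have data.
        return self.answered / self.sent if self.sent else 1.0

    def record(self, rtt: Optional[float], alpha: float = 0.3) -> None:
        """Record one query; rtt is None when the server never answered."""
        self.sent += 1
        if rtt is None:
            return
        self.answered += 1
        if self.answered == 1:
            self.avg_rtt = rtt  # first real sample replaces the default
        else:
            self.avg_rtt = (1 - alpha) * self.avg_rtt + alpha * rtt


def rank(slist: List[ServerHistory]) -> List[ServerHistory]:
    """Order servers best-first: fast, reliable addresses come out on top."""
    return sorted(slist, key=lambda s: s.avg_rtt / max(s.batting_average, 0.01))
```

Nothing in this ranking looks at the position of a server in the list. The internal server is asked first only for as long as it happens to score best, which is exactly why a setup that depends on list order is fragile.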

The popular caching server dnsmasq takes advantage of this behavior. From the description of the --strict-order flag in dnsmasq(8):

By default, dnsmasq will send queries to any of the upstream servers it knows about and tries to favour servers that are known to be up. Setting this flag forces dnsmasq to try each query with each server strictly in the order they appear in /etc/resolv.conf

How should it be done then?

First we need to state what the actual goal is. In networking vernacular we’re trying to merge two DNS zones.

*.localdomain -> 192.168.1.2
*.*           -> 8.8.8.8

It just lends itself so naturally to fallthrough, doesn’t it? It’s really no wonder it’s so common to do it this way.

Enter dnsmasq, my favorite homelab Swiss Army knife. In this case we’re going to use it through NetworkManager as our local resolver.

You can most likely install dnsmasq from your distribution’s repositories. On those pesky systems that automatically enable services when they’re installed, you should disable the dnsmasq service, because we’re going to have NetworkManager start and manage its own instance.

systemctl disable dnsmasq    # systemd
chkconfig dnsmasq off        # SysV init

Then create the file /etc/NetworkManager/conf.d/NM-dnsmasq.conf with the following content.

[main]
dns=dnsmasq

Then create /etc/NetworkManager/dnsmasq.d/dnsmasq.conf as our main config file. Because NetworkManager handles all the heavy lifting, we don’t have to add our ‘main’ DNS servers, and they’ll update correctly when using DHCP. All we need to do is let dnsmasq know about *.localdomain and where to find records for it.

Add the following to send queries for that domain to the appropriate DNS server.

server=/localdomain/192.168.1.2
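
Conceptually, that one line replaces ordering with routing: the upstream is chosen by the name being asked for, not by position in a list. The Python sketch below illustrates the idea with a longest-suffix match on label boundaries. It is not dnsmasq’s code; the ROUTES table and pick_upstream function are hypothetical names, and 8.8.8.8 stands in for whatever default servers NetworkManager hands to dnsmasq.

```python
from typing import Mapping

# Hypothetical routing table mirroring the two-zone goal stated earlier.
ROUTES = {
    "localdomain": "192.168.1.2",  # internal zone
    "": "8.8.8.8",                 # default upstream for everything else
}


def pick_upstream(name: str, routes: Mapping[str, str] = ROUTES) -> str:
    """Return the upstream for the longest domain suffix that has a route."""
    labels = name.rstrip(".").lower().split(".")
    # Try suffixes from most to least specific, e.g. for "nas.localdomain":
    # "nas.localdomain", then "localdomain", then "" (the default route).
    for i in range(len(labels) + 1):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    raise LookupError("no route for " + name)


print(pick_upstream("nas.localdomain"))  # 192.168.1.2
print(pick_upstream("example.com"))      # 8.8.8.8
```

Because the match works on whole labels, a name such as notlocaldomain still goes to the default upstream, and every query has exactly one place to go, so no server ever has to answer for a zone it doesn’t hold.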

Restart NetworkManager and you’re all set.