Overview

While attempting to run the openshift-installer locally using libvirt, I ran into a peculiar problem with NetworkManager’s packaged version of dnsmasq and systemd-resolved. After a good amount of troubleshooting (most of it spent trying to understand how the three components relate to each other), I got the OpenShift installer running on Ubuntu.

Problem

As the OpenShift documentation suggests, I added the NetworkManager configuration settings that set up dnsmasq so that my local machine could resolve the hostnames of the VMs. Unfortunately, it wasn’t working. The installer timed out while trying to connect to the cluster after spinning it up:

time="2020-02-06T14:30:08-05:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.test-cluster.tt.testing:6443/version?timeout=32s: dial tcp: lookup api.test-cluster.tt.testing on 127.0.0.53:53: no such host"

Strange, because this works out of the box on RHEL-flavored Linux systems. After some digging and increasing the logging on systemd-resolved, I discovered that queries were going to a strange address:

Feb 06 15:35:26 my-pc systemd-resolved[32657]: Looking up RR for api.test-cluster.tt.testing IN A.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Cache miss for api.test-cluster.tt.testing IN A
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Transaction 30245 for <api.test-cluster.tt.testing IN A> scope dns on enp5s0/*.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Using feature level UDP+EDNS0 for transaction 30245.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Using DNS server 192.168.29.1 for transaction 30245.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Sending query packet with id 30245.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Processing query...
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Processing incoming packet on transaction 30245. (rcode=NXDOMAIN)
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Retrying transaction 30245.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Cache miss for api.test-cluster.tt.testing IN A
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Transaction 30245 for <api.test-cluster.tt.testing IN A> scope dns on enp5s0/*.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Using feature level UDP for transaction 30245.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Sending query packet with id 30245.
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Processing incoming packet on transaction 30245. (rcode=NXDOMAIN)
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Not caching negative entry for: api.test-cluster.tt.testing IN A, cache mode set to no-negative
Feb 06 15:35:26 my-pc systemd-resolved[32657]: Transaction 30245 for <api.test-cluster.tt.testing IN A> on scope dns on enp5s0/* now complete with <rcode-failure> from network (unsigned)

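192.168.29.1 is similar to the address I configured according to the OpenShift documentation (192.168.126.1), but not quite the same. systemd-resolved was sending the query straight to the DNS server set on my enp5s0 link instead of handing it to dnsmasq.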
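For reference, one common way to get this level of detail out of systemd-resolved is a drop-in that sets SYSTEMD_LOG_LEVEL=debug for the service. The sketch below is only an illustration of that approach; the function name and the drop-in file name debug.conf are made up, and the directory is passed in so you can try it somewhere harmless first.

```shell
# Hypothetical helper: write a drop-in that raises systemd-resolved's log level.
# $1 is the drop-in directory; on a real system that would be
# /etc/systemd/system/systemd-resolved.service.d (writing there needs root).
write_resolved_debug_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    # systemd daemons read SYSTEMD_LOG_LEVEL from their environment at startup.
    printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dropin_dir/debug.conf"
}
```

After writing the drop-in to the real directory, run sudo systemctl daemon-reload and sudo systemctl restart systemd-resolved, then follow the output with journalctl -u systemd-resolved -f. Remove the file and restart again once you are done, since debug logging is noisy.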

Turns out that NetworkManager on Ubuntu automatically creates extra dnsmasq settings, further confusing things:

Feb 06 12:12:10 my-pc NetworkManager[13250]: <info>  [1581009130.5539] dnsmasq[0x5604c3209e20]: dnsmasq appeared as :1.200
Feb 06 12:12:10 my-pc dnsmasq[13325]: setting upstream servers from DBus
Feb 06 12:12:10 my-pc dnsmasq[13325]: using nameserver 192.168.126.1#53 for domain tt.testing
Feb 06 12:12:10 my-pc dnsmasq[13325]: using nameserver 192.168.29.1#53(via enp5s0)
Feb 06 12:12:10 my-pc dnsmasq[13325]: using nameserver 192.168.29.1#53 for domain 29.168.192.in-addr.arpa
Feb 06 12:12:10 my-pc dnsmasq[13325]: using nameserver 192.168.29.1#53 for domain 254.169.in-addr.arpa
Feb 06 12:12:10 my-pc dhclient[13322]: DHCPREQUEST of 192.168.29.186 on enp5s0 to 255.255.255.255 port 67 (xid=.......)
Feb 06 12:12:10 my-pc dhclient[13322]: DHCPACK of 192.168.29.186 from 192.168.29.1
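Those dnsmasq lines are easier to read as a routing table of domain to server. Here is a small sketch that boils them down; dnsmasq_routes is a name I made up, and it assumes the log format shown above.

```shell
# Hypothetical helper: read journal lines on stdin and print one
# "<domain> -> <server>" line per dnsmasq "using nameserver" entry.
# Entries without a "for domain" part are the default upstream.
dnsmasq_routes() {
    awk '
        /dnsmasq\[[0-9]+\]: using nameserver / {
            server = ""; domain = "(default)"
            for (i = 1; i <= NF; i++) {
                if ($i == "nameserver") { server = $(i + 1) }
                if ($i == "domain")     { domain = $(i + 1) }
            }
            sub(/#.*/, "", server)   # drop "#53" and any "(via ...)" suffix
            print domain " -> " server
        }
    '
}
```

Something like journalctl -b | dnsmasq_routes should then show at a glance that only tt.testing goes to 192.168.126.1 and everything else goes to 192.168.29.1.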

Solution

I tried several solutions [1][2], but in the end what worked for me was to change the default DNS server for my connection to 127.0.1.1 using systemd-resolve.

First, find the link that is causing problems and note its DNS server:

$ systemd-resolve --status
...
Link 2 (enp5s0)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.29.1
          DNS Domain: ~.
...
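If you would rather pull that address out with a script than read it off the screen, a rough sketch follows. link_dns_server is a hypothetical name, and it assumes the status layout shown above (a "Link N (name)" header followed by a "DNS Servers:" line).

```shell
# Hypothetical helper: read `systemd-resolve --status` output on stdin and
# print the first "DNS Servers" entry listed under the given link name.
link_dns_server() {
    awk -v ifname="$1" '
        # A "Link N (name)" header starts a new per-link section.
        /^Link [0-9]+ \(/ { in_link = (index($0, "(" ifname ")") > 0) }
        # Only report the server once we are inside the requested link.
        in_link && /DNS Servers:/ { print $NF; exit }
    '
}
```

Usage would be systemd-resolve --status | link_dns_server enp5s0. It only reports the first server on that line, which is all I needed here.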

Now add that address to your dnsmasq configuration as the upstream server, next to the rule for the cluster domain:

$ sudo vim /etc/NetworkManager/dnsmasq.d/openshift.conf

server=/tt.testing/192.168.126.1
server=192.168.29.1
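The same file can be generated instead of edited by hand, which is handy if your cluster domain or upstream address differs from mine. This is just a sketch; render_dnsmasq_conf is a made-up name, and the three arguments are whatever values apply on your machine.

```shell
# Hypothetical helper: print a dnsmasq config that sends one domain to the
# libvirt DNS server and everything else to the upstream server.
# $1 = cluster domain, $2 = libvirt DNS address, $3 = upstream DNS address
render_dnsmasq_conf() {
    printf 'server=/%s/%s\n' "$1" "$2"   # per-domain rule
    printf 'server=%s\n' "$3"            # default upstream
}
```

For my setup that would be render_dnsmasq_conf tt.testing 192.168.126.1 192.168.29.1 | sudo tee /etc/NetworkManager/dnsmasq.d/openshift.conf. I am assuming NetworkManager has to be restarted (sudo systemctl restart NetworkManager) before its dnsmasq instance picks up the file.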

Then point systemd-resolved at dnsmasq for that interface:

$ sudo systemd-resolve --set-dns=127.0.1.1 --interface=enp5s0

And that should do the trick:

$ nslookup api.test-cluster.tt.testing
Server:		127.0.0.53
Address:	127.0.0.53#53

Non-authoritative answer:
Name:	api.test-cluster.tt.testing
Address: 192.168.126.10
Name:	api.test-cluster.tt.testing
Address: 192.168.126.11

Unfortunately, the systemd-resolve setting won’t survive a reboot. To make it permanent, you’ll need to create a file in /etc/systemd/network/, such as /etc/systemd/network/enp5s0.conf, with contents similar to:

[Match]
Name=enp5s0

[Resolve]
DNS=127.0.1.1

With that in place, the setting should survive a reboot.