Red Hat – LVS DNS load balancing on a machine with only 1 NIC

Tags: bind, keepalived, load-balancing, lvs, redhat

First of all, let me clarify that I am just a software developer, not an admin, so I have only a basic understanding of the concepts involved in networking configuration and in these types of setups. I'm no ace, so please bear with me if this sounds stupid or unreasonable.

I'm trying to configure keepalived on RHEL 7 to balance DNS requests between 2 servers where BIND has been set up. The guides I've read so far all seem to use 2 NICs, but I only have one available.

References:

HW:

I have 3 machines on the same network configured as follows:

  • 1 machine with 1 NIC acting as the load balancer, real IP 192.168.0.1
  • 1 machine with 1 NIC acting as a main bind server, real IP 192.168.0.2
  • 1 machine with 1 NIC acting as a main bind server, real IP 192.168.0.3

IP forwarding has also been enabled (net.ipv4.ip_forward = 1).
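
For reference, this is roughly how I enabled it (the file name under /etc/sysctl.d/ is just an example location):

# enable IPv4 forwarding for the running kernel
sysctl -w net.ipv4.ip_forward=1

# make the setting persistent across reboots
echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/90-ip_forward.conf
sysctl -p /etc/sysctl.d/90-ip_forward.conf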

Keepalived configuration:

! This is a comment
! Configuration File for keepalived

global_defs {
   ! this is who emails will go to on alerts
   notification_email {
        admins@example.com
        fakepager@example.com
    ! add a few more email addresses here if you would like
   }
   notification_email_from admins@example.com

   ! I use the local machine to relay mail
   smtp_server 127.0.0.1
   smtp_connect_timeout 30

   ! each load balancer should have a different ID
   ! this will be used in SMTP alerts, so you should make
   ! each router easily identifiable
   lvs_id LVS_EXAMPLE_01
}

! vrrp_sync_groups make sure that several router instances
! stay together on a failure - a good example of this is
! that the external interface on one router fails and the backup server
! takes over, you want the internal interface on the failed server
! to failover as well, otherwise nothing will work.
! you can have as many vrrp_sync_group blocks as you want.
vrrp_sync_group VG1 {
   group {
      VI_1
      VI_GATEWAY
   }
}

! each interface needs at least one vrrp_instance
! each vrrp_instance is a group of VIPs that are logically grouped
! together
! you can have as many vrrp_instances as you want

vrrp_instance VI_1 {
        state MASTER
        interface eth0

        lvs_sync_daemon_interface eth0

    ! each virtual router id must be unique per instance name!
        virtual_router_id 51

    ! MASTER and BACKUP state are determined by the priority
    ! even if you specify MASTER as the state, the state will
    ! be voted on by priority (so if your state is MASTER but your
    ! priority is lower than the router with BACKUP, you will lose
    ! the MASTER state)
    ! I make it a habit to set priorities at least 50 points apart
    ! note that a lower number means a lower priority - lower gets less vote
        priority 150

    ! how often should we vote, in seconds?
        advert_int 1

    ! send an alert when this instance changes state from MASTER to BACKUP
        smtp_alert

    ! this authentication is for syncing between failover servers
    ! keepalived supports PASS, which is simple password
    ! authentication
    ! or AH, which is the IPSec authentication header.
    ! I don't use AH
    ! yet as many people have reported problems with it
        authentication {
                auth_type PASS
                auth_pass example
        }

    ! these are the IP addresses that keepalived will set up on this
    ! machine. Later in the config we will specify which real
    ! servers are behind these IPs
    ! without this block, keepalived will not set up or take down
    ! any IP addresses

        virtual_ipaddress {
                192.168.0.10
        ! and more if you want them
        }
}

! now I set up the instance that the real servers will use as a default
! gateway
! most of the config is the same as above, but with a different VIP

vrrp_instance VI_GATEWAY {
        state MASTER
        interface eth0
        lvs_sync_daemon_interface eth0
        virtual_router_id 52
        priority 150
        advert_int 1
        smtp_alert
        authentication {
                auth_type PASS
                auth_pass example
        }
        virtual_ipaddress {
                192.168.0.11
        }
}

! now we set up more information about our virtual server
! we are just setting up one for now, listening on port 53 for dns
! requests.

! notice we do not setup a virtual_server block for the 192.168.0.11
! address in the VI_GATEWAY instance. That's because we are doing NAT
! on that IP, and nothing else.

virtual_server 192.168.0.10 53 {
    delay_loop 6

    ! use round-robin as a load balancing algorithm
    lb_algo rr

    ! we are doing NAT
    lb_kind NAT
    nat_mask 255.255.255.0

    protocol TCP

    ! there can be as many real_server blocks as you need

    real_server 192.168.0.2 53 {

    ! if we used weighted round-robin or a similar lb algo,
    ! we include the weight of this server

        weight 1

    ! here is a health checker for this server.
    ! we could use a custom script here (see the keepalived docs)
    ! but we will just make sure we can do a vanilla tcp connect()
    ! on port 53
    ! if it fails, we will pull this realserver out of the pool
    ! and send email about the removal
        TCP_CHECK {
            connect_timeout 3
            connect_port 53
        }
    }

    real_server 192.168.0.3 53 {

    ! if we used weighted round-robin or a similar lb algo,
    ! we include the weight of this server

        weight 1

    ! here is a health checker for this server.
    ! we could use a custom script here (see the keepalived docs)
    ! but we will just make sure we can do a vanilla tcp connect()
    ! on port 53
    ! if it fails, we will pull this realserver out of the pool
    ! and send email about the removal
        TCP_CHECK {
            connect_timeout 3
            connect_port 53
        }
    }
}
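
As a quick sanity check from the load balancer (assuming the ipvsadm tool is installed), the two VIPs should show up on eth0 and the TCP virtual service should list both real servers:

# 192.168.0.10 and 192.168.0.11 should both be assigned to eth0
ip addr show dev eth0

# the virtual service 192.168.0.10:53 should list 192.168.0.2 and 192.168.0.3
ipvsadm -L -n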

Conclusion:

The firewall is disabled and connectivity between the machines works fine; keepalived is able to validate a simple TCP connection to the DNS masters. I am also able to execute dig myhost @192.168.0.2/3 from the load balancer and I get the correct results.
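
Spelled out, the checks against the real servers look like this (myhost just stands in for an actual record served by these zones):

# query each bind master directly from the load balancer
dig myhost @192.168.0.2
dig myhost @192.168.0.3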

However, when running dig myhost @192.168.0.10 I get ;; connection timed out; no servers could be reached. I'd be grateful for any hints or suggestions that would help me overcome this issue (if it's even possible with 1 NIC), and please let me know if additional details are required.

Best Answer

After some more googling around, it occurred to me that UDP might be required in addition to TCP, and that does indeed seem to be the case (note to self: it probably would've helped if I had used tcpdump/tshark...):

Protocol transport

DNS primarily uses the User Datagram Protocol (UDP) on port number 53 to serve requests. DNS queries consist of a single UDP request from the client followed by a single UDP reply from the server. The Transmission Control Protocol (TCP) is used when the response data size exceeds 512 bytes, or for tasks such as zone transfers. Some resolver implementations use TCP for all queries.
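
In other words, a plain dig query goes out over UDP, and TCP has to be requested explicitly; something along these lines would have shown the difference right away:

# default behaviour: the query is sent over UDP port 53 (this is what was timing out)
dig myhost @192.168.0.10

# force the same query over TCP port 53 (the only transport my original config balanced)
dig +tcp myhost @192.168.0.10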

The same is also suggested by this older article, Load balancing DNS with keepalived, written in 2006.

As a consequence, I've added the following UDP virtual_server configuration alongside what was already present:

virtual_server 192.168.0.10 53 {
    delay_loop 6

    ! use round-robin as a load balancing algorithm
    lb_algo rr

    ! we are doing NAT
    lb_kind NAT
    nat_mask 255.255.255.0

    protocol UDP

    ! there can be as many real_server blocks as you need

    real_server 192.168.0.2 53 {
        ! if we used weighted round-robin or a similar lb algo,
        ! we include the weight of this server
        weight 1
    }

    real_server 192.168.0.3 53 {
        ! if we used weighted round-robin or a similar lb algo,
        ! we include the weight of this server
        weight 1
    }
}
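
After restarting keepalived, the LVS table should show two services for the VIP, one UDP and one TCP, each with the same two real servers behind it (again assuming ipvsadm is available):

systemctl restart keepalived

# expect both "UDP  192.168.0.10:53" and "TCP  192.168.0.10:53" entries
ipvsadm -L -n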

Note: In the LVS mini how-to PDF there was a gotcha:

2.2. Gotchas: you need an outside client (the director and realservers can’t access the virtual service)

Since the PDF is also old (2006), that no longer seems to be the case: I'm now able to dig from the load balancer itself. However, when using a different client machine on the same network I get ;; reply from unexpected source: 192.168.0.2#53, expected 192.168.0.10#53. I tried the suggestion below from this question, but so far it has not worked:

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.vs.conntrack=1
iptables -t nat -A POSTROUTING -j MASQUERADE

From what I've gathered so far, this may have something to do with the network topology and the NAT setup, but I have yet to figure this one out.
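
One thing I still have to try, in line with the VI_GATEWAY comments in the config above ("the instance that the real servers will use as a default gateway"): with LVS-NAT the replies have to flow back through the director, so the default route on each bind server should point at the 192.168.0.11 VIP instead of the regular gateway. Roughly (untested so far):

# on 192.168.0.2 and 192.168.0.3: send replies back through the director's gateway VIP
ip route replace default via 192.168.0.11 dev eth0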

Looks like I still have some surfing to do, but at least I have something to work with and I now know that 1 NIC suffices to load balance the 2 DNS servers (at least for the tests I'm doing now).