OpenVPN switch back to primary server after connection recovery on pfSense

failover openvpn pfsense zabbix

We use pfSense for both our OpenVPN server and clients.
We have around 20 site-to-site pfSense clients that connect to the main site using an OpenVPN site-to-site configuration.
On the main pfSense (the VPN server) we installed multiple WANs and set up a failover site-to-site connection by making the server listen on localhost and port-forwarding the VPN ports from each WAN to 127.0.0.1.
On the clients, we added this under Custom options:

remote serverAlternateWanIp VpnPort udp

This scenario works fine for us: when the main link goes down, the clients establish a new connection over the alternate WAN.

What we do not know is:
How can we push clients to switch back (reconnect) to the main WAN once the main WAN connection comes back up? (Our current workaround is to restart the OpenVPN client, sometimes to restart the whole pfSense, or to "kill" the alternate WAN so that the VPN clients reconnect to the main one. All of these seem to me like bad ways to do it.)

We would also love to see which clients are connected over the alternate WAN.
The current workaround is to log in to each VPN client pfSense and read the remote host address on the client's VPN status page.
We use Zabbix to monitor our network infrastructure, and we would love to find out which WAN a connection is using through some kind of API, so that we can at least raise an error in Zabbix and tell the admins to reconnect the client to the main WAN.

Best Answer

I have two ideas:

1. You could have OpenVPN wait until there is no activity (if possible, maybe at night?) and then automatically drop the connection or re-resolve the server's IP (and maybe do something tricky/creative with DNS). I have not tested any of this; I've just glanced at the documentation. It looks like this could work for you, but you'll need to experiment.

Aside from that, I don't see anything inherently wrong with dropping client connections to get them to reconnect by restarting their instances or taking the backup link down momentarily as long as users can handle a brief disconnect. Were you worried about that? Or just keen on having it happen automatically?
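
If you do keep restarting instances, you probably only want to touch the clients that are actually on the backup WAN, which would also cover the Zabbix part of your question. Here is a rough Python sketch that decides this from one line of the OpenVPN management interface's `state` reply. The field layout (remote server address in the fifth comma-separated field) is how I remember it from OpenVPN's management notes, and the addresses are made up, so check it against what your version actually prints. I have not run this on pfSense.

```python
from typing import Optional


def remote_from_state_line(line: str) -> Optional[str]:
    """Return the remote server address from one 'state' reply line, or None.

    Assumed layout (verify against your OpenVPN version):
    time,state,detail,tunnel_ip,remote_addr,remote_port,...
    """
    fields = line.strip().split(",")
    # Only a CONNECTED line tells us which remote is in use right now.
    if len(fields) < 5 or fields[1] != "CONNECTED":
        return None
    return fields[4] or None


def on_backup_wan(line: str, primary_ip: str) -> bool:
    """True if the client is connected, but not to the primary WAN address."""
    remote = remote_from_state_line(line)
    return remote is not None and remote != primary_ip


# Hypothetical example line; both addresses are documentation placeholders.
sample = "1600000000,CONNECTED,SUCCESS,10.8.0.2,203.0.113.5,1194,,"
print(on_backup_wan(sample, primary_ip="198.51.100.7"))
```

You would feed it the line you get back after sending `state` to the client's management socket and let a Zabbix item report the result. Where that socket lives on your pfSense version is something you would have to look up.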

Extracted from OpenVPN man page: https://openvpn.net/index.php/open-source/documentation/manuals/65-openvpn-20x-manpage.html

--inactive n (Experimental) Causes OpenVPN to exit after n seconds of inactivity on the TUN/TAP device. The time length of inactivity is measured since the last incoming tunnel packet.

--ping n Ping remote over the TCP/UDP control channel if no packets have been sent for at least n seconds (specify --ping on both peers to cause ping packets to be sent in both directions since OpenVPN ping packets are not echoed like IP ping packets). When used in one of OpenVPN's secure modes (where --secret, --tls-server, or --tls-client is specified), the ping packet will be cryptographically secure. This option has two intended uses:

(1) Compatibility with stateful firewalls. The periodic ping will ensure that a stateful firewall rule which allows OpenVPN UDP packets to pass will not time out.

(2) To provide a basis for the remote to test the existence of its peer using the --ping-exit option.

--ping-exit n Causes OpenVPN to exit after n seconds pass without reception of a ping or other packet from remote. This option can be combined with --inactive, --ping, and --ping-exit to create a two-tiered inactivity disconnect. For example,

openvpn [options...] --inactive 3600 --ping 10 --ping-exit 60

when used on both peers will cause OpenVPN to exit within 60 seconds if its peer disconnects, but will exit after one hour if no actual tunnel data is exchanged.

--ping-restart n Similar to --ping-exit, but trigger a SIGUSR1 restart after n seconds pass without reception of a ping or other packet from remote. This option is useful in cases where the remote peer has a dynamic IP address and a low-TTL DNS name is used to track the IP address using a service such as http://dyndns.org/ + a dynamic DNS client such as ddclient.

If the peer cannot be reached, a restart will be triggered, causing the hostname used with --remote to be re-resolved (if --resolv-retry is also specified).

In server mode, --ping-restart, --inactive, or any other type of internally generated signal will always be applied to individual client instance objects, never to whole server itself. Note also in server mode that any internally generated signal which would normally cause a restart, will cause the deletion of the client instance object instead.

In client mode, the --ping-restart parameter is set to 120 seconds by default. This default will hold until the client pulls a replacement value from the server, based on the --keepalive setting in the server configuration. To disable the 120 second default, set --ping-restart 0 on the client.

See the signals section below for more information on SIGUSR1.

Note that the behavior of SIGUSR1 can be modified by the --persist-tun, --persist-key, --persist-local-ip, and --persist-remote-ip options.

Also note that --ping-exit and --ping-restart are mutually exclusive and cannot be used together.

I suggest reading the manual. There's more stuff there.

See this too - PfSense specific discussion about the matter: https://forum.pfsense.org/index.php?topic=42935.0
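
The SIGUSR1 soft restart mentioned in the man page excerpt can also be sent from outside the process, which is less drastic than restarting the whole service or the box. A minimal Python sketch, assuming Python is available on the client and that the OpenVPN instance writes a pid file (the path in the comment is a guess, so check /var/run on your system). Whether the restarted client actually goes back to the primary remote may depend on the order of your remote lines and on options like --persist-remote-ip, so test that before relying on it.

```python
import os
import signal


def read_pid(pid_file: str) -> int:
    """Read the process ID stored in a pid file."""
    with open(pid_file) as fh:
        return int(fh.read().strip())


def soft_restart(pid_file: str, sig: int = signal.SIGUSR1) -> int:
    """Send sig (SIGUSR1 by default) to the process named in pid_file.

    Returns the pid that was signalled.
    """
    pid = read_pid(pid_file)
    os.kill(pid, sig)
    return pid


# Hypothetical usage; the pid file path is an assumption, not a confirmed
# pfSense location:
# soft_restart("/var/run/openvpn_client1.pid")
```

Combined with a check for which remote the client is currently using, this could run from cron at a quiet hour so that only clients on the backup WAN get bounced.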

2. Another idea is to run a script (to restart OpenVPN?) on interface status change. Again, I'm not going to test this, but I did find some discussion about it:

https://forum.pfsense.org/index.php?topic=65846.0

Apparently you can store commands in /etc/devd.conf.

Mine contains:

# $Id$
# $FreeBSD: src/etc/devd.conf,v 1.26.2.1 2005/09/03 22:49:22 sam Exp $

options {
        directory "/etc/devd";
        directory "/usr/local/etc/devd";
        pid-file "/var/run/devd.pid";
        set scsi-controller-regex
                "(aac|adv|adw|aha|ahb|ahc|ahd|aic|amd|amr|asr|bt|ciss|ct|dpt|\
                esp|ida|iir|ips|isp|mlx|mly|mpt|ncr|ncv|nsp|stg|sym|trm|wds)\
                [0-9]+";
};

# CARP notify hooks. This will call carpup/carpdown with the
# interface (carp0, carp1) as the first parameter.
notify 100 {
    match "system"          "CARP";
    match "type"            "MASTER";
    action "/usr/local/sbin/pfSctl -c 'interface carpmaster $subsystem'";
};

notify 100 {
    match "system"          "CARP";
    match "type"            "BACKUP";
    action "/usr/local/sbin/pfSctl -c 'interface carpbackup $subsystem'";
};

notify 100 {
    match "system"          "CARP";
    match "type"            "INIT";
    action "/usr/local/sbin/pfSctl -c 'interface carpbackup $subsystem'";
};

# When a USB keyboard arrives, attach it as the console keyboard.
attach 100 {
        device-name "ukbd0";
        action "kbdcontrol -k /dev/ukbd0 < /dev/console 2>/dev/null";
};

detach 100 {
        device-name "ukbd0";
        action "kbdcontrol -k /dev/kbd0 < /dev/console 2>/dev/null";
};

#
# Signal upper levels that an event happened on ethernet class interface
#
notify 0 {
        match "system"          "IFNET";
        match "type"            "LINK_UP";
        media-type              "ethernet";
        action "/usr/local/sbin/pfSctl -c 'interface linkup start $subsystem'";
};

notify 0 {
        match "system"          "IFNET";
        match "type"            "LINK_DOWN";
        media-type              "ethernet";
        action "/usr/local/sbin/pfSctl -c 'interface linkup stop $subsystem'";
};

#
# Signal upper levels that an event happened on 802.11 class interface
#
notify 0 {
        match "system"          "IFNET";
        match "type"            "LINK_UP";
        match "subsystem"       "[a-z]+[0-9]+_wlan[0-9]+";
        action "/usr/local/sbin/pfSctl -c 'interface linkup start $subsystem'";
};

# Notify all users before beginning emergency shutdown when we get
# a _CRT or _HOT thermal event and we're going to power down the system
# very soon.
notify 10 {
        match "system"          "ACPI";
        match "subsystem"       "Thermal";
        match "notify"          "0xcc";
        action "logger -p kern.emerg 'WARNING: system temperature too high, shutting down soon!'";
};

Maybe that will work for you.