How to Exclude All Forwarded Traffic from Connection Tracking on a Router Using iptables/nftables

iptables, nftables

A Linux box has multiple network interfaces. IP forwarding is enabled for IPv4 and IPv6.

I would like to protect the services running on the router itself via a stateful firewall. For that, connection tracking needs to be enabled. At the same time, I would like to exclude all traffic that is forwarded from one interface to another from connection tracking.

For the stateful firewall, I would typically use the INPUT and OUTPUT chains of the filter table. Forwarded traffic would go to the FORWARD chain. But AFAIK there is no way to mark traffic as untracked in the FORWARD chain. Such logic has to go to the PREROUTING chain in the raw table. But, I believe, in the PREROUTING chain it has not yet been decided whether traffic is forwarded or not.

Connection tracking has many disadvantages, such as packet drops when the table of tracked connections has reached its maximum size.

What is the easiest way to exclude forwarded traffic (and only that forwarded traffic) from connection tracking?

Best Answer

For a generic ruleset, one can ask nftables to do a route lookup in advance using the fib expression, instead of waiting for the routing stack to do it. This makes it possible to take the (future) output interface into account even though it doesn't exist yet (the routing decision hasn't happened), at the cost of an extra lookup. If the result says the packet will be routed, a notrack statement then prevents tracking.

FIB EXPRESSIONS
fib {saddr | daddr | mark | iif | oif} [. ...] {oif | oifname | type}
A fib expression queries the fib (forwarding information base) to obtain information such as the output interface index a particular address would use. The input is a tuple of elements that is used as input to the fib lookup functions.

NOTRACK STATEMENT

The notrack statement allows to disable connection tracking for certain packets.
notrack
Note that for this statement to be effective, it has to be applied to packets before a conntrack lookup happens. Therefore, it needs to sit in a chain with either prerouting or output hook and a hook priority of -300 or less.

So one should do a "simple" route check from prerouting, using only the destination address as selector, and check for the existence of an output interface (non-routable packets and packets intended for the host won't resolve one). There's an exception for the lo (loopback) interface, to keep its traffic tracked: while it represents local traffic, a packet sent (through the output path) from the host to itself comes back through the prerouting path and does resolve to an output interface too, lo. As the outgoing packet already created a conntrack entry, it's better to keep this consistent.

```
nft add table ip stateless
nft add chain ip stateless prerouting '{ type filter hook prerouting priority -310; policy accept; }'
nft add rule ip stateless prerouting iif != lo fib daddr oif exists notrack
```

Replacing the ip family with the inet combo family should extend the same generic behavior to IPv4+IPv6.
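A sketch of that inet variant, loaded atomically from a here-document so that IPv4 and IPv6 get the same rule. It must run as root, assumes a kernel and nft version with fib support in the inet family, and reuses the stateless table name from the commands above:

```sh
# Sketch: untrack routed IPv4 and IPv6 traffic with one inet-family table (run as root).
nft -f - <<'EOF'
table inet stateless {
    chain prerouting {
        # -310 runs before the raw hook (-300) and the conntrack lookup (-200)
        type filter hook prerouting priority -310; policy accept;
        # a resolvable output interface means the packet will be routed;
        # loopback traffic is excluded so it stays tracked
        iif != "lo" fib daddr oif exists notrack
    }
}
EOF
```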

To be more specific one could specify the future output interface with fib daddr oif eth1 for example, which is more or less the equivalent of oif eth1, but also available in prerouting.
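For instance, assuming only traffic routed between two interfaces named eth0 and eth1 (hypothetical names) should bypass conntrack, the generic rule could be narrowed as below. Note that nft resolves iif/oif interface names to indexes when the rule is loaded, so those interfaces must already exist; the iifname/oifname forms used here compare names at packet time instead:

```sh
# Sketch: only untrack traffic forwarded between eth0 and eth1 (hypothetical names).
nft add rule ip stateless prerouting 'iifname "eth0" fib daddr oifname "eth1" notrack'
nft add rule ip stateless prerouting 'iifname "eth1" fib daddr oifname "eth0" notrack'
```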

Of course, if the topology is known in advance, it's possible to avoid a FIB lookup by using one or a few rules based on address tests, since the routes are then known in advance by the administrator. Benchmarking might be needed to know whether this performs better than keeping a generic method.

For example, with OP's provided information, replacing the previous rule with:

```
nft add rule ip stateless prerouting 'ip daddr != { 192.168.1.1, 192.168.2.1, 127.0.0.0/8 } notrack'
```

should have a near-equivalent effect. 127.0.0.0/8 is present for the same reasons as above with the lo interface.
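If the router's own addresses change over time, a named set lets one update them without touching the rule. A sketch of the same address-based rule using a set (the set name local_addrs is hypothetical; this replaces the fib rule, it does not go alongside it):

```sh
# Sketch: keep the router's own addresses in a named set instead of an anonymous one.
# "flags interval" is needed so the set can hold the 127.0.0.0/8 prefix.
nft add set ip stateless local_addrs '{ type ipv4_addr; flags interval; }'
nft add element ip stateless local_addrs '{ 192.168.1.1, 192.168.2.1, 127.0.0.0/8 }'
# Everything not addressed to the router itself is left untracked.
nft add rule ip stateless prerouting 'ip daddr != @local_addrs notrack'
# Later, adding an address needs no rule change (hypothetical address):
# nft add element ip stateless local_addrs '{ 192.168.3.1 }'
```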

Handling of broadcast (like 192.168.1.255 received on eth0) and multicast (like link-local 224.0.0.1 received on an interface) might not work the same with both methods, nor as expected, and could require additional rules for specific needs, especially with the 2nd method. This usually doesn't matter much for stateful rules, because tracking broadcast and multicast is rarely useful: a reply's source won't (and can't) be the original broadcast or multicast destination address, so the conntrack entry will never "see" bidirectional traffic.
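If one prefers not to depend on how the fib lookup classifies these packets, one possible extra rule (an assumption about the desired policy, not something the methods above require) is to untrack them explicitly by destination address type, inserted at the top of the same chain:

```sh
# Sketch: explicitly leave broadcast and multicast destinations untracked.
# "insert" places each rule at the top of the chain, ahead of the existing rule.
nft insert rule ip stateless prerouting 'fib daddr type broadcast notrack'
nft insert rule ip stateless prerouting 'fib daddr type multicast notrack'
```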

Notes

This will usually not be compatible with stateful NAT.

My understanding is that DNAT toward a remote host will fail because its reply traffic won't be de-NATed, and that SNAT on forwarded traffic won't trigger since no conntrack entry was created. The rarely used SNAT in the input path should be fine, and a combination of DNAT+SNAT (using a local source address) might also work: then both the original and reply directions involve a local destination, so a conntrack entry should always be correctly created or looked up.

Standard ruleset

Actual rules using iptables or nftables (in its own separate table) can then be written as usual, including stateful rules for the host itself. As routed traffic won't create conntrack entries, any rules involving such traffic should be stateless only and not use any ct expression, because it would never match.

Verifying behavior
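As an illustration only, a minimal host ruleset in its own table could look like the sketch below. The table name filter, the SSH port and the eth0/eth1 forward rule are example assumptions; since the input policy is drop, it is safer to load it from a console rather than over the network:

```sh
# Sketch of a stateful filter for the router itself plus a stateless forward chain (run as root).
nft -f - <<'EOF'
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        iif "lo" accept
        # traffic addressed to the router is still tracked, so stateful matching works here
        ct state established,related accept
        ct state invalid drop
        # ICMP and ICMPv6 (neighbor discovery) accepted statelessly,
        # since some of it may have been left untracked by the stateless table
        meta l4proto { icmp, ipv6-icmp } accept
        tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
        # routed traffic is untracked: stateless matches only (hypothetical example rule)
        iifname "eth0" oifname "eth1" tcp dport 23 drop
    }
}
EOF
```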

One can check the overall behavior even without proper firewall rules by:
- adding a dummy ct rule, to be sure the conntrack facility gets registered in the current network namespace:
```
nft add table ip mytable
nft add chain ip mytable mychain '{ type filter hook prerouting priority -150; policy accept; }'
nft add rule ip mytable mychain ct state new
```
- using the conntrack tool to follow events:
```
conntrack -E
```
- generating traffic from a remote host.

  NEW conntrack entries will then be created for traffic to be received by the router, but not for routed traffic.
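To watch the table size that motivated the question, the current entry count and the configured maximum can also be read directly. A sketch, run as root and assuming the nf_conntrack module is loaded (the counter rule replaces the earlier generic rule so the skipped traffic can be counted):

```sh
# Number of entries currently in the conntrack table (two ways)
conntrack -C
cat /proc/sys/net/netfilter/nf_conntrack_count
# Maximum number of entries before new connections are dropped
cat /proc/sys/net/netfilter/nf_conntrack_max
# Re-create the notrack rule with a counter to see how many packets bypass conntrack
nft flush chain ip stateless prerouting
nft add rule ip stateless prerouting iif != lo fib daddr oif exists counter notrack
nft list chain ip stateless prerouting
```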

Related Solutions

Linux – Understanding connection tracking in iptables

The first question is: what is conntrack? This is the website for conntrack-tools. With that in mind, what does state do?

The State Match

The most useful match criterion is supplied by the `state' extension, which interprets the connection-tracking analysis of the `ip_conntrack' module. This is highly recommended.

Specifying `-m state' allows an additional `--state' option, which is a comma-separated list of states to match (the `!' flag indicates not to match those states). These states are:

- NEW: A packet which creates a new connection.
- ESTABLISHED: A packet which belongs to an existing connection (i.e., a reply packet, or outgoing packet on a connection which has seen replies).
- RELATED: A packet which is related to, but not part of, an existing connection, such as an ICMP error, or (with the FTP module inserted), a packet establishing an ftp data connection.
- INVALID: A packet which could not be identified for some reason: this includes running out of memory and ICMP errors which don't correspond to any known connection. Generally these packets should be dropped.

An example of this powerful match extension would be:

```
# iptables -A FORWARD -i ppp0 -m state ! --state NEW -j DROP
```
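For comparison, the same example written with the newer conntrack match, and a rough nftables equivalent. Both are sketches; the nft line assumes an inet table named filter with a forward chain already exists:

```sh
# iptables, using the conntrack match instead of the older state match
iptables -A FORWARD -i ppp0 -m conntrack ! --ctstate NEW -j DROP
# nftables equivalent (assumed table "filter", chain "forward")
nft add rule inet filter forward iifname "ppp0" ct state != new drop
```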

Firewall questions about state and policy?

So, to answer the question: the conntrack match (-m conntrack --ctstate) supersedes the older state match (-m state --state). It is not tied to the conntrack-tools userspace toolkit; both matches read the same kernel connection tracking data, and conntrack additionally offers criteria such as original/reply addresses and connection status.

Connection tracking applies to traffic flows: each packet is matched against the table of known connections, and rules can then test the resulting state.

The answer to question 2 follows: yes, use conntrack.

To answer question 3, which case? The answer for state is in the definition above.

The answer to 4: both matches use the same connection tracking information, so you can use conntrack instead of state in your example at no penalty.

Iptables counters in NAT table and state NOT NEW

If the server is a gateway, you should use the FORWARD chain.

Set up iptables

```
# iptables -I FORWARD -p tcp -d 92.48.119.223 --dport 80 -j ACCEPT
# iptables -I FORWARD -p tcp -s 92.48.119.223 --sport 80 -j ACCEPT
```

We will download a sample file. First, check its size in the HTTP headers:

```
# curl -I http://mirror.centos.org/centos/6.7/os/x86_64/images/boot.iso
HTTP/1.1 200 OK
Date: Thu, 17 Mar 2016 18:17:53 GMT
Server: Apache/2.2.15 (CentOS)
Last-Modified: Tue, 04 Aug 2015 21:41:08 GMT
ETag: "2800ae-e600000-51c8324d84500"
Accept-Ranges: bytes
Content-Length: 241172480
Connection: close
Content-Type: application/octet-stream
```

Download the file

```
# wget http://mirror.centos.org/centos/6.7/os/x86_64/images/boot.iso
--2016-03-17 20:18:14--  http://mirror.centos.org/centos/6.7/os/x86_64/images/boot.iso
Resolving mirror.centos.org (mirror.centos.org)... 92.48.119.223
Connecting to mirror.centos.org (mirror.centos.org)|92.48.119.223|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 241172480 (230M) [application/octet-stream]
Saving to: `boot.iso'

100%[======================================================================>] 241,172,480 9.67M/s   in 25s

2016-03-17 20:18:39 (9.26 MB/s) - `boot.iso' saved [241172480/241172480]
```

Check the rules

```
# iptables -L FORWARD -n -v -x
Chain FORWARD (policy ACCEPT 6 packets, 408 bytes)
    pkts      bytes target     prot opt in     out     source               destination
   33478  1756965 ACCEPT     tcp  --  *      *       0.0.0.0/0            92.48.119.223       tcp dpt:80
   27818 244733384 ACCEPT     tcp  --  *      *       92.48.119.223        0.0.0.0/0           tcp spt:80
```

244733384 is what you are looking for.

244733384 - 241172480 = 3560904 bytes ≈ 3.4 MiB

That difference is the TCP/IP and HTTP overhead.
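The same arithmetic as a small shell function (the name overhead is made up for this example); it also expresses the overhead as a percentage of the payload:

```sh
#!/bin/sh
# overhead WIRE_BYTES PAYLOAD_BYTES
# Prints the difference in bytes, in MiB, and as a percentage of the payload.
overhead() {
    awk -v wire="$1" -v payload="$2" 'BEGIN {
        diff = wire - payload
        printf "%d bytes = %.2f MiB (%.2f%% of payload)\n", diff, diff / 1048576, 100 * diff / payload
    }'
}

# Byte counter of the reply rule vs. Content-Length of the file
overhead 244733384 241172480
```

With the figures above this prints 3560904 bytes = 3.40 MiB (1.48% of payload).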

Does it mean that nat table counters are only incremented for the first packet of every connection?

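Yes, it does. After the first packet, the connection's NAT mapping is kept in its conntrack entry, so later packets are handled through connection tracking without traversing the nat table again: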

```
# lsmod | grep conn
nf_conntrack_ipv4       9154  3 iptable_nat,nf_nat
nf_conntrack           79206  3 iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1483  1 nf_conntrack_ipv4
```

The idea is to do it with iptables. Very lightweight (no proxy source code modification, and we let the kernel count packets instead of doing it ourselves).

As you said before, you have 5-50 clients, so you can try doing the accounting with iptables and the LOG target (-j LOG).

Configure rsyslog

```
# cat /etc/rsyslog.d/accounting.conf
:msg, contains, "CLIENT-192.168.88.87-IN" /var/log/accounting/client-192.168.88.87.log
:msg, contains, "CLIENT-192.168.88.87-OUT" /var/log/accounting/client-192.168.88.87.log
:msg, contains, "CLIENT"     ~
```

Configure iptables

```
# iptables -t mangle -I OUTPUT -s 192.168.88.87 ! -d 192.168.0.0/16 -j LOG --log-prefix "CLIENT-192.168.88.87-OUT "

# iptables -t mangle -I INPUT ! -s 192.168.0.0/16 -d 192.168.88.87 -j LOG --log-prefix "CLIENT-192.168.88.87-IN "
```
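With 5-50 clients, writing two rules per client by hand gets tedious. A sketch that only prints the commands (the function name gen_rules and the second address are hypothetical), so the output can be reviewed before piping it to sh as root:

```sh
#!/bin/sh
# gen_rules IP... : print the OUTPUT and INPUT LOG rules for each client address.
# Dry run only: nothing is applied until the output is piped to "sh" as root.
gen_rules() {
    for ip in "$@"; do
        printf 'iptables -t mangle -I OUTPUT -s %s ! -d 192.168.0.0/16 -j LOG --log-prefix "CLIENT-%s-OUT "\n' "$ip" "$ip"
        printf 'iptables -t mangle -I INPUT ! -s 192.168.0.0/16 -d %s -j LOG --log-prefix "CLIENT-%s-IN "\n' "$ip" "$ip"
    done
}

gen_rules 192.168.88.87 192.168.88.88
```

Each client still needs matching rsyslog filter lines, and the LOG prefix is limited to 29 characters, which "CLIENT-" plus any IPv4 address and "-OUT " stays within.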

Check that everything works as it should

```
# ping -c 1 8.8.4.4
PING 8.8.4.4 (8.8.4.4) 56(84) bytes of data.
64 bytes from 8.8.4.4: icmp_seq=1 ttl=50 time=43.1 ms

--- 8.8.4.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 43ms
rtt min/avg/max/mdev = 43.114/43.114/43.114/0.000 ms

# iptables -t mangle -L INPUT -nvx
Chain INPUT (policy ACCEPT 1256 packets, 116836 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       1       84 LOG        all  --  *      *      !192.168.0.0/16       192.168.88.87       LOG flags 0 level 4 prefix `CLIENT-192.168.88.87-IN '

# iptables -t mangle -L OUTPUT -nvx
Chain OUTPUT (policy ACCEPT 304 packets, 91325 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       1       84 LOG        all  --  *      *       192.168.88.87       !192.168.0.0/16      LOG flags 0 level 4 prefix `CLIENT-192.168.88.87-OUT '

# cat /var/log/accounting/client-192.168.88.87.log
Mar 21 09:12:22 ci kernel: CLIENT-192.168.88.87-OUT IN= OUT=eth0 SRC=192.168.88.87 DST=8.8.4.4 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=38520 SEQ=1
Mar 21 09:12:22 ci kernel: CLIENT-192.168.88.87-IN IN=eth0 OUT= MAC=08:00:27:eb:c9:fc:4c:5e:0c:51:b7:d4:08:00 SRC=8.8.4.4 DST=192.168.88.87 LEN=84 TOS=0x04 PREC=0x00 TTL=50 ID=0 PROTO=ICMP TYPE=0 CODE=0 ID=38520 SEQ=1
```

Run a real test

```
# wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
--2016-03-21 09:14:35--  https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
Resolving bitbucket.org... 104.192.143.2, 104.192.143.3, 104.192.143.1
Connecting to bitbucket.org|104.192.143.2|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
...
Resolving bbuseruploads.s3.amazonaws.com... 54.231.49.250
Connecting to bbuseruploads.s3.amazonaws.com|54.231.49.250|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23415665 (22M) [application/x-tar]
Saving to: “phantomjs-2.1.1-linux-x86_64.tar.bz2”

100%[==============================================================================================>] 23,415,665  3.78M/s   in 6.7s

2016-03-21 09:14:43 (3.31 MB/s) - “phantomjs-2.1.1-linux-x86_64.tar.bz2” saved [23415665/23415665]
```

As you can see from the output, the client downloaded ~22.33 MiB:

23415665 bytes / 1024 (KiB) / 1024 (MiB) ≈ 22.33 MiB

Now we can compute the same figure from the log file:

```
# cat client-192.168.88.87.log | grep CLIENT-192.168.88.87-IN | grep SRC=54.231.49.250 | grep 'SPT=443' | awk '{print $12}' | cut -d '=' -f 2 | awk '{SUM+=$1;} END{printf "%.2f Mb",SUM/1048576}'
22.75 Mb
```
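The pipeline above relies on LEN= being the 12th field, which shifts if the syslog format changes (for example when kernel timestamps are included), and grep 'SPT=443' would also match SPT=4430. A sketch of a single awk pass that looks fields up by name instead (the function name sum_len is hypothetical):

```sh
#!/bin/sh
# sum_len PREFIX SRC SPT < logfile
# Sums the IP packet length (first LEN= field) of LOG lines carrying PREFIX,
# matching SRC= and SPT= exactly, and prints the total in MiB.
sum_len() {
    awk -v prefix="$1" -v src="$2" -v spt="$3" '
    {
        tag = 0; len = ""; s = ""; p = ""
        for (i = 1; i <= NF; i++) {
            if ($i == prefix) tag = 1
            # UDP lines carry a second LEN= (UDP length); keep only the first one
            else if ($i ~ /^LEN=/ && len == "") len = substr($i, 5)
            else if ($i ~ /^SRC=/) s = substr($i, 5)
            else if ($i ~ /^SPT=/) p = substr($i, 5)
        }
        if (tag && s == src && p == spt) sum += len
    }
    END { printf "%.2f MiB\n", sum / 1048576 }'
}
```

For example: sum_len CLIENT-192.168.88.87-IN 54.231.49.250 443 < /var/log/accounting/client-192.168.88.87.log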

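Of course, you can filter on source/destination ports, destination IPs and so on to get any statistics you want. The log-based total is slightly higher than the file size because LEN= is the length of the whole IP packet, headers included.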
