MikroTik – Traffic flow (Netflow) Octets Counter wrap

mikrotik netflow routeros

I am using Traffic Flow with pmacct (nfacct) to do IP Accounting.

I've noticed that if a flow exceeds ~4 GB within one minute (my active-flow-timeout), the exported flow's Octets counter wraps around, so a significant part of the measured traffic is lost.

I believe the cause is that the Octets counter is a 32-bit unsigned integer: once a flow reaches 4294967296 bytes, the exporter wraps the counter around without first sending the flow to the collector (I am not sure how other vendors handle this).
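
To make the suspected behaviour concrete, here is a minimal Python sketch of what a 32-bit unsigned counter does to a byte total. The function names are made up for illustration; this models the suspicion above, not RouterOS's actual code.

```python
UINT32_MODULUS = 2 ** 32  # 4294967296, one past the largest 32-bit unsigned value


def exported_octets(true_octets: int) -> int:
    """Value a 32-bit unsigned counter would hold after counting true_octets bytes."""
    return true_octets % UINT32_MODULUS


def lost_octets(true_octets: int) -> int:
    """Bytes that silently disappear from the exported total."""
    return true_octets - exported_octets(true_octets)


print(exported_octets(4294967295))      # 4294967295 (still fits)
print(exported_octets(4294967296))      # 0 (wrapped)
print(lost_octets(4294967296 + 1000))   # 4294967296
```

Every wrap drops exactly 2^32 bytes from the exported value, which is why the totals end up so far off.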

This is quite serious since it results in very wrong traffic totals!

Here is my traffic flow configuration:

/ip traffic-flow
set active-flow-timeout=1m cache-entries=1k enabled=yes interfaces=sfp1
/ip traffic-flow target
add dst-address=X.X.X.X v9-template-refresh=60 v9-template-timeout=1m

And here are a couple of flow captures from Wireshark.

Flow 3
    [Duration: 59.590000000 seconds (switched)]
    Packets: 5700194
    Octets: 4255323704
    InputInt: 16
    OutputInt: 0
    SrcAddr: 31.X.X.254
    DstAddr: 185.X.X.254
    Protocol: UDP (17)
    IP ToS: 0x00
    SrcPort: 2043 (2043)
    DstPort: 2299 (2299)
    NextHop: 185.X.X.X
    DstMask: 0
    SrcMask: 0
    TCP Flags: 0x00
    Destination Mac Address: Routerbo_XX:XX:XX (d4:ca:6d:XX:XX:XX)
    Post Source Mac Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Post NAT Source IPv4 Address: 31.X.X.254
    Post NAT Destination IPv4 Address: 185.X.X.254
    Post NAPT Source Transport Port: 0
    Post NAPT Destination Transport Port: 0

Second capture:

Flow 3
    [Duration: 59.590000000 seconds (switched)]
    Packets: 5532208
    Octets: 4003344704
    InputInt: 16
    OutputInt: 0
    SrcAddr: 31.X.X.254
    DstAddr: 185.X.X.254
    Protocol: UDP (17)
    IP ToS: 0x00
    SrcPort: 2043 (2043)
    DstPort: 2299 (2299)
    NextHop: 185.X.X.X
    DstMask: 0
    SrcMask: 0
    TCP Flags: 0x00
    Destination Mac Address: Routerbo_XX:XX:XX (d4:ca:6d:XX:XX:XX)
    Post Source Mac Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Post NAT Source IPv4 Address: 31.X.X.254
    Post NAT Destination IPv4 Address: 185.X.X.254
    Post NAPT Source Transport Port: 0
    Post NAPT Destination Transport Port: 0

At the time of those captures, a bandwidth test (UDP, 1500 bytes, 1 Gbit, receive) had been running for quite some time.
At 1 Gbit for 60 seconds (the active-flow-timeout), it should have measured at least ~7864320000 octets (~7.3 GiB).
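
A quick check of that figure, as a sketch. It follows the convention the numbers in this post imply (1 Mbit = 2^20 bits, which is an assumption about how the test rate is expressed) and also shows how long a flow at a given rate takes to reach 2^32 octets. Both function names are hypothetical.

```python
BITS_PER_MBIT = 2 ** 20   # assumption: the post's figures use binary megabits
UINT32_MODULUS = 2 ** 32


def expected_octets(rate_mbit: int, seconds: int) -> int:
    """Octets transferred at a sustained rate over the given time."""
    return rate_mbit * BITS_PER_MBIT // 8 * seconds


def seconds_until_wrap(rate_mbit: int) -> float:
    """Time for a single flow at this rate to reach 2^32 octets."""
    return UINT32_MODULUS / (rate_mbit * BITS_PER_MBIT / 8)


print(expected_octets(1000, 60))   # 7864320000
print(seconds_until_wrap(1000))
```

At 1 Gbit a single flow would pass 2^32 octets after roughly 33 seconds, well inside the 1-minute active-flow-timeout.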

If I reduce the bandwidth test to 460 Mbit, the exported flows seem to report the traffic properly, since the Octets counter stays below the 32-bit unsigned maximum.
However, the reported total is noticeably higher than expected, and I wonder why.
At 460 Mbit sustained, 60 seconds should yield ~3617587200 octets (≈3.37 GiB).
Instead it measured 4269160500 octets (≈3.98 GiB).
I am not sure where the extra ~600 MB came from.

Flow 6
    [Duration: 59.590000000 seconds (switched)]
    Packets: 2846107
    Octets: 4269160500
    InputInt: 16
    OutputInt: 0
    SrcAddr: 31.X.X.254
    DstAddr: 185.X.X.254
    Protocol: UDP (17)
    IP ToS: 0x00
    SrcPort: 2058 (2058)
    DstPort: 2314 (2314)
    NextHop: 185.X.X.X
    DstMask: 0
    SrcMask: 0
    TCP Flags: 0x00
    Destination Mac Address: Routerbo_0d:95:72 (d4:ca:6d:XX:XX:XX)
    Post Source Mac Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Post NAT Source IPv4 Address: 31.X.X.254
    Post NAT Destination IPv4 Address: 185.X.X.254
    Post NAPT Source Transport Port: 0
    Post NAPT Destination Transport Port: 0
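
One thing the capture above does allow is a sanity check on the counter itself. The sketch below uses only the Packets, Octets and Duration values from Flow 6. It does not explain why the sender pushed more than the configured rate; it just shows what the exported numbers imply.

```python
# Values copied from the Flow 6 capture above.
packets = 2846107
octets = 4269160500
duration_s = 59.59

bytes_per_packet = octets / packets
implied_bits_per_second = octets * 8 / duration_s

print(bytes_per_packet)                       # 1500.0
print(implied_bits_per_second / 1e6, "Mbit/s (decimal)")
```

So the octet count is exactly 1500 bytes per packet, and the packet rate corresponds to about 573 Mbit/s (decimal). That would account for the total being higher than the 460 Mbit estimate, although why the test generated that many packets is a separate question.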

But if I increase the bandwidth test to 480 Mbit, for example, the exported flow's counter wraps around and a significant amount of data (about 4 GB) is lost.

Flow 3
    [Duration: 59.590000000 seconds (switched)]
    Packets: 2865308
    Octets: 2994704 <-- Only 2.8MB?! Even with 64byte packets,
                    based on the measured packets above, 
                    it should have measured > 174MBytes of data!
    InputInt: 16
    OutputInt: 0
    SrcAddr: 31.X.X.254
    DstAddr: 185.X.X.254
    Protocol: UDP (17)
    IP ToS: 0x00
    SrcPort: 2055 (2055)
    DstPort: 2311 (2311)
    NextHop: 185.X.X.X
    DstMask: 0
    SrcMask: 0
    TCP Flags: 0x00
    Destination Mac Address: Routerbo_0d:95:72 (d4:ca:6d:XX:XX:XX)
    Post Source Mac Address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Post NAT Source IPv4 Address: 31.X.X.254
    Post NAT Destination IPv4 Address: 185.X.X.254
    Post NAPT Source Transport Port: 0
    Post NAPT Destination Transport Port: 0
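
The wrapped values in the captures can be checked against the packet counts. This is a sketch that assumes every packet is counted as 1500 bytes (the size configured in the bandwidth test); `reconstruct_octets` is a hypothetical helper, not part of any tool.

```python
UINT32_MODULUS = 2 ** 32
ASSUMED_PACKET_SIZE = 1500  # assumption: matches the bandwidth test setting


def reconstruct_octets(packets: int, exported_octets: int,
                       packet_size: int = ASSUMED_PACKET_SIZE) -> int:
    """Return the unwrapped byte count if packets * packet_size is
    consistent with the exported 32-bit value, else raise ValueError."""
    estimate = packets * packet_size
    if estimate % UINT32_MODULUS != exported_octets:
        raise ValueError("exported counter does not match packets * packet_size")
    return estimate


# (Packets, Octets) pairs copied from the wrapped captures in this post.
captures = [
    (5700194, 4255323704),  # 1 Gbit test, first capture
    (5532208, 4003344704),  # 1 Gbit test, second capture
    (2865308, 2994704),     # 480 Mbit test
]

for packets, exported in captures:
    true_octets = reconstruct_octets(packets, exported)
    print(exported, "->", true_octets, "wraps:", true_octets // UINT32_MODULUS)
```

All three captures match packets x 1500 modulo 2^32 exactly, each with one wrap, which is consistent with a 32-bit unsigned Octets counter. This kind of correction only works when the packet size is known and constant, so it is not a general workaround.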

The above tests were made on a CCR1036-8G-2S+ running version 6.32.1 (I cannot upgrade since this is a production system).

Running the same tests on an x86 installation (version 6.29, which also cannot be upgraded because it is in production) gives even worse results!
There the Octets counter appears to wrap around at 2147483647, which suggests that the counter is a signed 32-bit integer either in versions before 6.32.1 or in non-Tilera builds.
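
For comparison, this is how the same 32 bits read when interpreted as signed versus unsigned, using Python's `struct` module. It only illustrates the signed/unsigned hypothesis; it says nothing about how RouterOS stores the counter internally, and `as_signed32` is a name made up for this sketch.

```python
import struct

INT32_MAX = 2 ** 31 - 1  # 2147483647


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    raw = struct.pack("!I", value % 2 ** 32)  # 4 bytes, network byte order
    return struct.unpack("!i", raw)[0]


print(as_signed32(INT32_MAX))      # 2147483647
print(as_signed32(INT32_MAX + 1))  # -2147483648
```

A signed counter therefore runs out after 2^31 - 1 bytes, half the range of the unsigned case.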

The situation is much the same as monitoring a Gbit interface over SNMP v1, which only has 32-bit counters.
In SNMP the solution is simple: use SNMP v2, which supports 64-bit counters.
But I cannot find an equivalent solution for NetFlow.

Can anyone else confirm this issue?
Does anyone know a workaround for it?
Is this a limitation of the netflow protocol or a bug in RouterOS?
How do other vendors handle this? (I don't have any other equipment at the moment to test with.)

Best Answer

Cisco's documentation on NetFlow v9 mentions that the bytes counter is 32-bit by default, but that its size is configurable, and it suggests increasing it to 64-bit on core routers and similar devices:

In some cases the size of a field type is fixed by definition, for example PROTOCOL, or IPV4_SRC_ADDR. However in other cases they are defined as a variant type. This improves the memory efficiency in the collector and reduces the network bandwidth requirement between the Exporter and the Collector. As an example, in the case IN_BYTES, on an access router it might be sufficient to use a 32 bit counter (N = 4), on a core router a 64 bit counter (N = 8) would be required. All counters and counter-like objects are unsigned integers of size N * 8 bits.

So the protocol itself can support 64-bit counters. It just seems that MikroTik's v9 template uses 32-bit counters.
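
To illustrate the "N * 8 bits" rule from the quote, here is a small sketch of how a collector might decode a counter of whatever width the template declares. The helper names are hypothetical, and the sample value assumes the 1500-byte packets from the question's first capture.

```python
def decode_counter(raw: bytes) -> int:
    """Decode an unsigned big-endian counter of len(raw) bytes."""
    return int.from_bytes(raw, byteorder="big", signed=False)


def max_counter(n_bytes: int) -> int:
    """Largest value an unsigned counter of n_bytes can hold."""
    return 2 ** (8 * n_bytes) - 1


# 5700194 packets x 1500 bytes (assumed packet size).
true_octets = 8550291000

print(max_counter(4))   # 4294967295
print(max_counter(8))   # 18446744073709551615
print(decode_counter(true_octets.to_bytes(8, "big")))  # 8550291000
```

The value fits easily in N = 8 but exceeds what N = 4 can represent, so with a 4-byte field the exporter has no way to send it intact.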

I just confirmed that by capturing the data template in Wireshark.

FlowSet 1 [id=0] (Data Template): 256,257
    FlowSet Id: Data Template (V9) (0)
    FlowSet Length: 184
    Template (Id = 256, Count = 22)
        Template Id: 256
        Field Count: 22
        Field (1/22): LAST_SWITCHED
        Field (2/22): FIRST_SWITCHED
        Field (3/22): PKTS
        Field (4/22): BYTES
            Type: BYTES (1)
            Length: 4
        Field (5/22): INPUT_SNMP
        Field (6/22): OUTPUT_SNMP
        Field (7/22): IP_SRC_ADDR
        Field (8/22): IP_DST_ADDR
        Field (9/22): PROTOCOL
        Field (10/22): IP_TOS
        Field (11/22): L4_SRC_PORT
        Field (12/22): L4_DST_PORT
        Field (13/22): IP_NEXT_HOP
        Field (14/22): DST_MASK
        Field (15/22): SRC_MASK
        Field (16/22): TCP_FLAGS
        Field (17/22): DESTINATION_MAC
        Field (18/22): SOURCE_MAC
        Field (19/22): postNATSourceIPv4Address
        Field (20/22): postNATDestinationIPv4Address
        Field (21/22): postNAPTSourceTransportPort
        Field (22/22): postNAPTDestinationTransportPort
    Template (Id = 257, Count = 21)
        Template Id: 257
        Field Count: 21
        Field (1/21): IP_PROTOCOL_VERSION
        Field (2/21): IPV6_SRC_ADDR
        Field (3/21): IPV6_SRC_MASK
        Field (4/21): INPUT_SNMP
        Field (5/21): IPV6_DST_ADDR
        Field (6/21): IPV6_DST_MASK
        Field (7/21): OUTPUT_SNMP
        Field (8/21): IPV6_NEXT_HOP
        Field (9/21): PROTOCOL
        Field (10/21): TCP_FLAGS
        Field (11/21): IP_TOS
        Field (12/21): L4_SRC_PORT
        Field (13/21): L4_DST_PORT
        Field (14/21): FLOW_LABEL
        Field (15/21): IPV6_OPTION_HEADERS
        Field (16/21): LAST_SWITCHED
        Field (17/21): FIRST_SWITCHED
        Field (18/21): BYTES
            Type: BYTES (1)
            Length: 4
        Field (19/21): PKTS
        Field (20/21): DESTINATION_MAC
        Field (21/21): SOURCE_MAC

The BYTES field has length 4 in both templates, i.e. it is a 32-bit counter.
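
The same check can be done without Wireshark. Below is a rough sketch that parses a single NetFlow v9 template record (template ID, field count, then type/length pairs, each 16 bits in network byte order) and reports the width of the BYTES field. The sample record is synthetic and cut down to two fields; only the BYTES length of 4 comes from the capture above, the rest is made up for the example.

```python
import struct

IN_BYTES = 1  # field type shown as "BYTES (1)" in the capture


def parse_template_record(data: bytes):
    """Return (template_id, [(field_type, field_length), ...])."""
    template_id, field_count = struct.unpack_from("!HH", data, 0)
    fields = [struct.unpack_from("!HH", data, 4 + 4 * i)
              for i in range(field_count)]
    return template_id, fields


def bytes_counter_bits(fields):
    """Width in bits of the BYTES counter, or None if the field is absent."""
    for field_type, field_length in fields:
        if field_type == IN_BYTES:
            return field_length * 8
    return None


# Synthetic record: template 256 with PKTS (type 2) and BYTES (type 1).
sample = struct.pack("!HH", 256, 2) + struct.pack("!HHHH", 2, 4, 1, 4)

template_id, fields = parse_template_record(sample)
print(template_id, bytes_counter_bits(fields))  # 256 32
```

A result of 32 means the exporter cannot report more than 4294967295 bytes per flow record, whatever the collector does.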

So I guess this has to be fixed by MikroTik.

Unless someone is aware of a solution/workaround.