You can check the URLs passed in the HTTP requests. If the URLs in the two flows match, the flows are duplicates and you can simply discard one of them. As you said, the first flow is the more meaningful one to keep because it tells you the client IP.
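As a rough illustration of that check, here is a minimal Python sketch. It assumes each flow has already been parsed into a dict with hypothetical `client_ip` and `url` keys; those names are mine, not something your collector necessarily produces. It keeps the first flow seen for each URL and drops later repeats.

```python
def dedupe_flows(flows):
    """Keep the first flow seen for each URL and drop later repeats.

    `flows` is an iterable of dicts with hypothetical keys
    'client_ip' and 'url' (assumed for this sketch).
    """
    seen_urls = set()
    kept = []
    for flow in flows:
        if flow["url"] in seen_urls:
            # Same URL as an earlier flow: treat it as the duplicate.
            continue
        seen_urls.add(flow["url"])
        kept.append(flow)
    return kept


flows = [
    {"client_ip": "192.168.16.20", "url": "http://example.com/a"},
    {"client_ip": "10.0.0.2", "url": "http://example.com/a"},
    {"client_ip": "192.168.16.21", "url": "http://example.com/b"},
]
unique = dedupe_flows(flows)
```

Matching on the URL alone is the simplest version. If the same URL can legitimately be requested by several clients in the same window, you would probably want to add a timestamp or port to the comparison.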
We do something quite similar here: three subnets behind a CentOS 5 "router". Basically, we just have iptables set to the following 'nat' table rule:
iptables -t nat -A POSTROUTING -o <external NIC device> -j SNAT --to-source <external interface IP>
In our case, the device is eth1 and the IP is 10.0.0.2, to differentiate it from the Class C IPv4 subnets we're still using here.
The real work is done by the routing table. If your NICs are configured properly, the routing table entries should already exist.
For instance, we have these two subnets in the routing table:
Destination     Gateway         Genmask         Flags Metric Ref Use Iface
192.168.16.0    0.0.0.0         255.255.255.0   U     0      0   0   eth0
10.0.13.0       0.0.0.0         255.255.255.0   U     0      0   0   eth2
But the external traffic is handled by the default gateway line:
0.0.0.0         10.0.0.10       0.0.0.0         UG    0      0   0   eth1
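To make the routing decision concrete, here is a small Python sketch of a longest-prefix-match lookup over the three entries above. It is only a model of how the table is consulted, not how the kernel implements it, and the `ROUTES` list and `lookup` function are names made up for this example.

```python
import ipaddress

# (network, gateway or None for directly connected, interface)
ROUTES = [
    (ipaddress.ip_network("192.168.16.0/24"), None, "eth0"),
    (ipaddress.ip_network("10.0.13.0/24"), None, "eth2"),
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.0.10", "eth1"),  # default gateway
]


def lookup(dest):
    """Return (gateway, interface) for the most specific matching route."""
    addr = ipaddress.ip_address(dest)
    matches = [route for route in ROUTES if addr in route[0]]
    # The default route matches everything, so `matches` is never empty.
    _, gateway, iface = max(matches, key=lambda route: route[0].prefixlen)
    return gateway, iface


local = lookup("192.168.16.42")    # stays on the internal subnet
external = lookup("203.0.113.5")   # falls through to the default gateway
```

Anything that does not fall inside one of the two /24 subnets ends up on the 0.0.0.0 line and leaves through eth1, which is where the SNAT rule applies.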
Traffic coming back in through the NAT is tracked by netfilter's connection tracking in the kernel, which rewrites the destination back to the originating IP. The 'state RELATED,ESTABLISHED' lines in the regular FORWARD chain in iptables then let those packets through:
158M 168G ACCEPT all -- eth1 eth0 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
8M 11G ACCEPT all -- eth1 eth0 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
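If it helps to picture the tracking, here is a toy Python model of the idea. It is a deliberate simplification and not how netfilter is actually implemented: the `ToyNat` class and its method names are invented for this sketch, it keeps the source port unchanged, and it ignores port collisions, protocols and timeouts.

```python
class ToyNat:
    """Toy model of source NAT with connection tracking (illustration only)."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        # (remote_ip, remote_port, external_port) -> (original_ip, original_port)
        self.connections = {}

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Record the connection and rewrite the source to the external IP."""
        self.connections[(dst_ip, dst_port, src_port)] = (src_ip, src_port)
        return (self.external_ip, src_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite a reply back to the original host, or return None if untracked."""
        original = self.connections.get((src_ip, src_port, dst_port))
        if original is None:
            return None  # no established connection, so nothing to forward
        return (src_ip, src_port, original[0], original[1])


nat = ToyNat("10.0.0.2")
sent = nat.outbound("192.168.16.20", 40000, "203.0.113.5", 80)
reply = nat.inbound("203.0.113.5", 80, "10.0.0.2", 40000)
stray = nat.inbound("203.0.113.9", 80, "10.0.0.2", 40000)
```

The reply is mapped back to 192.168.16.20 because the outbound packet created an entry for it, while the stray packet has no entry and is not forwarded. That is roughly the distinction the ESTABLISHED match relies on.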
(Note: If any neckbeard wants to correct errors in this, please pop in a comment. I'd love to hear a critique.)
Best Answer
I think their NetFlow support is based on open source technologies. We tried to configure Mikrotik NetFlow with Scrutinizer NetFlow & sFlow Analyzer about a year ago, and it wasn't possible at that time. If things have changed with their NetFlow support since then, we would certainly be willing to run more tests. If we can be of any help, please send us a packet capture.