I have a Juniper SRX240H2 (JUNOS 12.1X44-D20.3) firewall cluster in flow mode, with interface reth2.1 facing the Internet and interface reth1.1 facing the LAN. I have a problem with one particular source and destination IP pair. The source sends a TCP SYN packet over the Internet to the destination, which is behind NAT; the NAT translation works fine and the packet reaches the destination in the LAN. However, the reply (TCP SYN+ACK) is dropped by the firewall because no existing session is found for it:
Apr 8 15:08:04 15:08:07.821685:CID-1:RT: reth1.1:10.70.50.201/515->104.236.80.115/1021, tcp, flag 12 syn ack
Apr 8 15:08:04 15:08:07.821685:CID-1:RT: find flow: table 0x5115c900, hash 880(0xffff), sa 10.70.50.201, da 104.236.80.115, sp 515, dp 1021, proto 6, tok 9
Apr 8 15:08:04 15:08:07.821685:CID-1:RT: no session found, start first path. in_tunnel - 0x0, from_cp_flag - 0
Apr 8 15:08:04 15:08:07.821685:CID-1:RT: packet dropped, first pak not sync
Apr 8 15:08:04 15:08:07.821685:CID-1:RT: flow find session returns error.
Apr 8 15:08:04 15:08:07.821685:CID-1:RT: ----- flow_process_pkt rc 0x7 (fp rc -1)
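The `flag 12` field in the trace reads naturally as a hexadecimal TCP flags byte: 0x12 is SYN (0x02) plus ACK (0x10), which matches the `syn ack` text printed next to it. A minimal Python sketch that extracts and decodes the field, assuming the trace always prints the flags in hex (an assumption based on this output, not on Juniper documentation):

```python
import re

# Standard TCP header flag bits (low byte of the flags field).
TCP_FLAG_BITS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
    (0x40, "ECE"),
    (0x80, "CWR"),
]


def decode_tcp_flags(hex_field: str) -> list[str]:
    """Turn a trace value such as '12' into flag names (assumes the value is hex)."""
    value = int(hex_field, 16)
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


def flags_from_trace(line: str) -> list[str]:
    """Pull the 'flag NN' value out of a flow trace line and decode it."""
    match = re.search(r"flag ([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no 'flag' field in trace line")
    return decode_tcp_flags(match.group(1))


trace = "reth1.1:10.70.50.201/515->104.236.80.115/1021, tcp, flag 12 syn ack"
print(flags_from_trace(trace))  # ['SYN', 'ACK']
```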
Latency between the firewall and the destination in the LAN is ~20 ms, so this should not be caused by flow timeouts. How does TCP SYN checking work on the SRX? Does it expect the TCP SYN+ACK to leave through the same interface the initial TCP SYN came in on?
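The last lines of the trace (`no session found, start first path` followed by `packet dropped, first pak not sync`) point at this behavior: a packet that matches no existing session goes through first-path processing, and a TCP packet there is only allowed to open a new session if it is a SYN. The Python below is a simplified model of that check for illustration only; the `Packet` type and `first_path_verdict` function are hypothetical, and treating only a SYN without ACK as a valid session opener is an assumption consistent with this SYN+ACK being dropped.

```python
from dataclasses import dataclass

TCP = 6
SYN = 0x02
ACK = 0x10


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    flags: int = 0


def first_path_verdict(pkt: Packet, syn_check: bool = True) -> str:
    """Hypothetical model of what happens to a packet that matched no session."""
    if pkt.proto == TCP and syn_check:
        is_pure_syn = bool(pkt.flags & SYN) and not (pkt.flags & ACK)
        if not is_pure_syn:
            # Corresponds to 'packet dropped, first pak not sync' in the trace.
            return "drop"
    # Otherwise the packet would go on to route, NAT and policy lookup.
    return "evaluate policy"


# The SYN+ACK from the trace: no session existed, so it hit the first path.
reply = Packet("10.70.50.201", "104.236.80.115", 515, 1021, TCP, SYN | ACK)
print(first_path_verdict(reply))  # drop
```

In this model the SYN check is only the symptom: the SYN+ACK reaches the first path at all because the session lookup missed.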
Best Answer
TCP handshake timeout on the SRX is 20 seconds by default and you can't manually set it lower than 4 seconds, so that's definitely not the issue.
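To put numbers on that, treating the ~20 ms from the question as the delay between the SYN leaving the firewall and the SYN+ACK coming back (an assumption), the 20-second default leaves about three orders of magnitude of headroom, and even the 4-second minimum leaves a wide margin. A trivial Python check whose constants simply restate the values above:

```python
DEFAULT_INITIAL_TIMEOUT_MS = 20_000  # 20 s default, per the answer
MIN_INITIAL_TIMEOUT_MS = 4_000       # lowest value the answer says can be configured


def timeout_headroom(reply_delay_ms: int, timeout_ms: int = DEFAULT_INITIAL_TIMEOUT_MS) -> float:
    """How many times the SYN -> SYN+ACK delay fits inside the handshake timeout."""
    if timeout_ms < MIN_INITIAL_TIMEOUT_MS:
        raise ValueError("timeout below the configurable minimum")
    return timeout_ms / reply_delay_ms


print(timeout_headroom(20))                          # 1000.0
print(timeout_headroom(20, MIN_INITIAL_TIMEOUT_MS))  # 200.0
```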
Did you run the security flow trace in the other direction? It would be useful to see the session initiation (the SRX processing the initial TCP SYN) to see what the initial session actually looks like. That might shed some light on why the return traffic doesn't match the session.
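To answer your question, though: when checking whether a session already exists, the SRX looks at six match criteria: source address, destination address, source port, destination port, protocol, and a unique session token number for the given zone and virtual router. These are the same six fields the "find flow" trace line prints as sa, da, sp, dp, proto and tok.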
Sources:
http://www.juniper.net/techpubs/en_US/junos12.1x47/information-products/pathway-pages/security/security-processing-flow-based.pdf
http://www.juniper.net/techpubs/en_US/junos12.1/information-products/topic-collections/security/software-all/security/junos-security-swconfig-security.pdf
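A session can be pictured as two wings, one per direction, each keyed on the six lookup fields (printed by the `find flow` trace line as sa, da, sp, dp, proto and tok). The reply is matched against the reverse wing, which already holds the post-NAT addresses. The Python sketch below is a simplified model, not Juniper's implementation; the public address 203.0.113.10 and the forward-wing token 7 are hypothetical placeholders, while 10.70.50.201/515, 104.236.80.115/1021 and token 9 come from the trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowKey:
    """The six lookup fields, named after the 'find flow' trace output."""
    sa: str
    da: str
    sp: int
    dp: int
    proto: int
    tok: int  # token standing in for the zone / virtual router of the arrival side


class SessionTable:
    """Simplified session table: each session registers one key per direction."""

    def __init__(self) -> None:
        self._wings: dict[FlowKey, tuple[int, str]] = {}
        self._next_id = 1

    def create(self, forward: FlowKey, reverse: FlowKey) -> int:
        session_id = self._next_id
        self._next_id += 1
        self._wings[forward] = (session_id, "in")
        self._wings[reverse] = (session_id, "out")
        return session_id

    def find(self, key: FlowKey) -> Optional[tuple[int, str]]:
        return self._wings.get(key)


table = SessionTable()
# Initial SYN to a hypothetical public address, destination-NATed to the LAN host.
sid = table.create(
    forward=FlowKey("104.236.80.115", "203.0.113.10", 1021, 515, 6, tok=7),
    reverse=FlowKey("10.70.50.201", "104.236.80.115", 515, 1021, 6, tok=9),
)

# The SYN+ACK as looked up in the trace matches the reverse wing if the session exists.
syn_ack = FlowKey("10.70.50.201", "104.236.80.115", 515, 1021, 6, tok=9)
print(table.find(syn_ack))  # (1, 'out')
```

If any of the six fields differs (for example a different token because the packet arrived in another zone), or the session was never created in the table being searched, `find` returns `None` and the packet falls through to the first path, which is what the trace shows.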
The specific ingress and egress interfaces do not have to be the same as those used when the session was created, as long as they are in the same security zones as the interfaces that set up the session.
Source:
https://kb.juniper.net/InfoCenter/index?page=content&id=KB21983
*See the note just above the Purpose section.
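In other words, the return path is checked against the zone, not the exact interface. A small Python sketch of that rule; the zone names `trust`/`untrust` and the extra interface `reth1.2` are hypothetical and only illustrate the idea:

```python
from typing import Optional

# Hypothetical interface-to-zone assignment for the cluster in the question.
INTERFACE_ZONE = {
    "reth1.1": "trust",    # LAN-facing interface from the question
    "reth1.2": "trust",    # hypothetical second LAN-side interface
    "reth2.1": "untrust",  # Internet-facing interface from the question
}


def zone_of(interface: str) -> Optional[str]:
    return INTERFACE_ZONE.get(interface)


def return_interface_ok(session_zone: str, arrival_interface: str) -> bool:
    """Return traffic may use any interface, as long as it is in the expected zone."""
    return zone_of(arrival_interface) == session_zone


print(return_interface_ok("trust", "reth1.2"))  # True: different interface, same zone
print(return_interface_ok("trust", "reth2.1"))  # False: wrong zone
```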
Also, if interfaces or zones were the issue, you would typically get specific trace output saying so. In my experience, I've seen drops in security flow traces whose stated reason was that the egress interface (in the return direction) was not in the same security zone as the interface on which the initial traffic that established the session came in. The traces are pretty verbose about this for the most part.
Even with a chassis cluster, not much should be different as far as session matching goes, although clustering does add some extra behavior in certain cases.
If you're running an active/active cluster, where the redundancy groups that handle forwarding are primary on different nodes, you could end up with Z-mode traffic. So if the ingress is on node 0 and the egress is on node 1, the active session will be maintained on node 1 (the egress node for the initial SYN) and the backup session on node 0.
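The placement rule described above can be written as a small function. This is a Python sketch of the rule as stated in this answer, not of SRX internals; the mapping of reth1/reth2 to primary nodes is a hypothetical active/active layout, and each reth is assumed to be forwarded by the primary node of its redundancy group.

```python
# Hypothetical active/active layout: primary node of each reth's redundancy group.
RETH_PRIMARY_NODE = {"reth1": 1, "reth2": 0}


def forwarding_node(interface: str) -> int:
    """Node currently forwarding for a logical unit such as 'reth2.1'."""
    parent = interface.split(".")[0]
    return RETH_PRIMARY_NODE[parent]


def session_placement(ingress_if: str, egress_if: str) -> dict:
    """Where the active and backup sessions live, per the rule described above."""
    ingress_node = forwarding_node(ingress_if)
    egress_node = forwarding_node(egress_if)
    if ingress_node == egress_node:
        return {"z_mode": False, "active": ingress_node}
    # Z-mode: traffic crosses the fabric link between the two nodes.
    return {"z_mode": True, "active": egress_node, "backup": ingress_node}


# Initial SYN: in on reth2.1 (node 0 here), out on reth1.1 (node 1 here).
print(session_placement("reth2.1", "reth1.1"))  # {'z_mode': True, 'active': 1, 'backup': 0}
```

Changing `RETH_PRIMARY_NODE` so both reths share a primary node gives the non-Z-mode case, where both directions are handled on one node.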
Source:
http://kb.juniper.net/library/CUSTOMERSERVICE/GLOBAL_JTAC/NT260/SRX_HA_Deployment_Guide.pdf
Unless you're doing this with asymmetric routing, where the return traffic leaves an interface on node 1 and comes back on node 0 (or vice versa), we don't have to explore this further. Even in that case, I believe the backup session can be used, and if the zones match, the traffic should still pass; I'll have to explore that more if that's what's going on.