Things you should be aware of: SRX HA links communicate using jumbo frames and multicast addresses. So to make this work you need at least the following changes on the EX switches:
Configure a jumbo MTU on the HA links and on the links between the EX switches. This lets jumbo frames pass through the switch infrastructure.
set interfaces x mtu 9216
Deactivate igmp-snooping for the HA VLAN or if not needed delete it completely. If you leave it enabled the switch will not forward the multicast frames because there are no IGMP messages to tell the switch which port is listening for these frames.
set protocols igmp-snooping vlan HA-VLAN disable
or
delete protocols igmp-snooping
The media you use to connect the switches to each other is not important; copper or fiber both work. I would use an ae bundle interface with two or more links, so that a single link failure doesn't kill the whole connection between the switches. Don't forget to enable LACP on the ae bundle. If possible, have the two links take different physical routes so that nobody can cut both fibers at the same time.
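A minimal sketch of such an ae bundle, assuming legacy (non-ELS) EX syntax. The member ports ge-0/0/22 and ge-0/0/23 and the bundle name ae0 are placeholders, not taken from your setup; HA-VLAN is the VLAN name used above. On ELS-based EX software, port-mode becomes interface-mode.

```
set chassis aggregated-devices ethernet device-count 1
set interfaces ge-0/0/22 ether-options 802.3ad ae0
set interfaces ge-0/0/23 ether-options 802.3ad ae0
set interfaces ae0 mtu 9216
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching port-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members HA-VLAN
```

The jumbo MTU goes on the ae interface itself rather than on the individual member links, and LACP has to be configured on both switches (at least one side active).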
This being said, if possible I would always suggest directly connecting the HA links of any firewall device. This reduces the risk that a problem on the switch (hardware failure, software bug, human error) will cause you a split-brain situation (both firewalls become master) which will certainly ruin your day (BTDT).
TCP handshake timeout on the SRX is 20 seconds by default and you can't manually set it lower than 4 seconds, so that's definitely not the issue.
Did you run the security flow trace in the other direction as well? It would be nice to see the session initiation (the SRX processing the initial TCP SYN) to see what the initial session actually looks like. That might shed some light on why the return traffic doesn't match the session.
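For reference, a flow trace that captures both directions could look roughly like this. The file name flow-trace, the filter names pf-forward and pf-reverse, and the addresses are placeholders (documentation prefixes), so substitute your real client and server:

```
set security flow traceoptions file flow-trace size 1m files 3
set security flow traceoptions flag basic-datapath
set security flow traceoptions packet-filter pf-forward source-prefix 192.0.2.10/32 destination-prefix 198.51.100.20/32
set security flow traceoptions packet-filter pf-reverse source-prefix 198.51.100.20/32 destination-prefix 192.0.2.10/32
```

After committing and reproducing the problem, read the result with "show log flow-trace" and remove the traceoptions again when you're done, since tracing adds load to the box.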
To answer your question, though: when checking whether a session is already established, the SRX looks at six match criteria. Quoting the Juniper documentation:
To determine if a packet belongs to an existing flow, the device attempts to match the packet’s information to that of an existing session based on the following six match criteria:
• Source address
• Destination address
• Source port
• Destination port
• Protocol
• Unique token from a given zone and virtual router
Sources:
http://www.juniper.net/techpubs/en_US/junos12.1x47/information-products/pathway-pages/security/security-processing-flow-based.pdf
http://www.juniper.net/techpubs/en_US/junos12.1/information-products/topic-collections/security/software-all/security/junos-security-swconfig-security.pdf
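To check what the SRX actually installed for those criteria, you can filter the session table from operational mode. A rough example, where the addresses, protocol and port are placeholders for your own flow:

```
show security flow session source-prefix 192.0.2.10 destination-prefix 198.51.100.20 protocol tcp destination-port 443
```

The output should show the In and Out wings of the session with their addresses, ports, protocol and interfaces, which you can then compare against the return packet that fails to match.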
Specific ingress/egress interfaces do not have to be the same as the initial session creation as long as they are in the same security zones as the interfaces used to set up the session.
Source:
https://kb.juniper.net/InfoCenter/index?page=content&id=KB21983
(See the note just above the "Purpose" section of that KB article.)
Also, if interfaces/zones were the issue, you'd typically get specific output saying so. In my experience, security flow traces state the drop reason explicitly, for example that the egress interface (in the return direction) is not in the same security zone as the interface the session-establishing traffic originally came in on. The trace is pretty verbose for the most part.
Even with a chassis cluster, not much should be different as far as "session" matching goes, although there are some extra things that happen in some cases.
If you're running an active/active cluster, where redundancy groups are primary/active on different nodes, you could end up with Z-mode traffic. So if the ingress is on node 0 and the egress is on node 1, the active session will be maintained on node 1 (the egress of the initial SYN) and the backup session will be maintained on node 0.
With Z-mode processing, the first packet of a session is received on one cluster node (the ingress node). When flow determines that the egress interface is located on the second node, the packet is forwarded over the fabric link with a forward session setup on the ingress node. The packet is then processed by the second node upon which an Active session is installed and the packet is forwarded out the egress link. Finally a backup session is created for the Active session in the initial ingress node.
Source:
http://kb.juniper.net/library/CUSTOMERSERVICE/GLOBAL_JTAC/NT260/SRX_HA_Deployment_Guide.pdf
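If you want to see whether Z-mode is in play on your cluster, a starting point (a sketch, not a full troubleshooting procedure) is to check which node is primary for each redundancy group and how the cluster interfaces are laid out:

```
show chassis cluster status
show chassis cluster interfaces
```

If the redundancy groups that own the ingress and egress reth interfaces are primary on different nodes, traffic between them crosses the fabric link. As far as I recall, the session table on a cluster is listed per node and marks each session as Active or Backup.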
Unless you have asymmetric routing, with the return traffic leaving an interface on node 1 and coming back in on node 0 (or vice versa), we don't have to explore this further. Even in that case, I believe the backup session can be used, and if the zones match the traffic should still pass. I'll have to explore that more if that's what's going on.
Best Answer
For data center deployments, I like Layer 2 transparent mode chassis clusters, since Layer 2 adjacency is required for vMotion, for example. You can then do your Layer 3 stuff on the ASRs but allow east-west (server-to-server) traffic to bypass them altogether.
ISSU appears to be supported, so you won't bring down the data center when rolling out new software to the SRXs.
Your ASRs can provide VPN and NAT.