Thanks for sending those captures over.
The Problem
Your throughput issues appear to be caused by a buggy implementation of TCP Sequence Number randomization. I have seen this in the past on Cisco ASAs.
To give a bit of background, it was observed in the past that some TCP implementations did not use enough randomness when choosing an Initial Sequence Number (ISN), which made it easier for attackers to manipulate TCP connections by making educated guesses at what the sequence number would be.
To attempt to fix this issue, some firewall vendors implemented a feature called TCP sequence number randomization, which rewrites the sequence number (SEQ) to a more random value as TCP packets flow through the firewall. Unfortunately, some implementations of this feature are a bit buggy and do not account for TCP's Selective Acknowledgement (SACK) feature.
You can see sequence number randomization in action in your trace. Look at the SYN/ACK packet from the server (packet #51, server capture), where you can see that the ISN chosen is 2847541373. However, if you look at the same SYN/ACK packet when it is received on the client side (packet #8, client capture), the ISN has been changed to 2098751282!
This behavior is fine up until the point that packet loss is experienced on the network.
On the client side, look at the first Duplicate Acknowledgement (Dup ACK) at packet #259. You can see that a SACK block has been set covering bytes 2098977399-2098978787. This packet effectively tells the server: "I'm still waiting on the segment starting at SEQ 2098974623, but I have already received 2098977399-2098978787, so you don't need to send those again."
Now, if you look at the same Dup ACK as it is received on the server side (#369), you can see the ACK number has been correctly converted by the firewall (2098974623 > 2847764714), however the SACK block hasn't and still shows 2098977399-2098978787!
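To make the mismatch concrete, here is a rough sketch of the arithmetic, using only the numbers from the two captures. The helper name to_server_space is made up for illustration, and it assumes the firewall applies one fixed offset for the whole connection. That assumption matches the ACK number you see in packet #369.

```python
MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around

# ISNs taken from the two captures discussed above.
SERVER_ISN = 2847541373  # SYN/ACK as sent by the server (packet #51)
CLIENT_ISN = 2098751282  # same SYN/ACK as seen by the client (packet #8)


def to_server_space(client_seq, server_isn=SERVER_ISN, client_isn=CLIENT_ISN):
    """Map a number from the client's sequence space to the server's.

    Hypothetical helper: assumes the firewall shifts every sequence
    number in this direction by one constant offset.
    """
    offset = (server_isn - client_isn) % MOD
    return (client_seq + offset) % MOD


# The ACK number, which the firewall did translate correctly.
ack = to_server_space(2098974623)

# The SACK block edges, which the firewall left untranslated.
sack_left = to_server_space(2098977399)
sack_right = to_server_space(2098978787)

print(ack, sack_left, sack_right)
# prints: 2847764714 2847767490 2847768878
```

So the server should have seen a SACK block of 2847767490-2847768878. Instead it got 2098977399-2098978787, which is nowhere near the data it has sent.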
When a Dup ACK is received with an invalid SACK block, the Dup ACK is ignored.
As a result, you lose out on the ability to use Fast Retransmission (retransmit after 3 duplicate ACKs are received) and rely solely on Retransmission Timeouts. This is really, really bad for performance and will reduce your throughput substantially.
So what can you do?
You can investigate whether TCP sequence number randomization is still required for your purposes and, if not, consider testing with it disabled. Perhaps this issue has been resolved in a newer firmware?
You could also turn off the TCP SACK option on your server(s) to prevent clients from using SACK in the first place. On Linux this is controlled by /proc/sys/net/ipv4/tcp_sack.
However, please note that SACK is meant to improve TCP performance, and the actual issue is with the firewall's (buggy) implementation of sequence number randomization. Turning off SACK will mean that Dup ACKs from clients will no longer be ignored, and the connection will be able to recover from loss a lot quicker. Throughput should go up.
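If you want to check the current setting before changing anything, something like the following should do. It is only a sketch: the function name sack_enabled is made up, and it assumes a Linux server where the file above exists.

```python
from pathlib import Path

# Linux-only path, as mentioned above.
TCP_SACK_PATH = "/proc/sys/net/ipv4/tcp_sack"


def sack_enabled(path=TCP_SACK_PATH):
    """Return True if the kernel setting at `path` is anything other than 0."""
    return Path(path).read_text().strip() != "0"


if __name__ == "__main__":
    try:
        print("SACK enabled:", sack_enabled())
    except OSError:
        # The file is missing, e.g. this is not a Linux host.
        print("Could not read", TCP_SACK_PATH)
```

To actually turn it off you need root, for example with sysctl -w net.ipv4.tcp_sack=0. That change does not survive a reboot unless you also add it to your sysctl configuration.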
Please try to connect to TCP port 1194 locally, on the server itself. This test rules out the influence of the firewall.
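If you don't have a client handy for the local test, a small script along these lines would work. Treat it as a sketch: the function name can_connect is made up, and 127.0.0.1 is an assumption. If your service is bound to a specific interface, use that address instead.

```python
import socket


def can_connect(host="127.0.0.1", port=1194, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("reachable" if can_connect() else "not reachable")
```

This only tells you whether the TCP handshake completes. It says nothing about whether the service behind the port is healthy.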
If the connection fails, it suggests that something is wrong on the server side.
If the connection succeeds, then to narrow down the scope of the issue I would suggest you take a network capture on the server to check whether the connection from the client reaches the server's network adapter.
If the adapter receives the packets from the client, it means that something on the server is blocking the connection. The most common cause is a misconfigured firewall (iptables).
If the adapter doesn't receive the packets, then you should double-check the NSG rules, or try recreating them. If the issue persists and you are sure the NSG inbound rules are correct, you may need to open a ticket with Azure support so that they can take a network capture at the host level and find out what is dropping the packets.
Best Answer
Okay, as per usual, I cleared out all the firewall configurations related to the web server, updated to a newer snapshot, rebooted pfSense, configured everything from scratch, and booyah! Landed on the login page.
I suppose you could chalk it up to some lingering states. Anyway, all is well.
In the end, the configuration requires the public IPs configured as Virtual IPs, standard-practice NAT and firewall rules, and a proper vhost configuration. I didn't tweak anything on the web server itself, so I think it was just some leftover junk in pfSense.
Thanks for the help! The curl command really helped in troubleshooting.