HTTPS slow/times out while HTTP is working normally

http https troubleshooting

tl;dr: HTTP traffic works reliably and quickly, while HTTPS traffic is unreliable or extremely slow (2-5 minute load times). Details below.

Hey Server Fault, I have a good one for you. This is the beginning of week two at a new place that just expanded to include a 15th regional office the week before I hired in.

The new office is using a temporary cellular MPLS connection while we wait for the crews to get a hard line in. The new office is using the same hardware and firmware for MPLS as the other 14 offices, verified several times.

The web filtering and firewalling occur in our DC, which is not in this new office location. I want to call it a hub-and-spoke network, but since I was hired in during a "clean up", documentation is sparse, and I have never worked in an environment this complex before, I cannot be certain.

The problem:

Users have a handful of HTTPS sites they need to use to do their work. These sites will not load regularly, if at all. If I had to put a number on it, they will load one out of 50 tries, then fail to load the next page. An odd part of this is that the (browser) headers and URL will load almost every time, but nothing else will load past that.

Meanwhile, HTTP loads normally.

There is a server (DC, file server, DNS) in this office (like all the other branches) and it does have replication issues that could be related to this problem.

We have checked the following:

  • Disabled AV/firewalls on hosts
  • We can ping the websites in question all the time
  • We have changed SSL/TLS settings in IE several times
  • On the web filter, logs show all HTTPS sites are being allowed to
    this branch
  • On the firewall, logs show all HTTPS traffic is being sent
    through/unscanned for this branch
  • Our MPLS provider found a misconfiguration in the HQ MPLS router and
    "fixed" it, but this made no difference
  • pcap'd traffic from a test workstation; communication to these sites
    exists but the remote hosts send keep alives for 5 minutes before
    resetting the connection which is when we see a "page can not be
    displayed" error

I have been adjusting, tweaking, playing, reading logs, double/triple checking, calling ISPs and googling for 4 days now, trying everything I can find, and nothing has made a change (worse or better).

My last thought is it might be the slow cellular MPLS connection (1-3 Mbps; 12 users/12 VoIP phones/1 server) but I keep ruling that out because I feel it would present itself in HTTP traffic as well.

Best Answer

Glad you got a working solution to this. Based on what you have described, this sounds like a Path MTU Discovery Black Hole issue.

Essentially, you may well have a router on your route to the internet whose MTU is lower than the standard 1500 bytes (1500 likely being used by your clients and the web servers they are talking to). Normally this isn't an issue: when a router receives a packet that is too big to send out of its next-hop interface, and the packet has the Don't Fragment bit set (as TCP traffic normally does when Path MTU Discovery is in use), it drops the packet and sends an ICMP Fragmentation Needed message back to the sender. This ICMP message includes the correct MTU, so the sender can send all future packets at the correct size.
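To make the "includes the correct MTU" part concrete, here is a minimal sketch of how that value sits in the ICMP header, following the RFC 1191 layout (type 3, code 4, next-hop MTU in the last two bytes of the 8-byte header). The function name and the hand-built example message are assumptions for illustration, not something taken from the capture in the question.

```python
import struct

ICMP_DEST_UNREACHABLE = 3   # ICMP type: Destination Unreachable
ICMP_FRAG_NEEDED = 4        # ICMP code: Fragmentation Needed and DF set


def next_hop_mtu(icmp_message: bytes):
    """Return the next-hop MTU from an ICMP Fragmentation Needed message.

    Returns None if the message is too short or is some other ICMP type/code.
    """
    if len(icmp_message) < 8:
        return None
    # Header layout: type (1), code (1), checksum (2), unused (2), next-hop MTU (2)
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hypothetical message: checksum left at 0 for brevity, advertised MTU of 1400.
example = struct.pack("!BBHHH", 3, 4, 0, 0, 1400)
print(next_hop_mtu(example))
```

In a packet capture, these are the messages to look for; if the sender never receives them, it never learns the smaller MTU.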

Problems arise if the Fragmentation Needed messages are getting dropped, perhaps because a router in the forwarding path has an overly aggressive access control policy. The sender then sends large packets which are dropped, but no feedback comes back, so it just keeps retransmitting them.
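The difference between the healthy case and the black hole can be sketched as a toy simulation. This is only an illustration under simplifying assumptions (one packet, one narrow hop, a fixed retry limit); the function name, the 1400-byte path MTU, and the retry count are all made up for the example.

```python
def send_segment(size, path_mtu, icmp_delivered, max_tries=5):
    """Simulate a sender pushing one DF-marked packet through a narrow hop.

    Returns (outcome, attempts, final_size).
    """
    for attempt in range(1, max_tries + 1):
        if size <= path_mtu:
            return ("delivered", attempt, size)
        # The router drops the oversized packet and emits Fragmentation Needed.
        if icmp_delivered:
            # The sender learns the path MTU and shrinks its packets.
            size = path_mtu
        # Otherwise the sender gets no feedback and retransmits the same size.
    return ("stalled", max_tries, size)


print(send_segment(1500, 1400, icmp_delivered=True))   # ICMP reaches the sender
print(send_segment(1500, 1400, icmp_delivered=False))  # ICMP filtered: black hole
```

With the ICMP feedback, the second attempt goes through at the reduced size; without it, every retransmission is the same size and is dropped again.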

Seen from the client side, no traffic is arriving from the sender, so the client starts sending keep-alive probes.

Just as you described, you will often find that the TCP handshake and the initial GET request go through OK because those packets are typically small. It isn't until the sender has to start sending full-sized packets that the issue becomes apparent.
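The arithmetic behind "small packets pass, full-sized ones don't" can be sketched as follows. The 1400-byte path MTU and the 400-byte request size are assumed values for illustration only; the header sizes are for IPv4 and TCP without options.

```python
IP_HEADER = 20    # IPv4 header without options, in bytes
TCP_HEADER = 20   # TCP header without options, in bytes


def packet_size(tcp_payload: int) -> int:
    """Total IP packet size for a given TCP payload length."""
    return IP_HEADER + TCP_HEADER + tcp_payload


def fits(tcp_payload: int, path_mtu: int) -> bool:
    """True if the packet can cross a hop with the given MTU unfragmented."""
    return packet_size(tcp_payload) <= path_mtu


ASSUMED_PATH_MTU = 1400  # hypothetical narrow hop, not measured

for label, payload in [
    ("SYN (no payload)", 0),
    ("small GET request", 400),
    ("full-sized segment", 1460),  # 1500-byte MTU minus 40 bytes of headers
]:
    print(label, packet_size(payload), fits(payload, ASSUMED_PATH_MTU))
```

Only the full-sized segment, at 1500 bytes on the wire, exceeds the assumed 1400-byte hop, which matches the pattern of the connection opening fine and then stalling.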

If this is the issue you are experiencing, you should strongly advise whoever is responsible for the routers in the forwarding path NOT to drop ICMP Fragmentation Needed packets. Doing so can and will break things.
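One way to check whether a smaller MTU is in play is to probe the path with Don't Fragment packets of varying size and find the largest that gets through. The sketch below only shows the search logic: `probe` is a hypothetical callable you would supply yourself, for example by wrapping `ping -M do -s SIZE host` on Linux or `ping -f -l SIZE host` on Windows (those sizes are ICMP payload bytes, so add 28 for the IP and ICMP headers). The 576 and 1500 bounds are assumed defaults.

```python
from typing import Callable


def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 1500) -> int:
    """Binary-search the largest packet size in [low, high] that probe() accepts.

    Assumes probe(size) returns True when a DF-marked packet of that total
    size gets a reply, and that every size below a working size also works.
    """
    if not probe(low):
        raise ValueError("even the smallest probe size failed")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid        # mid works, so the answer is mid or larger
        else:
            high = mid - 1   # mid fails, so the answer is smaller
    return low


def fake_probe(size: int) -> bool:
    """Stand-in probe simulating a path with a 1400-byte MTU."""
    return size <= 1400


print(find_path_mtu(fake_probe))
```

If the result is below 1500 and oversized probes time out silently instead of producing a "fragmentation needed" error, that points towards the ICMP messages being filtered somewhere along the path.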
