Can telnet to a service, but not access service ports directly

networking

We're running a variety of services on our cloud provider. Everything normally works fine, but occasionally we end up with issues connecting to one host (which has our repos on it). We couldn't find a solution to the connectivity problem, so we completely rebuilt the host at a different cloud provider. Things had been running fine, but the same connectivity issue is starting again. I'll try to summarize clearly:

The host that is having connectivity issues is running GitLab. We also ssh into that host a fair amount.

When we run into connectivity issues, we cannot access ssh, git, https etc. Pinging the host works fine. I can telnet to port 22, and get a response:

Connected to xyz.
Escape character is '^]'.
SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.1

I can access any port on the host via Telnet, and I get back a response immediately. If I try to connect to the same host via ssh, I get:

ssh -v -v me@xyz   
OpenSSH_7.9p1, LibreSSL 2.7.3
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug2: resolve_canonicalize: hostname xyz is address
debug2: ssh_connect_direct
debug1: Connecting to xyz [xyz] port 22.
debug1: connect to address xyz port 22: Operation timed out
ssh: connect to host xyz port 22: Operation timed out

If I disconnect from our local network, and connect via hotspot to the Internet, I'm able to access said host properly. This only happens to users on our corporate network.

I went down the path of checking all our local routers/firewalls, and couldn't find any issue. I then connected to the Internet from the external side of our corporate firewall, and the connectivity issues immediately started again.

I've spoken with our cloud provider (Google) and they see nothing wrong with our cloud configuration or servers. I've spoken with our Internet provider, and they can't see anything wrong either.

Anyone have any ideas?

Best Answer

This sounds a lot like an issue I have seen where the endpoint firewall runs out of ports (NAT port exhaustion). Or at least it did, until you said you connected to the external side of the firewall.

When you connected to the external side, did you have an Internet-routable address, or was it still a NATed connection? If you had an actual Internet-routable IP address (not a private one such as 10.x.x.x or 192.168.x.x), then it has to be something your ISP (or their ISP) is filtering. I doubt that is the case. I suspect you were still getting a 192.168.x.x or 10.x.x.x address when you connected to the external side of your firewall, meaning there is still a NAT device between you and the Internet at that point (so port exhaustion can still be the issue).
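A quick way to check which case you are in is to test the address your machine was handed on the external side. This is only a rough sketch using Python's standard ipaddress module; the function name looks_natted is made up for illustration, and the extra 100.64.0.0/10 check (the carrier-grade NAT range) is my own addition. A private address means a NAT device must sit between you and the Internet; a public address makes that much less likely, though it does not strictly rule it out.

```python
import ipaddress

# Carrier-grade NAT range (RFC 6598). Python's is_private does not cover it,
# so it is checked separately here.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def looks_natted(addr: str) -> bool:
    """Return True if addr is a private or CGNAT address (hypothetical helper).

    Such an address cannot be routed on the public Internet, so some
    NAT device has to translate it on the way out.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4 and ip in CGNAT_RANGE:
        return True
    return ip.is_private


if __name__ == "__main__":
    for candidate in ("10.1.2.3", "192.168.1.10", "8.8.8.8"):
        print(candidate, looks_natted(candidate))
```

Run it with the address shown by your OS for the interface you plugged into the external side, not the address a "what is my IP" website reports.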

I'd suggest connecting a packet sniffer to the external side of your firewall and confirming that packets flow in both directions. You should see a packet leave for the host (cloud) and then a reply return. If you see the reply return, but your client inside doesn't, then you know it's an issue with the firewall or your internal network.

If it leaves and does not come back, it's the ISP or the cloud provider.

If it doesn't leave, you're also looking at your own network (firewalls, etc.).
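The three cases above can be written down as a small decision function. This is just a sketch of the reasoning, not a capture tool: the function name and its three boolean inputs are hypothetical, and you would fill them in by hand from what the sniffer on the external side and the client on the inside actually show.

```python
def locate_fault(saw_outbound: bool, saw_reply: bool, client_got_reply: bool) -> str:
    """Map capture observations to the most likely place to look.

    saw_outbound:     the sniffer outside the firewall saw the packet leave
    saw_reply:        the sniffer outside the firewall saw the reply come back
    client_got_reply: the client inside the network received that reply
    """
    if not saw_outbound:
        # The packet never made it out, so the problem is on your side.
        return "your network (firewalls, etc.)"
    if not saw_reply:
        # The packet left but nothing came back.
        return "ISP or cloud provider"
    if not client_got_reply:
        # The reply reached the outside of the firewall but not the client.
        return "firewall or internal network"
    return "no fault visible on this path"


if __name__ == "__main__":
    # Example: packet leaves, reply returns, client never sees it.
    print(locate_fault(True, True, False))
```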

Note: Many companies, like Google (sometimes), will give you a default answer of "not us" when you call, because they figure that if it were them, they would have hundreds of customers with the same issue. In a way they are right, but not always. Sometimes they have an issue that only affects a few customers, and the others don't know how to report it (or where to call). Don't just accept their answer as a sure thing. They are human and can make mistakes too (and they don't have the time you have to dig deeper).