I have a Debian wheezy server running a few web applications, a MongoDB database and a Redis server behind an NGinx server. Only the NGinx server is public facing; the other services are reverse proxied behind it. This setup had been working perfectly until two days ago, when there was a temporary power outage in the datacenter where my server is located. After rebooting and doing the regular post-crash maintenance (deleting lock files, repairing the DB, etc.) I noticed that NGinx was timing out on every service it proxies. Here are the steps I have taken to try to resolve the problem:
- Check logs
  I have checked the logs for every service and everything is clean, with no errors (other than NGinx reporting the upstream connection time-out).
- Check services are running
  All the processes for the WSGI application, MongoDB, etc. are running, and I have also checked netstat:
  # netstat -ntple
  Active Internet connections (only servers)
  Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode       PID/Program name
  tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      0          21730537    1469/nginx
  tcp        0      0 0.0.0.0:2525            0.0.0.0:*               LISTEN      1000       21730714    1511/python
  tcp        0      0 0.0.0.0:9090            0.0.0.0:*               LISTEN      1000       21730931    1627/python
  tcp        0      0 0.0.0.0:2022            0.0.0.0:*               LISTEN      0          21730651    1553/sshd
  tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN      1000       21730885    1624/python
  tcp        0      0 127.0.0.1:27017         0.0.0.0:*               LISTEN      104        21730531    1376/mongod
  tcp        0      0 0.0.0.0:6379            0.0.0.0:*               LISTEN      105        21730621    1532/redis-server *
  tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      1000       21730731    1500/python
  tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          21730536    1469/nginx
  tcp6       0      0 :::2022                 :::*                    LISTEN      0          21730654    1553/sshd
  tcp6       0      0 :::6379                 :::*                    LISTEN      105        21730619    1532/redis-server *
- Check loopback interface and ping 127.0.0.1
  The loopback interface is properly set up in /etc/network/interfaces and ifconfig reports it up and running. I can also ping 127.0.0.1 and localhost without problem.
- Disable firewall
  Disabling the firewall did not change the situation. The connection is still timing out.
- Try to connect via telnet
  I tried to telnet to one of the services and that is where I noticed an odd pattern:
  # telnet 127.0.0.1 6379
  Trying 127.0.0.1...
  telnet: Unable to connect to remote host: Connection timed out
  # telnet ::1 6379
  Trying ::1...
  Connected to ::1.
  Escape character is '^]'.

When I try to connect to a service (Redis in this example) via IPv4 it times out, but if I try to connect via IPv6 it connects instantly. Is there some file related to IPv4 connectivity that could cause this type of behavior? Is there a way to fix this without having to reimage the server?
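The telnet comparison above can also be scripted, which makes it easier to repeat after each attempted fix. The following is a minimal Python sketch, not part of the original troubleshooting: the function name `probe` and the 3-second timeout are illustrative assumptions, and port 6379 is simply the Redis port from the netstat output. It distinguishes a timeout (packets silently lost) from a refusal (nothing listening).

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and report how it ended.

    `probe` is a hypothetical helper name; the timeout value is an assumption.
    """
    # Pick the address family from the literal: IPv6 literals contain ':'.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((host, port))
        return "connected"
    except socket.timeout:
        # No answer at all within the timeout: the symptom seen on 127.0.0.1.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Same two targets as the telnet test: IPv4 loopback and IPv6 loopback.
    for host in ("127.0.0.1", "::1"):
        print(f"{host}:6379 -> {probe(host, 6379)}")
```

On a server showing the behavior described here, one would expect "timed out" for 127.0.0.1 and "connected" for ::1, matching the telnet session.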
Update
After reading SYN's answer I tried to connect to the same service (see above) using my server's public IP instead (but still from inside the server), and it connects instantly. My understanding is that this works because the service listens on 0.0.0.0, which accepts connections on any interface. But connecting to 127.0.0.1 still does not work, and neither does connecting to a service that listens on 127.0.0.1 specifically. My conclusion then would be that there is indeed a problem with my loopback interface (on IPv4). Here is the output from ifconfig:
# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:7984 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7984 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:711801 (695.1 KiB) TX bytes:711801 (695.1 KiB)

venet0    Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:127.0.0.2 P-t-P:127.0.0.2 Bcast:0.0.0.0 Mask:255.255.255.255
          UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1
          RX packets:35812 errors:0 dropped:0 overruns:0 frame:0
          TX packets:47530 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2568793 (2.4 MiB) TX bytes:34332070 (32.7 MiB)

venet0:0  Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:*public ip* P-t-P:*public ip* Bcast:*public ip* Mask:255.255.255.255
          UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1
Is there something from there that would explain the malfunction of the loopback interface? Is there another log or config file that I have overlooked that could explain or potentially fix the problems I'm having with this interface?
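The reasoning in this update about 0.0.0.0 can be written down as a small model. The sketch below is a simplification under stated assumptions: it only models which destination addresses a listening socket matches, it ignores dual-stack sockets, and it assumes the destination is an address configured on the machine. The function name `accepts` is hypothetical, and 203.0.113.10 is a documentation address standing in for the redacted public IP.

```python
import ipaddress


def accepts(bind_addr: str, dest_addr: str) -> bool:
    """Model whether a listener bound to bind_addr matches a connection
    addressed to dest_addr (hypothetical helper, simplified rules)."""
    bind = ipaddress.ip_address(bind_addr)
    dest = ipaddress.ip_address(dest_addr)
    if bind.version != dest.version:
        # Simplification: an IPv4 listener never matches an IPv6 destination.
        return False
    if bind.is_unspecified:
        # 0.0.0.0 (or ::) is the wildcard: any local address matches.
        return True
    # A specific bind address matches only that exact address.
    return bind == dest


if __name__ == "__main__":
    # Redis is bound to 0.0.0.0:6379, MongoDB to 127.0.0.1:27017.
    print(accepts("0.0.0.0", "203.0.113.10"))    # True
    print(accepts("0.0.0.0", "127.0.0.1"))       # True
    print(accepts("127.0.0.1", "203.0.113.10"))  # False
```

Under this model a service bound to 0.0.0.0 should be reachable through both the public IP and 127.0.0.1. Since only the public IP works, the bind addresses do not explain the timeout, which is consistent with the conclusion that the fault lies in the IPv4 loopback path rather than in the services.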
Update 2
A quick update to add that my server is a VPS under OpenVZ. From my (continuing) Google searches I have found that OpenVZ does networking a bit differently than other platforms, so I am including that info here to potentially steer us in the right direction. From what I have seen though, nobody who has had a problem remotely similar to mine seems to have found the solution… (e.g. this post from Unix & Linux StackExchange).
Best Answer
I would bet you can connect to redis on your IPv4 address. Unless redis listens on 127.0.0.1:6379, you can't connect (nor telnet) to localhost. Not familiar enough with IPv6 to explain why it would work though.
Then again, I doubt nginx proxies traffic to redis. Can you show us which virtualhost(s) is/are enabled? Is it normal that your python processes listen on 0.0.0.0? If so, you should probably re-enable whatever firewall rules you've disabled.
Update, reading OP's updates:
Nice to see you've found something. Meanwhile, my first remark regarding connecting to localhost was just plain wrong, apologies.
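As a follow-up to the remark about processes listening on 0.0.0.0, here is a minimal Python sketch that lists which services in a `netstat -ntple` dump are bound to the wildcard address and would therefore be exposed once the firewall is off. The function name `wildcard_listeners` is hypothetical, the sample lines are copied from the question's output, and the parser assumes the column order shown there.

```python
def wildcard_listeners(netstat_output: str):
    """Return (proto, port, program) for every TCP listener bound to
    0.0.0.0 or :: in `netstat -ntple` output (hypothetical helper)."""
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State User Inode PID/Program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # Split on the last ':' so that ':::2022' yields host '::' and port '2022'.
        host, _, port = fields[3].rpartition(":")
        if host in ("0.0.0.0", "::"):
            program = fields[8] if len(fields) > 8 else "?"
            results.append((fields[0], int(port), program))
    return results


SAMPLE = """\
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 0 21730537 1469/nginx
tcp 0 0 127.0.0.1:27017 0.0.0.0:* LISTEN 104 21730531 1376/mongod
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 105 21730621 1532/redis-server *
tcp6 0 0 :::2022 :::* LISTEN 0 21730654 1553/sshd
"""

if __name__ == "__main__":
    for proto, port, program in wildcard_listeners(SAMPLE):
        print(proto, port, program)
```

In the sample, mongod is skipped because it is bound to 127.0.0.1 only, while nginx, redis-server and sshd are reported.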