Firewall – Cannot use FTP passive mode after recent Windows Azure VM maintenance

azure firewall ftp opensuse

Recently, Windows Azure had a scheduled maintenance on Virtual Machine services.

My machine would no longer boot afterwards, so I recreated it from my disk image, which should work because that image has worked fine so far.

Running passive FTP on Windows Azure

I run FTP services on my virtual server with vsftpd, in both active and passive mode. For passive FTP I chose the port range 25003-25014. I set the range in my vsftpd.conf file and mapped all the endpoints in the Azure control panel.

My vsftpd.conf:

write_enable=YES
dirmessage_enable=YES
nopriv_user=ftpsecure
local_enable=YES
anonymous_enable=NO
anon_world_readable_only=YES
syslog_enable=NO
xferlog_enable=YES
vsftpd_log_file=/var/log/vsftpd.log
xferlog_std_format=YES
xferlog_file=/var/log/vsftpd.log
connect_from_port_20=YES
ascii_upload_enable=YES
pam_service_name=vsftpd
ssl_enable=NO
pasv_min_port=25003
pasv_max_port=25014
anon_mkdir_write_enable=NO
anon_root=/srv/ftp
anon_upload_enable=NO
chroot_local_user=YES
ftpd_banner=WELCOME
idle_session_timeout=900
listen=YES
log_ftp_protocol=YES
max_clients=30
max_per_ip=8
pasv_enable=YES
ssl_sslv2=NO
ssl_sslv3=NO
ssl_tlsv1=YES
pasv_addr_resolve=YES
pasv_address=<myhost>.cloudapp.net
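Since every port between pasv_min_port and pasv_max_port needs a matching endpoint, it can help to list the range straight from the config and compare it with the portal. This is only a minimal sketch: `passive_ports` is a hypothetical helper (not part of vsftpd), and the sample text repeats just the three relevant settings from above.

```python
def passive_ports(conf_text: str) -> range:
    """Return the passive port range declared in vsftpd.conf-style text."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    low = int(settings["pasv_min_port"])
    high = int(settings["pasv_max_port"])
    return range(low, high + 1)


sample = """
pasv_enable=YES
pasv_min_port=25003
pasv_max_port=25014
"""
ports = passive_ports(sample)
print(len(ports), ports[0], ports[-1])  # 12 25003 25014
```

With the values above, that is 12 ports to check against the endpoint list.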

Port mappings (port 21 is on top of the listing, not shown in the screenshot)

Screenshot of Azure port mappings

The problem

When a client connects to FTP, it tries to enter passive mode but doesn't succeed. I investigated further using netstat and tcpdump.

Xftp client reports the following activity log:

STATUS:>    Session started...
STATUS:>    Resolving the host 'XXXXXXXXXXXX'...
STATUS:>    Connecting to the server 'XXXXXXXXXXX'...
        220 WELCOME
STATUS:>    Authenticating for 'YYYYYYYY'...
COMMAND:>   USER YYYYYYYYY
        331 Please specify the password.
COMMAND:>   PASS ****
        230 Login successful.
COMMAND:>   PWD
        257 "/"
STATUS:>    Listing folder '/'...
COMMAND:>   CWD /
        250 Directory successfully changed.
COMMAND:>   PWD
        257 "/"
COMMAND:>   TYPE A
        200 Switching to ASCII mode.
COMMAND:>   PASV
        227 Entering Passive Mode (XXX,XXX,XXX,XXX,97,174).

The last two numbers, 97,174, should decode to port 97*256+174=25006, which is inside my passive range. Then the connection timed out.
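For anyone who wants to check that arithmetic against their own client log, here is a small sketch that decodes a 227 reply. `parse_pasv_reply` is a hypothetical helper, and 203.0.113.10 is a documentation-only placeholder address, since the real one is masked in the log above.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Extract (host, port) from an FTP '227 Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p_hi, p_lo = (int(g) for g in match.groups())
    host = f"{h1}.{h2}.{h3}.{h4}"
    # The port is sent as two bytes: high byte first, then low byte.
    port = p_hi * 256 + p_lo
    return host, port


# Placeholder address; only the last two numbers come from the log above.
host, port = parse_pasv_reply("227 Entering Passive Mode (203,0,113,10,97,174).")
print(host, port)  # 203.0.113.10 25006
```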

# netstat -oanp | grep vsftpd
tcp        1      0 100.89.XXX.X:25013      0.0.0.0:*               LISTEN      26084/vsftpd        off (0.00/0/0)
tcp        0      0 0.0.0.0:21              0.0.0.0:*               LISTEN      25500/vsftpd        off (0.00/0/0)
tcp        0      0 100.89.XXX.X:21         100.89.XXX.YYY:57307    ESTABLISHED 26155/vsftpd        keepalive (7212,10/0/0)
tcp        0      0 100.89.XXX.X:21         MY.IP.ADDR.!!:57255       ESTABLISHED 26084/vsftpd        keepalive (7081,01/0/0)

I ran tcpdump on the server and discovered two things:

  • 100.89.XXX.YYY, which is not part of any clusters of mine (it's not a cloud service I own but is in the same subnet as the virtual machine), gets lots of RST packets. But who the hell told that machine to connect to my FTP?
  • SYN packets from my IP never reach the server

I also noticed another interesting thing. When I start vsftpd from SSH console, it takes about a minute to go live. Actually the process starts and is listed in netstat, but it takes a while for it to accept incoming connections from my client.

I tried to check whether the firewall is disabled. I ran yast firewall but discovered that another firewall is active on the machine. I configured none of them!

The strangely working workaround

By reducing the range of PASV ports to a single port, I discovered that after a few attempts the client eventually connects to the PASV port and displays the directory listing.

The question

How do I make vsftpd work again as expected? Please mind that I never changed configuration.

Possible related issue (not confirmed)

The analysis above suggests to me that there is a considerable delay between the accept() syscall made by the FTP server and the point at which Azure actually routes TCP SYN packets from the public IP/port to the private IP/port, which causes the timeout.

So I started an instance of Webmin and immediately tried to connect with my browser: the daemon took about a dozen seconds to start, but once started it seemed to respond immediately, so that doesn't necessarily seem to be the cause.
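The startup delay could also be measured from the client side rather than estimated. The sketch below simply retries a TCP connect until it succeeds. `seconds_until_reachable` is a hypothetical helper, and the default timeout and interval are arbitrary. It is most useful against the control port (21) right after restarting vsftpd, since a passive port is only listening while a session is waiting on it.

```python
import socket
import time


def seconds_until_reachable(host, port, timeout=120.0, interval=1.0):
    """Retry TCP connects to host:port.

    Returns the elapsed seconds at the first successful connect,
    or None if no connect succeeds before `timeout` seconds pass.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused, timed out, or unresolved: wait and try again.
            time.sleep(interval)
    return None


# Example call (run from the client right after restarting vsftpd):
#   seconds_until_reachable("<myhost>.cloudapp.net", 21)
```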

Best Answer

I had a very similar issue recently that I was able to solve using the answer to this forum post (credit goes to Craig Landis for the solution):

http://social.msdn.microsoft.com/Forums/windowsazure/en-US/8f697f17-72b7-46f7-8c97-398b91190a2f/server-2012-vm-on-azure-passive-ftp-wont-work

Some background text from the post:

We believe this may have to do with a recent change to how the portal creates endpoints. Now by default it configures a probe port on the endpoint where the probe port is the same as the endpoint port. The load balancer sends packets to the probe port to determine the health of the endpoint and if it does not get a response after a few retries, it will stop forwarding traffic to the endpoint port.

Example scenario:

Port 21 is open to all in Windows Firewall in the VM, so probe is successful, the endpoint is healthy and remote IPs can connect to it.

Port 60005 (for example) is likely only open in the Windows Firewall in the VM to those remote IPs that negotiated the passive mode FTP. It is not open to the load balancer, so the load balancer is unable to probe this port. As a result, the load balancer marks the endpoint as unhealthy and stops sending traffic to the endpoint port.

The 10.x.x.x address you see in the VM is the host server's IP address that the load balancer uses as the source IP to probe the port.

Workaround:

Remove the endpoints and then create them with Azure PowerShell using Add-AzureEndpoint, specifying only the name, protocol, localport and publicport parameters. This will create the endpoint without a probe port (which was the portal behavior until recently).
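With a dozen passive ports, typing the commands by hand gets tedious, so here is a rough sketch that only prints one command line per port for you to review and paste into Azure PowerShell. Assumptions: `endpoint_commands` is a hypothetical helper, "myservice" and "myvm" are placeholder names, and the Get-AzureVM ... | Update-AzureVM pipeline around Add-AzureEndpoint is the pattern I would expect from the classic Azure PowerShell module, so check it against the linked post before running anything. Only the name, protocol, localport and publicport parameters are emitted, as the workaround requires.

```python
def endpoint_commands(service, vm, first_port, last_port, prefix="FTP-PASV"):
    """Build one Add-AzureEndpoint pipeline per port, with no probe parameters."""
    commands = []
    for port in range(first_port, last_port + 1):
        commands.append(
            f'Get-AzureVM -ServiceName "{service}" -Name "{vm}" | '
            f'Add-AzureEndpoint -Name "{prefix}-{port}" -Protocol tcp '
            f'-LocalPort {port} -PublicPort {port} | Update-AzureVM'
        )
    return commands


# Placeholder service and VM names; the port range is the one from the question.
for line in endpoint_commands("myservice", "myvm", 25003, 25014):
    print(line)
```

Remember to remove the existing portal-created endpoints first, otherwise the names or ports will likely collide.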

There are additional instructions in the post on how to add the endpoints using PowerShell if you aren't familiar with it. Additionally, this resource helped me get up and running in PowerShell: http://blogs.msdn.com/b/windows_azure_technical_support_wats_team/archive/2013/02/18/windows-azure-powershell-getting-started.aspx

Hope this helps,

Yabbi