High Availability Squid proxy using ELB

amazon-ec2 amazon-elb amazon-web-services proxy squid

I am currently trying to set up an HA Squid proxy that consists of an ASG, an ELB, and EC2 instances. I have set the proxy server under LAN settings in Internet Explorer to the ELB DNS name. When I try to load a webpage from the allowed URL list on an instance configured to use the proxy, I get the following error message:

ERROR

The requested URL could not be retrieved

The following error was encountered while trying to retrieve the URL: /

Invalid URL

Some aspect of the requested URL is incorrect.

Some possible problems are:
• Missing or incorrect access protocol (should be “http://” or similar)
• Missing hostname
• Illegal double-escape in the URL-Path
• Illegal character in hostname; underscores are not allowed.
Your cache administrator is root.

The issue appears to be with the load balancer: as soon as I point the Internet Explorer proxy settings at the proxy instance's private DNS name or private IP address, the problem goes away and the webpage loads as it should.

Here is the squid.conf

            # This file generated from a Chef template.
            # squid/templates/default.squid.conf.erb
            acl manager proto cache_object
            acl localhost src 127.0.0.1/32
            acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
            acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
            acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
            acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
            acl localnet src fc00::/7       # RFC4193 local private network range
            acl localnet src fe80::/10      # RFC4291 link-local (directly-plugged) machine
            acl SSL_ports port 443      # https
            acl SSL_ports port 563      # snews
            acl SSL_ports port 873      # rsync
            acl Safe_ports port 80      # http
            acl Safe_ports port 21      # ftp
            acl Safe_ports port 443     # https
            acl Safe_ports port 70      # gopher
            acl Safe_ports port 210     # wais
            acl Safe_ports port 1025-65535  # unregistered ports
            acl Safe_ports port 280     # http-mgmt
            acl Safe_ports port 488     # gss-http
            acl Safe_ports port 591     # filemaker
            acl Safe_ports port 777     # multiling http
            acl Safe_ports port 631     # cups
            acl Safe_ports port 873     # rsync
            acl Safe_ports port 901     # SWAT
            acl purge method PURGE
            acl CONNECT method CONNECT
            http_access allow all
            # Managed with Chef
            acl web-hosts src all
            acl web-bd dstdomain .amazonaws.com
            acl web-bd dstdomain .chef.io
            acl web-bd dstdomain .rubygems.org
            acl web-bd dstdomain .splunk.com
            acl web-bd dstdomain .bintray.com
            acl web-bd dstdomain .trendmicro.com
            acl web-bd dstdomain .slproweb.com
            acl web-bd dstdomain .fastly.net
            http_access allow web-bd
            # The line below blocks all websites which are not allowed in the squid_urls data bag
            http_access deny !web-bd
            http_access allow manager localhost
            http_access deny manager
            http_access allow purge localhost
            http_access deny purge
            http_access deny !Safe_ports
            http_access deny CONNECT !SSL_ports
            http_access allow localhost
            http_access allow localnet
            http_access deny all
            icp_access allow localnet
            icp_access deny all
            http_port 3130 protocol=HTTP
            hierarchy_stoplist cgi-bin ?
            access_log /var/log/squid/access.log squid
            refresh_pattern     ^ftp:               1440    20%     10080
            refresh_pattern     ^gopher:            1440    0%      1440
            refresh_pattern     -i (/cgi-bin/|\?)       0       0%      0
            refresh_pattern     (Release|Package(.gz)*)$    0       20%     2880
            # refresh_pattern           \.deb$              1440    20%     10080
            # refresh_pattern           \.rpm$              1440    20%     10080
            # refresh_pattern           \.iso$              1440    20%     10080
            # refresh_pattern           \.$         1440    20%     10080
            # refresh_pattern           .               0       20%     4320
            hosts_file /etc/hosts
            maximum_object_size 1024 MB
            coredump_dir /var/spool/squid
            cache_mem 0 MB
            debug_options ALL

I turned ELB logging on and here is some traffic:

            2016-09-02T10:58:59.552990Z ha-proxy 10.166.107.198:56190 10.166.106.20:3130 0.000036 0.000908 0.000026 400 400 0 3154 "GET http://youtube.com:3130/ HTTP/1.1" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" - -
            2016-09-02T10:58:59.572031Z ha-proxy 10.166.107.198:56190 10.166.106.20:3130 0.000023 0.000689 0.000018 400 400 0 3182 "GET http://www.squid-cache.org:3130/Artwork/SN.png HTTP/1.1" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" - -
            2016-09-02T10:59:01.844569Z ha-proxy 10.166.107.198:56190 10.166.106.20:3130 0.000046 0.001085 0.000021 400 400 0 3154 "GET http://google.com:3130/ HTTP/1.1" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" - -
            2016-09-02T10:59:01.862584Z ha-proxy 10.166.107.198:56190 10.166.106.20:3130 0.00004 0.000839 0.000023 400 400 0 3182 "GET http://www.squid-cache.org:3130/Artwork/SN.png HTTP/1.1" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" - -

From the squid access log:

            1472813779.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813793.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813821.915      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813828.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813835.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813849.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813856.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813863.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813870.900      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813884.876      0 10.166.106.29 NONE/400 4076 NONE error:invalid-request - NONE/- text/html
            1472813898.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813939.552      0 10.166.106.29 NONE/400 3533 GET / - NONE/- text/html
            1472813939.570      0 10.166.106.29 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472813940.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472813941.843      0 10.166.106.29 NONE/400 3533 GET / - NONE/- text/html
            1472813941.861      0 10.166.106.29 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472813961.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817013.903      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817020.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817026.226      0 10.166.106.80 NONE/400 4076 NONE error:invalid-request - NONE/- text/html
            1472817026.332      0 10.166.106.80 NONE/400 4076 NONE error:invalid-request - NONE/- text/html
            1472817034.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817048.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817083.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817097.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817104.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817153.904      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817160.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817174.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817195.920      0 10.166.106.29 NONE/400 4076 NONE error:invalid-request - NONE/- text/html
            1472817202.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817223.898      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817237.900      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817251.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817265.899      0 10.166.106.80 NONE/400 4075 NONE error:invalid-request - NONE/- text/html
            1472817275.160      0 10.166.106.80 NONE/400 3533 GET / - NONE/- text/html
            1472817275.193      0 10.166.106.80 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472817278.626      0 10.166.106.29 NONE/400 3677 GET /fwlink/?LinkID=403856&language=en-US&scale=100&contrast=gray - NONE/- text/html
            1472817330.367      0 10.166.106.80 NONE/400 3533 GET / - NONE/- text/html
            1472817330.397      0 10.166.106.80 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472817330.448      0 10.166.106.80 NONE/400 3555 GET /favicon.ico - NONE/- text/html
            1472817330.453      0 10.166.106.80 NONE/400 3555 GET /favicon.ico - NONE/- text/html
            1472817337.300      0 10.166.106.29 NONE/400 3533 GET / - NONE/- text/html
            1472817337.334      0 10.166.106.29 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472818532.464      0 10.166.106.29 NONE/400 3533 GET / - NONE/- text/html
            1472818532.478      0 10.166.106.29 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472818533.259      0 10.166.106.29 NONE/400 3533 GET / - NONE/- text/html
            1472818533.278      0 10.166.106.29 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html
            1472818534.108      0 10.166.106.29 NONE/400 3533 GET / - NONE/- text/html
            1472818534.137      0 10.166.106.29 NONE/400 3561 GET /Artwork/SN.png - NONE/- text/html

Anyone got any ideas? I am pulling my hair out.

Best Answer

You are correct: this is an ELB-related issue.

tl;dr: you can fix this by switching the ELB listener from HTTP to TCP for communication with your backend Squid servers.

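When a client sends an HTTP request to a forward proxy, the request line contains an absolute URI (e.g. GET http://www.host-url.com/ HTTP/1.1).
An AWS ELB HTTP listener, however, enforces the most common form of HTTP request and rewrites it into a relative path plus a Host header (GET / HTTP/1.1 with Host: www.host-url.com). This matches the "GET /" entries in your Squid access log.
Squid, acting as a forward proxy, cannot work with the rewritten request and rejects it with a 400 "Invalid URL" error. Therefore, it is currently impossible to put a Squid forward proxy behind ELB HTTP listeners, which also means you cannot rely on the ELB's "X-Forwarded-For" HTTP header to retrieve the client IP address.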
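
The difference between the two request forms can be sketched in a few lines of Python. This is only an illustration of the HTTP syntax, not of how the ELB or Squid is implemented; the function names are hypothetical, and the URL is the one that appears in the logs in the question.

```python
from urllib.parse import urlsplit


def proxy_request_line(method: str, url: str) -> str:
    """Request line a browser sends to a forward proxy (absolute URI)."""
    return f"{method} {url} HTTP/1.1"


def origin_request(method: str, url: str) -> list:
    """Request line plus Host header in the form sent to an origin server."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return [f"{method} {path} HTTP/1.1", f"Host: {parts.netloc}"]


def is_absolute_form(request_line: str) -> bool:
    """True if the request target carries a scheme and a host."""
    target = request_line.split(" ")[1]
    parts = urlsplit(target)
    return bool(parts.scheme and parts.netloc)


url = "http://www.squid-cache.org/Artwork/SN.png"
print(proxy_request_line("GET", url))  # what the browser sends to the proxy
print(origin_request("GET", url))      # the form Squid receives behind an HTTP listener
```

A forward proxy needs the first form to know which host to contact; once the target has been reduced to a bare path, a check like is_absolute_form fails, which is consistent with the "Invalid URL" error for "/" shown in the question.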

To solve this, I switched the listener to the TCP protocol. In that mode the ELB passes the request through as-is, without interpreting or rewriting the request headers.
The one major drawback is that you do not get the usual ELB headers (such as X-Forwarded-For), so Squid can log only the ELB's IP address, not the client's.

Getting the client IP address back takes two steps:
1. Enable Proxy Protocol on your ELB, together with the TCP listener. The AWS ELB documentation describes how to enable it.
2. Use Squid version 3.5 or above. This version supports Proxy Protocol and can read the client address that the ELB prepends to each connection (see the require-proxy-header option of http_port and the proxy_protocol_access directive in the Squid documentation).
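
To show what Proxy Protocol adds to the TCP stream, here is a minimal Python sketch that splits a version 1 header line (the human-readable form) off the front of a connection. It is not what Squid does internally; the names ProxyHeader and split_proxy_v1 are hypothetical, the sample addresses are borrowed from the logs in the question purely for illustration, and the sketch deliberately ignores the "PROXY UNKNOWN" variant of the header.

```python
from typing import NamedTuple


class ProxyHeader(NamedTuple):
    family: str       # "TCP4" or "TCP6"
    client_ip: str    # original client address
    proxy_ip: str     # address the client connected to (the load balancer)
    client_port: int
    proxy_port: int


def split_proxy_v1(data: bytes):
    """Split a Proxy Protocol v1 header line off the start of a TCP stream.

    Returns (header, remaining_bytes). Raises ValueError if the stream does
    not start with a complete six-field "PROXY ..." line.
    """
    line, sep, rest = data.partition(b"\r\n")
    fields = line.decode("ascii").split(" ")
    if not sep or len(fields) != 6 or fields[0] != "PROXY":
        raise ValueError("no Proxy Protocol v1 header")
    _, family, src, dst, sport, dport = fields
    return ProxyHeader(family, src, dst, int(sport), int(dport)), rest


# Illustrative stream: header line first, then the untouched proxy request.
stream = (
    b"PROXY TCP4 10.166.107.198 10.166.106.29 56190 3130\r\n"
    b"GET http://google.com/ HTTP/1.1\r\nHost: google.com\r\n\r\n"
)
header, rest = split_proxy_v1(stream)
print(header.client_ip, header.client_port)
```

Because the header travels in front of the request rather than inside it, the absolute URI that Squid needs stays intact, and the client address is still available for logging.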
