How DNS lookups work when using an HTTP proxy (or not) in IE


I recently participated in a discussion regarding what happens when a client requests a page from a proxy server. I just wanted to make sure that my understanding of this sequence of events was correct in the general case:

  1. User requests site
  2. A DNS request is sent by the client, to its configured DNS server to resolve the destination IP address (this is done first in order to accommodate HTTP requests that are configured to bypass the proxy)
  3. Once the destination IP is received from DNS, and just before the HTTP request is sent, the request is checked against the exception list
  4. If the destination server is not on the exception list, the request is forwarded to the proxy server.
  5. If the destination server is on the exception list, the request is forwarded according to the client machine's routing table.

Any feedback would be most appreciated.

Best Answer

Not exactly: it depends on how the client is configured. Let's use IE as the basic example.

Suppose you configure IE with an explicit proxy, e.g. proxy set to something:8080 and no other options ticked:

  1. User types an address

  2. IE checks the address for a string match against the IE proxy exceptions list (i.e. "Bypass proxy for these addresses:")

    a. If it matches an entry in the Bypass list, the client uses its own DNS to resolve the name, and then the client connects directly to the target IP address on port 80 (assumed), then sends a request like:

    GET /something.htm HTTP/1.1
    Host: fulldomainname.example.com

    b. If no bypass list entries match, continue:

  3. IE connects to its configured proxy, and sends a request of the form:

    GET http://fulldomainname.example.com/something.htm HTTP/1.1

    Bonus factoid: this use of the full URL (scheme and FQDN included) in the request line is one way you can tell that a client thinks it's talking to a proxy instead of a real web server

  4. The proxy resolves that host name using its own DNS, and then connects to the target site (acts like the client in step 2 above), etc, etc.
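The explicit-proxy steps above can be sketched roughly as follows. This is a simplified illustration, not IE's actual implementation: the function names `matchesBypass` and `planRequest` are made up for this example, and the wildcard handling assumes that `*` is the only special character in a bypass entry.

```javascript
// Hypothetical helper: does the host match any bypass list entry?
// Assumption: entries are case-insensitive strings with "*" as a wildcard.
// Real IE matching has more rules than this.
function matchesBypass(host, bypassList) {
  return bypassList.some(function (pattern) {
    // Escape regex metacharacters, then turn "*" into ".*".
    var escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    var regex = new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
    return regex.test(host);
  });
}

// Hypothetical helper: decide where to connect and what to send.
function planRequest(url, proxy, bypassList) {
  var parsed = new URL(url);
  var path = parsed.pathname + parsed.search;

  if (matchesBypass(parsed.hostname, bypassList)) {
    // Bypass case: the client resolves the target name itself,
    // connects directly, and sends only the path in the request line.
    return {
      connectTo: parsed.hostname + ":" + (parsed.port || "80"),
      targetResolvedBy: "client",
      requestLine: "GET " + path + " HTTP/1.1",
      hostHeader: "Host: " + parsed.host
    };
  }

  // Proxy case: the client only needs to reach the proxy, sends the
  // full URL, and leaves resolution of the target name to the proxy.
  return {
    connectTo: proxy,
    targetResolvedBy: "proxy",
    requestLine: "GET " + parsed.href + " HTTP/1.1",
    hostHeader: "Host: " + parsed.host
  };
}

var bypass = ["*.internal.example.com"];
console.log(planRequest("http://app.internal.example.com/something.htm", "something:8080", bypass));
console.log(planRequest("http://fulldomainname.example.com/something.htm", "something:8080", bypass));
```

The first call takes the bypass path and produces a request line containing only `/something.htm`; the second goes to the proxy and carries the full URL.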

When using WPAD/PAC:

In the case of using a Web Proxy Auto Discovery (WPAD) or Proxy Auto Configuration (PAC or Autoconfig) script, such as those provided by ISA/TMG when autoconfiguration is enabled, it's different:

  1. User types an address

  2. Client downloads the current wpad.dat/autoproxy.js/.pac file from its configured location

  3. Client looks for the function "FindProxyForURL" in the js file, and executes it

  4. The Autoproxy script processes the hostname and URL. This is a limited-function JavaScript file, but lots of things are still possible:

    a. this may include name resolution (isInNet, dnsResolve)

    b. this may include string matching (shExpMatch)

    c. this may include counting to a million (i++)

    d. this may include narky alert popup messages if the admin's a jerk

    • (or just funny)
    • ((or debugging))
  5. The FindProxyForURL function returns at least one string: an ordered list of the best proxies to use (semicolon separated)

    a. either "DIRECT", in which case the client then needs to resolve the name itself and connect directly, as per the Bypass case above

    b. or "PROXY proxyname:8080" or similar, in which case the client connects to that port on that proxy, tells it to GET the full URL, and the proxy performs name resolution.

    • As an example: if the script function returned "PROXY yourProxy:8080;DIRECT", that tells the client to connect to yourProxy on TCP port 8080 to request this URL, and if that connection can't be established, to try going direct. Note that TCP session setup failure isn't exactly quick, so this isn't likely to be a pleasant failover experience for a user, but it beats nothing. Maybe.
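To make the PAC flow above concrete, here is a minimal sketch of a script and of how a client might read its return value. The domain `*.corp.example.com` and the helper `parsePacResult` are invented for illustration. The two stand-in functions at the top exist only so the sketch runs outside a browser; a real PAC host supplies `shExpMatch` and `isPlainHostName` itself, so they would be left out of an actual PAC file.

```javascript
// Stand-ins for running outside a browser (assumption: simplified behaviour).
if (typeof globalThis.shExpMatch !== "function") {
  globalThis.shExpMatch = function (str, pattern) {
    // Shell-style match: "*" is any run of characters, "?" is one character.
    var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    var regex = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
    return new RegExp("^" + regex + "$").test(str);
  };
}
if (typeof globalThis.isPlainHostName !== "function") {
  globalThis.isPlainHostName = function (host) {
    return host.indexOf(".") === -1; // true when the name has no dots
  };
}

// The entry point the client looks for and executes.
function FindProxyForURL(url, host) {
  // String matching only, so no DNS lookup happens on the client here.
  if (isPlainHostName(host) || shExpMatch(host, "*.corp.example.com")) {
    return "DIRECT"; // client resolves the name and connects itself
  }
  // Ordered list: try the proxy first, fall back to a direct connection.
  return "PROXY yourProxy:8080; DIRECT";
}

// Hypothetical client-side helper: split the result into ordered choices.
function parsePacResult(result) {
  return result.split(";")
    .map(function (entry) { return entry.trim(); })
    .filter(function (entry) { return entry.length > 0; })
    .map(function (entry) {
      var parts = entry.split(/\s+/);
      return { type: parts[0].toUpperCase(), target: parts[1] || null };
    });
}

console.log(parsePacResult(FindProxyForURL("http://intranet/", "intranet")));
console.log(parsePacResult(FindProxyForURL("http://www.example.org/", "www.example.org")));
```

Note that a script which calls `dnsResolve` or `isInNet` instead would make the client perform a DNS lookup before it chooses a proxy, which is one reason lookup behaviour differs between PAC configurations.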

There are occasionally glitches, subtleties and unexplained behaviours, but for the most part when things aren't broken in weird and interesting ways, the above is how I've seen it work over many years. Newer browsers are optimizing behaviour, and parallelizing stuff, and trying interesting things all the time, so check out the most recent docs for your given browser to understand the fine detail.

WinSock Proxy / ISA Firewall Client / TMG Client:

If you're interested in the Winsock Proxy Client (from TMG/ISA Server), that's a different story, with more flexibility and moving parts. It's too much to go into here, but there are docs around which describe how it works. In short: it plugs into Windows Sockets, and can intercept both TCP- and UDP-based traffic and name resolution requests on a per-app and per-user basis. Very powerful, but also deprecated now, and it hasn't been updated in several years.

Clients can be Really Clingy:

One more note: once an HTTP client has decided to talk to a proxy for a given site/URL, there's no way for the proxy to tell it not to.

There is no HTTP status code or header for "I don't serve that, you should just go directly to it instead"...

Once the client decides a particular URL is proxy-served, proxy-death-grip ensues.

The only way to avoid it is by getting the selection logic right before the client makes its connection, in the PAC or Bypass list.

A final note on Zones and PAC files

IE treats sites that are DIRECT-connected, even if they have dots in the URL, as part of the Local Intranet Zone (by default; this is settable in the Zone properties), and so will do things like allow Integrated Windows Authentication to those sites (i.e. Kerberos and/or NTLM authentication, transparently). So whether something is in the Local Intranet Zone determines how trusted it is in terms of automatic authentication. Again, at least by default.
