Varnish Proxy – How to Serve from One Port and Clone Requests to Another Port

I have a problem to solve in my current deployment, which looks like this:

Varnish on port 80 in front of Nginx on port 8000 backed by uWSGI

The problem is that the client wants to implement some in-house analytics that are required for business logic, implemented in Python and served with uWSGI. Most Varnish cache hits go uncounted because they never reach the backend (the hits are anonymous). Two solutions came up:

1. Hit a non-cached server directly from the clients. (The clients are Android devices, so two requests per page request are very costly in terms of battery usage.)
2. Proxy or clone the requests at Varnish to another server where the analytics server can ingest them, maybe over UDP.

Is the second solution viable? Is it possible to do this, and if so, how?

Best Answer

vmod_curl

It is possible to send an extra request to an external endpoint. I would advise you to do this through vmod_curl, a cURL module for Varnish.

See https://github.com/varnish/libvmod-curl for more information.

However, sending an extra call for every incoming request can be quite taxing on your analytics system. You are probably using Varnish precisely to keep your origin systems from being overloaded.
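
As a rough sketch of the vmod_curl approach, the VCL below fires a side request from vcl_deliver for cache hits only, since misses already reach the backend. It assumes Varnish 4 or later with libvmod-curl installed. The backend host, the analytics endpoint (127.0.0.1:9000/track), the X-Original-URL header name and the 200 ms timeout are hypothetical placeholders, not part of the original setup.

```vcl
vcl 4.0;

import curl;

# Nginx origin on port 8000, as in the question (host is assumed).
backend default {
    .host = "127.0.0.1";
    .port = "8000";
}

sub vcl_deliver {
    # Only cache hits are invisible to the backend, so only those are forwarded.
    if (obj.hits > 0) {
        # Cap the side request at 200 ms (assumed value) to limit added latency.
        curl.set_timeout(200);
        # Pass the requested URL along in a hypothetical header.
        curl.header_add("X-Original-URL: " + req.url);
        # Hypothetical analytics endpoint.
        curl.get("http://127.0.0.1:9000/track");
        # Release the resources held by the curl call.
        curl.free();
    }
}
```

Note that the call is synchronous: the client response waits until the analytics server answers or the timeout is reached, so the added latency is worth measuring before relying on this.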

Varnish's logging & statistics tools

And when it comes to analytics, the tools that Varnish provides are second to none.

The logging tools (varnishlog, varnishtop and varnishncsa) are all based on VSL, the Varnish Shared memory Log.

And finally, here's a reference to the counters that are used by varnishstat: http://varnish-cache.org/docs/6.0/reference/varnish-counters.html
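
If the analytics can be derived from logs instead of extra requests, one option is to tag cache hits in the shared memory log from VCL. The following is a minimal sketch, assuming Varnish 4 or later and the bundled std vmod. The "analytics-hit" prefix is a made-up marker and the backend host is assumed.

```vcl
vcl 4.0;

import std;

# Nginx origin on port 8000, as in the question (host is assumed).
backend default {
    .host = "127.0.0.1";
    .port = "8000";
}

sub vcl_deliver {
    # Misses already reach the backend; only hits need an extra record.
    if (obj.hits > 0) {
        # Written to the shared memory log under the VCL_Log tag.
        std.log("analytics-hit url=" + req.url);
    }
}
```

These records appear under the VCL_Log tag, so a log consumer such as varnishlog can filter on that tag and feed the Python analytics without adding work to the request path.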

Related Solutions

Let varnish send old data from cache while it’s fetching a new one

The solution that I've used to solve this problem is to make sure the TTL on a page never has a chance to expire before it's refreshed - forcing an HTTP client running on one of my systems to get the slow load instead of an unlucky client request.

In my case, this involves wget on a cron, sending a special header to mark the requests and setting req.hash_always_miss based on this, forcing a new copy of the content to be fetched into the cache.

acl purge {
    "localhost";
}

sub vcl_recv {
    /* other config here */
    if (req.http.X-Varnish-Nuke == "1" && client.ip ~ purge) {
        set req.hash_always_miss = true;
    }
    /* ... */
}

For your content, this might mean setting the Varnish TTL to something like 5 minutes but having a cron'd wget configured to make a cache-refreshing request every minute.
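
To pair the refresh job with a matching cache lifetime, the TTL can be pinned in VCL. This is a sketch only, assuming Varnish 4 or later (where the hook is named vcl_backend_response). The backend address is a placeholder, and the 5-minute value comes from the example above.

```vcl
vcl 4.0;

# Placeholder origin; replace with the real backend.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Cache objects for 5 minutes. With the cron job refreshing every minute,
    # the TTL should not lapse while the job keeps running.
    set beresp.ttl = 5m;
}
```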

Varnish with MediaWiki not caching

Take a look at your request headers.

Cache-Control:no-cache

and

Cache-Control:max-age=0

Edit: These were not the cause of the trouble and can be safely ignored. The Varnish documentation states:

Note: By default, Varnish does not care about the Cache-Control request header. If you want to let users update the cache via a force refresh you need to do it yourself.

By the way, it is a good idea to test these things with curl, which does not add any 'random' headers.

Edit: Your trouble lies in hit-for-pass being saved for the request.

-   Debug          "XXXX HIT-FOR-PASS"
-   HitPass        2147516455

It is saved on a previous request and tells Varnish not to attempt caching on subsequent requests for the same resource.

Edit: OK, the solution (or rather the problem) is in this part of the VCL:

    if (beresp.ttl < 48h) {
      set beresp.uncacheable = true;
      return (deliver);
    }

This says that if the TTL on a response is shorter than 48 hours, the response is marked as uncacheable (that is, hit-for-pass). I can't think of a reason why this is in the sample configuration (maybe someone can help me), but I'd try commenting it out to see what happens.

It seems to be a mistake in the sample configuration, as the Varnish 3 sample contains this:

    if (beresp.ttl < 48h) {
      set beresp.ttl = 48h;
    }

This just extends the TTL to 48 hours (which is actually quite odd too, but maybe it works with MediaWiki).

By the way, I am assuming that you have left default_ttl at its default value of 120 seconds (it can be changed on the varnishd command line).
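
For comparison, the Varnish 3 behaviour quoted above could be written in Varnish 4 syntax roughly as follows. This is a sketch under the assumption that extending short TTLs is acceptable for your wiki. The backend address is a placeholder, and the rest of the sample configuration is omitted.

```vcl
vcl 4.0;

# Placeholder origin; replace with the real MediaWiki backend.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Extend short TTLs instead of marking the response uncacheable,
    # which is what created the hit-for-pass objects.
    if (beresp.ttl < 48h) {
        set beresp.ttl = 48h;
    }
    # No explicit return: processing falls through to the built-in VCL,
    # which still declines to cache responses such as those with Set-Cookie.
}
```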