Duplicate cache pages: Varnish

apache-2.2 drupal varnish

We recently configured Varnish on our server. The setup succeeded, but we noticed that if we open any page in multiple browsers, Varnish sends the request to Apache whether or not the page is already cached. If we refresh twice in each browser, it creates duplicate copies of the same page.

What exactly should happen:

If a page is already cached by Varnish, subsequent requests for it should be served by Varnish itself, whether the page is opened in another browser or from a different IP address.

Here is my default.vcl file:

backend default {
    .host = "127.0.0.1";
    .port = "80";
}

sub vcl_recv {
    if (req.url !~ "^/search/.*$") {
        set req.url = regsub(req.url, "\?.*", "");
    }

    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }

    if (!req.backend.healthy) {
        unset req.http.Cookie;
    }

    set req.grace = 6h;

    if (req.url ~ "^/status\.php$" ||
        req.url ~ "^/update\.php$" ||
        req.url ~ "^/admin$" ||
        req.url ~ "^/admin/.*$" ||
        req.url ~ "^/flag/.*$" ||
        req.url ~ "^.*/ajax/.*$" ||
        req.url ~ "^.*/ahah/.*$") {
        return (pass);
    }

    if (req.url ~ "(?i)\.(pdf|asc|dat|txt|doc|xls|ppt|tgz|csv|png|gif|jpeg|jpg|ico|swf|css|js)(\?.*)?$") {
        unset req.http.Cookie;
    }

    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|SSESS[a-z0-9]+|NO_CACHE)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        } else {
            return (pass);
        }
    }

    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }

    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }

    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } else if (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } else if (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    return (lookup);
}

sub vcl_deliver {
    if (obj.hits > 0) {
        set resp.http.X-Varnish-Cache = "HIT";
    } else {
        set resp.http.X-Varnish-Cache = "MISS";
    }
}

sub vcl_fetch {
    if (beresp.status == 404 || beresp.status == 301 || beresp.status == 500) {
        set beresp.ttl = 10m;
    }

    if (req.url ~ "(?i)\.(pdf|asc|dat|txt|doc|xls|ppt|tgz|csv|png|gif|jpeg|jpg|ico|swf|css|js)(\?.*)?$") {
        unset beresp.http.set-cookie;
    }

    set beresp.grace = 6h;
}

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (hash);
}

sub vcl_pipe {
    set req.http.connection = "close";
}

sub vcl_hit {
    if (req.request == "PURGE") {
        ban_url(req.url);
        error 200 "Purged";
    }

    if (!obj.ttl > 0s) {
        return (pass);
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        error 200 "Not in cache";
    }
}

Solution

Pitfall – Vary: User-Agent

Some applications or application servers send Vary: User-Agent along with their content. This instructs Varnish to cache a separate copy for every variation of User-Agent there is. There are plenty. Even a single patchlevel of the same browser will generate at least 10 different User-Agent headers based just on what operating system they are running.

So if you really need to Vary based on User-Agent, be sure to normalize the header or your hit rate will suffer badly. The Accept-Encoding normalization in vcl_recv above can serve as a template.

https://www.varnish-cache.org/docs/3.0/tutorial/vary.html#tutorial-vary

Workaround

One workaround is to do what we call "User-Agent-Washing", where
Varnish rewrites the User-Agent to the handful of different variants
your backend really cares about, along the lines of:

sub vcl_recv {
    if (req.http.user-agent ~ "MSIE") {
        set req.http.user-agent = "MSIE";
    } else {
        set req.http.user-agent = "Mozilla";
    }
}
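
If the backend does not actually serve different content per browser, a related option is to drop User-Agent from the Vary header before the object is stored, so that all browsers share one cached copy. The following is a minimal sketch in Varnish 3.0 syntax (the same vcl_fetch hook used in the question). It assumes that no page on the site really depends on User-Agent, which the question does not confirm, and the logic would be merged into the existing vcl_fetch.

```vcl
sub vcl_fetch {
    # Assumption: the backend sends "Vary: User-Agent" but returns
    # the same content regardless of browser.
    if (beresp.http.Vary ~ "User-Agent") {
        # Remove the User-Agent token and any comma in front of it.
        set beresp.http.Vary = regsub(beresp.http.Vary, ",? *User-Agent *", "");
        # Clean up a leading comma left behind when User-Agent came first.
        set beresp.http.Vary = regsub(beresp.http.Vary, "^, *", "");
        # Drop the header entirely if nothing else remains.
        if (beresp.http.Vary == "") {
            unset beresp.http.Vary;
        }
    }
}
```

Other Vary values such as Accept-Encoding are left in place, so compressed and uncompressed variants are still cached separately.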

Best Answer

First, Varnish will not cache two copies of the same URL unless the backend response varies on a request header (Vary).

I am not sure about your hit/miss check, but when I need to verify caching I use Firefox with Firebug.

I open Firebug and load the website.

Firebug shows the Age response header of every page and image fetched, as in the attached picture.

If the Age value increases over time, then Varnish is working well.

From what I can see, it is working fine for your site too.
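
The same check can be made from the response headers alone. As a small sketch in Varnish 3.0 syntax, the vcl_deliver from the question could also report how often the cached object has been served. The header name X-Varnish-Hits is a hypothetical choice for illustration, not a standard header.

```vcl
sub vcl_deliver {
    # obj.hits is 0 on a cache miss and grows with each delivery
    # of the same cached object.
    if (obj.hits > 0) {
        set resp.http.X-Varnish-Cache = "HIT";
        # Hypothetical debugging header: number of hits so far.
        set resp.http.X-Varnish-Hits = obj.hits;
    } else {
        set resp.http.X-Varnish-Cache = "MISS";
    }
}
```

If two different browsers both report MISS for the same URL while Age stays at 0, the response is probably being passed (for example because of a session cookie) or stored as separate Vary variants.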

[Picture: How to check that Varnish Cache is working]
