Nginx – Why cache static files with Varnish, why not pass

configuration, nginx, varnish, web-server

I have a system running nginx / PHP-FPM / Varnish / WordPress, plus Amazon S3 for static assets.

Now I have looked at a lot of configuration files while setting up the system, and in all of them I found something like this:

    /* If the request is for pictures, javascript, css, etc */
    if (req.url ~ "\.(jpg|jpeg|png|gif|css|js)$") {
        /* Remove the cookie and make the request static */
        unset req.http.cookie;
        return (lookup);
    }

I do not understand why this is done. Most of the examples also run nginx as a webserver. Now the question is: why would you use the Varnish cache to cache these static files?

It makes much more sense to me to cache only the dynamic pages, so that PHP-FPM / MySQL don't get hit as much.

Am I correct or am I missing something here?

UPDATE

I want to add some info to the question based on the answer given.

If you have a dynamic website where the content actually changes a lot, caching does not make sense. But if you use WordPress for a mostly static website, for example, its pages can be cached for long periods of time.

That said, what matters more to me is static content. I have found a link with some tests and benchmarks of different cache applications and webserver applications.

http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-varnish-vs-gwan/

nginx is actually faster at serving static content, so it makes more sense to just let those requests pass straight through to it. nginx works great with static files.
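For illustration, serving those files straight from nginx is a single location block; a minimal sketch (the document root, extension list, and 30-day expiry are placeholder values, not taken from the benchmark above):

    # serve static assets directly from disk and let browsers cache them
    location ~* \.(jpg|jpeg|png|gif|css|js)$ {
        root /var/www/example;   # hypothetical document root
        expires 30d;             # illustrative client-side cache lifetime
        access_log off;          # optional: don't log asset requests
    }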

Apart from that, most of the time static content is not even on the webserver itself. Most of the time it is stored on a CDN somewhere, maybe AWS S3 or something like that. I think the Varnish cache is the last place you want your static content stored.

Best Answer

There are a few advantages to Varnish. The first one you note is reducing load on a backend server, typically by caching content that is generated dynamically but changes rarely (compared to how frequently it is accessed). Incidentally, that is also why the snippet you quote strips cookies: with the default VCL, Varnish will not cache a request that carries a Cookie header, and static assets don't need cookies, so removing the header makes them cacheable. Taking your WordPress example, most pages presumably do not change very often, and there are plugins that invalidate the Varnish cache when a page changes (i.e. a new post, edit, comment, etc.). Therefore, you cache indefinitely and invalidate on change - which results in the minimum load on your backend server.
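As a rough sketch of that "cache indefinitely, invalidate on change" pattern, in the same pre-4.0 VCL dialect as the snippet in the question (the ACL address and the one-year TTL are assumptions; a WordPress plugin would be what actually issues the PURGE requests):

    acl purgers {
        "127.0.0.1";                /* assumed address of the WordPress host */
    }

    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purgers) {
                error 405 "Not allowed";
            }
            return (lookup);        /* let the PURGE find the cached object */
        }
    }

    sub vcl_hit {
        if (req.request == "PURGE") {
            purge;
            error 200 "Purged";
        }
    }

    sub vcl_miss {
        if (req.request == "PURGE") {
            purge;
            error 200 "Purged";
        }
    }

    sub vcl_fetch {
        set beresp.ttl = 52w;       /* effectively "indefinitely"; rely on PURGE */
    }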

The linked article notwithstanding, most people would suggest that Varnish performs better than nginx if set up properly - although (and I really hate to admit it) my own tests seem to concur that nginx can serve a static file faster than Varnish (luckily, I don't use Varnish for that purpose). I think the problem is that if you end up using Varnish, you have added an extra layer to your setup. Passing through that extra layer to the backend server will always be slower than serving directly from the backend - and this is why allowing Varnish to cache may be faster: you save a step. The other advantage is on the disk I/O front. If you set up Varnish to use malloc storage, you don't hit the disk at all, which leaves the disk available for other processes (and would usually speed things up).
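As a point of reference, the storage backend is picked when varnishd starts; a sketch, assuming the backend listens on 127.0.0.1:8080 and you have 1 GB of RAM to dedicate to the cache:

    # keep the entire cache in RAM rather than in a memory-mapped file on disk
    varnishd -a :80 -b 127.0.0.1:8080 -s malloc,1G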

I think one would need a better benchmark to really gauge the performance. Repeatedly requesting the same single file triggers filesystem caches, which shifts the focus away from the web servers themselves. A better benchmark would use siege with a few thousand random static files (possibly even taken from your server logs) to simulate realistic traffic. Arguably though, as you mentioned, it has become increasingly common to offload static content to a CDN, which means that Varnish probably won't be serving it to begin with (you mention S3).
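A sketch of such a benchmark, assuming nginx's default combined log format (the hostname, file paths, and numbers are placeholders):

    # pull the request paths out of the access log into a URL list
    awk '{ print "http://example.com" $7 }' /var/log/nginx/access.log \
        | sort -u > urls.txt

    # hit a random selection of those URLs with 50 concurrent clients for a minute
    siege -i -f urls.txt -c 50 -t 1M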

In a real-world scenario, you would likely prioritize your memory usage: dynamic content first, as it is the most expensive to generate; then small static content (e.g. js/css); and lastly images - you probably wouldn't cache other media in memory unless you have a really good reason to. In this case, with Varnish loading files from memory and nginx loading them from disk, Varnish will likely outperform nginx (note that nginx's caches are only for proxying and FastCGI, and those are disk-based by default - although it is possible to use nginx with memcached).
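To illustrate that last point, serving from memcached via nginx looks roughly like this (a sketch: the key scheme, port, and fallback location are assumptions, and nginx only reads from memcached - something else has to populate the cache):

    location ~* \.(jpg|jpeg|png|gif|css|js)$ {
        set $memcached_key $uri;           # assumed key scheme: the bare URI
        memcached_pass 127.0.0.1:11211;    # stock memcached address
        error_page 404 502 504 = @disk;    # fall back to disk on a miss
    }

    location @disk {
        root /var/www/static;              # hypothetical document root
    }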

(My quick - very rough, not to be given any credibility - test showed nginx (direct) was the fastest - let's call it 100%; varnish (with malloc) was a bit slower (about 150%); and nginx behind varnish (with pass) was the slowest (around 250%). That speaks for itself: it's all or nothing. Adding the extra time (and processing) needed to communicate with the backend simply suggests that if you are using Varnish and have the RAM to spare, you might as well cache everything you can and serve it from Varnish instead of passing back to nginx.)