HTTP gzip compression and performance

compression cpu-usage gzip iis-6 performance

I've always tried to enable gzip compression on web servers because it seemed to have very low CPU cost and you obtain a significant data transfer reduction.

Now we have a public server that does not have gzip enabled, and its CPU load is sometimes quite high under heavy traffic (mostly because of complex SQL queries on certain pages). According to this Microsoft article on the subject, CPU load should be taken into account when enabling gzip.

The client wants to reduce bandwidth and speed up page load times, but I'm not sure whether enabling gzip will do more harm than good here, although it has worked well on other servers.

In your experience, will gzip compression have a significant impact on CPU load?

EDIT: In this case we are using IIS 6.

Best Answer

The key issue is "how much data is being compressed?".

If you are running a massive DB query that takes a noticeable number of seconds to run, and the resulting page is a few tens of KB long, then the expense of compressing the data will be completely dwarfed by the expense of the SQL work, to the point where there is no point even thinking about it. A modern CPU is going to compress tens or hundreds of KB pretty much instantly compared to any chunky DB query.
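If you want to check that claim on your own hardware, a rough timing sketch along these lines may help. It uses Python's standard `gzip` module rather than IIS's compressor, and the `page` payload and the `time_gzip` helper are made up for illustration, so treat the numbers as a ballpark only. Real pages will compress less well than this very repetitive sample.

```python
import gzip
import time
from typing import Tuple


def time_gzip(payload: bytes, level: int = 6) -> Tuple[int, float]:
    """Return (compressed size in bytes, elapsed seconds) for one gzip pass."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


# Hypothetical stand-in for a dynamically generated HTML page (roughly 50 KB).
row = b"<tr><td>order 12345</td><td>widget</td><td>19.99</td></tr>\n"
page = b"<html><body><table>\n" + row * 850 + b"</table></body></html>\n"

if __name__ == "__main__":
    size, seconds = time_gzip(page)
    print(f"{len(page)} bytes -> {size} bytes in {seconds * 1000:.2f} ms")
```

Compare the reported time against how long the slow SQL queries take for the same page; that ratio is what matters here.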

Another factor in favour of compression is that, correctly configured, static pages are not re-compressed on every request and objects that won't benefit (image files and others that are pre-compressed) are not compressed by the web server at all. Only dynamic likely-to-be-compressible content need be gzipped on each request.
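The decision described above can be sketched roughly as follows. This is an assumed policy for illustration, not how IIS actually implements it: the type lists and the `compression_mode` function are hypothetical, and a real server drives this from its own configuration.

```python
# Assumed, illustrative lists; a real server reads these from its configuration.
PRECOMPRESSED_TYPES = {
    "image/jpeg", "image/png", "image/gif", "application/zip", "video/mp4",
}
COMPRESSIBLE_TYPES = {
    "application/javascript", "application/json", "application/xml",
}


def compression_mode(content_type: str, is_static: bool) -> str:
    """Return 'none', 'cached' or 'per-request' for a response."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    base = content_type.split(";", 1)[0].strip().lower()
    if base in PRECOMPRESSED_TYPES:
        return "none"  # already compressed, gzip would gain nothing
    if base.startswith("text/") or base in COMPRESSIBLE_TYPES:
        # Static files are compressed once and the result is reused;
        # dynamic output has to be compressed on every request.
        return "cached" if is_static else "per-request"
    return "none"  # unknown types are left alone in this sketch
```

Only the `per-request` case costs CPU on every hit, which is why the overall cost tends to be lower than people expect.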

Generally speaking, unless you have a specific reason not to compress, I recommend doing so. The CPU cost is generally small unless you are running the web server on a low-power device (like a domestic router, for instance) for some reason.

One reason not to compress is scripts that use "long poll" techniques to emulate server push efficiently, or scripts that drip-feed content to the browser for progress indication. The buffering implied by dynamic compression can cause such requests to time out on the client side, but with careful configuration you can add them to the "don't compress" list while still compressing everything else. Another reason to consider not using dynamic compression is that it adds a little latency to each dynamic request, though for most web applications this difference is negligible compared to the bandwidth savings.
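To illustrate the buffering problem with drip-fed content, here is a small sketch using Python's `zlib` module. It is not what IIS does internally; the `stream_chunks` helper and the `updates` messages are invented for the example, and raw deflate is used (instead of the gzip container) purely so that header bytes don't obscure the result.

```python
import zlib


def stream_chunks(chunks, sync_flush: bool):
    """Compress chunks one by one; return the bytes ready to send after each."""
    # wbits=-15 selects raw deflate, i.e. no zlib/gzip header is written.
    compressor = zlib.compressobj(wbits=-15)
    sent = []
    for chunk in chunks:
        out = compressor.compress(chunk)
        if sync_flush:
            # Force the compressor to emit everything it has buffered so far.
            out += compressor.flush(zlib.Z_SYNC_FLUSH)
        sent.append(out)
    return sent


# Hypothetical progress messages a script might drip-feed to the browser.
updates = [b"progress 10%\n", b"progress 20%\n", b"progress 30%\n"]
buffered = stream_chunks(updates, sync_flush=False)
flushed = stream_chunks(updates, sync_flush=True)

if __name__ == "__main__":
    print("without flush:", [len(b) for b in buffered])
    print("with sync flush:", [len(b) for b in flushed])
```

Without an explicit flush the compressor holds on to small writes, so the client sees nothing until much later. Flushing after every chunk fixes that at the cost of a worse compression ratio, which is why excluding such scripts from compression is usually the simpler option.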

A side note on CPU load due to SQL queries: this implies that your working data-set for these queries is small enough to fit in RAM (otherwise your performance would be I/O bound rather than CPU bound), which is a GoodThing(tm). The high CPU load could just be due to the sheer number of concurrent queries, as you suspect, but it could also be that some of them are table-scanning objects that are in SQL's allocated RAM and/or the OS's cache (or they are otherwise doing their work the long way around). It might therefore be worth logging long-running queries and checking whether any indexing improvements or other optimisations would reduce the working set they operate over.