Why use deflate instead of gzip for text files served by Apache?
The simple answer is don't.
RFC 2616 defines deflate as:
deflate The "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951
The zlib format is defined in RFC 1950 as :
0 1
+---+---+
|CMF|FLG| (more-->)
+---+---+
0 1 2 3
+---+---+---+---+
| DICTID | (more-->)
+---+---+---+---+
+=====================+---+---+---+---+
|...compressed data...| ADLER32 |
+=====================+---+---+---+---+
So, a few headers and an ADLER32 checksum
RFC 2616 defines gzip as:
gzip An encoding format produced by the file compression program
"gzip" (GNU zip) as described in RFC 1952 [25]. This format is a
Lempel-Ziv coding (LZ77) with a 32 bit CRC.
RFC 1952 defines the compressed data as:
The format presently uses the DEFLATE method of compression but can be easily extended to use other compression methods.
CRC-32 is slower than ADLER32
Compared to a cyclic redundancy check of the same length, it trades reliability for speed (preferring the latter).
So ... we have 2 compression mechanisms that use the same algorithm for compression, but a different algorithm for headers and checksum.
Now, the underlying TCP packets are already pretty reliable, so the issue here is not Adler 32 vs CRC-32 that GZIP uses.
Turns out many browsers over the years implemented an incorrect deflate algorithm. Instead of expecting the zlib header in RFC 1950 they simply expected the compressed payload. Similarly various web servers made the same mistake.
So, over the years browsers started implementing a fuzzy logic deflate implementation, they try for zlib header and adler checksum, if that fails they try for payload.
The result of having complex logic like that is that it is often broken. Verve Studio have a user contributed test section that show how bad the situation is.
For example: deflate works in Safari 4.0 but is broken in Safari 5.1, it also always has issues on IE.
So, best thing to do is avoid deflate altogether, the minor speed boost (due to adler 32) is not worth the risk of broken payloads.
There is some confusion about the naming between the specifications and the HTTP:
- DEFLATE as defined by RFC 1951 is a compressed data format.
- ZLIB as defined by RFC 1950 is a compressed data format that uses the DEFLATE data format.
- GZIP as defined by RFC 1952 is a file format that uses the DEFLATE compressed data format.
But the HTTP uses a different naming:
gzip
An encoding format produced by the file compression program "gzip" (GNU zip) as described in RFC 1952 [25]. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC.
deflate
The "zlib" format defined in RFC 1950 [31] in combination with the "deflate" compression mechanism described in RFC 1951 [29].
So to sum up:
gzip
is the GZIP file format.
deflate
is actually the ZLIB data format. (But some clients do also accept the actual DEFLATE data format for deflate
.)
See also this answer on the question What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?:
What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?
"gzip" is the gzip format, and "deflate" is the zlib format. They should probably have called the second one "zlib" instead to avoid confusion with the raw deflate compressed data format. While the HTTP 1.1 RFC 2616 correctly points to the zlib specification in RFC 1950 for the "deflate" transfer encoding, there have been reports of servers and browsers that incorrectly produce or expect raw deflate data per the deflate specficiation in RFC 1951, most notably Microsoft. So even though the "deflate" transfer encoding using the zlib format would be the more efficient approach (and in fact exactly what the zlib format was designed for), using the "gzip" transfer encoding is probably more reliable due to an unfortunate choice of name on the part of the HTTP 1.1 authors.
Best Answer
It is apparently due to a misunderstanding resulting from the choice of the name "Deflate". The http standard clearly states that "deflate" really means the zlib format:
However early Microsoft servers would incorrectly deliver raw deflate for "Deflate" (i.e. just RFC 1951 data without the zlib RFC 1950 wrapper). This caused problems, browsers had to try it both ways, and in the end it was simply more reliable to only use gzip.
The impact in bandwidth and execution time to use gzip instead of "Deflate" (zlib), is relatively small. So there we are and there it is likely to remain.
The difference is 12 more bytes for gzip and slightly more CPU time to calculate a CRC instead of an Adler-32.