Why use deflate instead of gzip for text files served by Apache?
The simple answer is don't.
RFC 2616 defines deflate as:
deflate The "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951
The zlib format is defined in RFC 1950 as :
0 1
+---+---+
|CMF|FLG| (more-->)
+---+---+
0 1 2 3
+---+---+---+---+
| DICTID | (more-->)
+---+---+---+---+
+=====================+---+---+---+---+
|...compressed data...| ADLER32 |
+=====================+---+---+---+---+
So, a few headers and an ADLER32 checksum
RFC 2616 defines gzip as:
gzip An encoding format produced by the file compression program
"gzip" (GNU zip) as described in RFC 1952 [25]. This format is a
Lempel-Ziv coding (LZ77) with a 32 bit CRC.
RFC 1952 defines the compressed data as:
The format presently uses the DEFLATE method of compression but can be easily extended to use other compression methods.
CRC-32 is slower than ADLER32
Compared to a cyclic redundancy check of the same length, it trades reliability for speed (preferring the latter).
So ... we have 2 compression mechanisms that use the same algorithm for compression, but a different algorithm for headers and checksum.
Now, the underlying TCP packets are already pretty reliable, so the issue here is not Adler 32 vs CRC-32 that GZIP uses.
Turns out many browsers over the years implemented an incorrect deflate algorithm. Instead of expecting the zlib header in RFC 1950 they simply expected the compressed payload. Similarly various web servers made the same mistake.
So, over the years browsers started implementing a fuzzy logic deflate implementation, they try for zlib header and adler checksum, if that fails they try for payload.
The result of having complex logic like that is that it is often broken. Verve Studio have a user contributed test section that show how bad the situation is.
For example: deflate works in Safari 4.0 but is broken in Safari 5.1, it also always has issues on IE.
So, best thing to do is avoid deflate altogether, the minor speed boost (due to adler 32) is not worth the risk of broken payloads.
You can still use the zlib
module to inflate/deflate data. The gzip
module uses it internally, but adds a file-header to make it into a gzip-file. Looking at the gzip.py
file, something like this could work:
import zlib
def deflate(data, compresslevel=9):
compress = zlib.compressobj(
compresslevel, # level: 0-9
zlib.DEFLATED, # method: must be DEFLATED
-zlib.MAX_WBITS, # window size in bits:
# -15..-8: negate, suppress header
# 8..15: normal
# 16..30: subtract 16, gzip header
zlib.DEF_MEM_LEVEL, # mem level: 1..8/9
0 # strategy:
# 0 = Z_DEFAULT_STRATEGY
# 1 = Z_FILTERED
# 2 = Z_HUFFMAN_ONLY
# 3 = Z_RLE
# 4 = Z_FIXED
)
deflated = compress.compress(data)
deflated += compress.flush()
return deflated
def inflate(data):
decompress = zlib.decompressobj(
-zlib.MAX_WBITS # see above
)
inflated = decompress.decompress(data)
inflated += decompress.flush()
return inflated
I don't know if this corresponds exactly to whatever your server requires, but those two functions are able to round-trip any data I tried.
The parameters maps directly to what is passed to the zlib library functions.
Python ⇒ C
zlib.compressobj(...)
⇒ deflateInit(...)
compressobj.compress(...)
⇒ deflate(...)
zlib.decompressobj(...)
⇒ inflateInit(...)
decompressobj.decompress(...)
⇒ inflate(...)
The constructors create the structure and populate it with default values, and pass it along to the init-functions.
The compress
/decompress
methods update the structure and pass it to inflate
/deflate
.
Best Answer
UPDATE: Browsers have been dropping support for raw deflate. zOompf has completed some very thorough research on this very topic here. Unfortunately, it appears that raw deflate is NOT safe to use.
Check http://www.vervestudios.co/projects/compression-tests/results for more results.Here are the browsers that have been tested:* Android Sends HTTP request header "Accept-Encoding: gzip". Deflate is not permitted.
I conclude that we can always send raw DEFLATE (when the HTTP request header "Accept-Encoding" contains "deflate") and the browser will be able to correctly interpret the encoded data. Can someone prove this wrong?
note: .NET's native implementation of DEFLATE (System.IO.Compression.DeflateStream) is raw DEFLATE. It also sucks. Please use zlib.net for all of your .NET deflating needs.