Electronics – RAM compression: why hasn't it caught on?

memory

I was reading a paper on hardware compression of RAM and I was wondering why something like this hasn't become commonly used. Something like this would be really useful for servers, or for the kid at home who's rendering video for his Minecraft channel.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3232&rep=rep1&type=pdf

Best Answer

AFAIK, in general, lossless compression cannot be relied on to achieve better than about a 2x ratio (I think there is a proof that for 'somewhat random' data there are diminishing returns; I'll try to find it). We have to consider the general case because the compression hardware is built in.

Lossy compression, like MPEG or JPEG, can do much better, but it is useless for program code and general data. Compression becomes less effective the more 'random' the data is, so it could actually reach a point where built-in compression offers no saving at all.

The smaller the compressed object, the larger the bookkeeping overhead of the compression scheme is in relative terms, which reduces its effectiveness. For example, to keep random access working, memory addresses would have to be mapped to locations in the compressed memory, and that mapping is likely to waste space somewhere.
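
To make the mapping overhead concrete, here is a minimal sketch of my own (not from the paper): uncompressed page numbers map to variable-length compressed blocks allocated in fixed-size slots, and rounding every page up to a slot boundary is where the wasted space creeps in.

```python
# Sketch only: a toy page-number -> compressed-block map, assuming 4 KiB pages
# and fixed 256-byte allocation slots (numbers chosen purely for illustration).
PAGE_SIZE = 4096
SLOT_SIZE = 256  # compressed storage is handed out in fixed slots

class CompressedPageTable:
    def __init__(self):
        self.entries = {}   # page number -> (slot offset, compressed length)
        self.next_slot = 0

    def store(self, page_no, compressed_len):
        # Round the compressed page up to a whole number of slots; the
        # difference is internal fragmentation the scheme has to eat.
        slots = -(-compressed_len // SLOT_SIZE)   # ceiling division
        self.entries[page_no] = (self.next_slot, compressed_len)
        self.next_slot += slots
        return slots * SLOT_SIZE - compressed_len  # wasted bytes for this page

table = CompressedPageTable()
waste = table.store(page_no=42, compressed_len=2600)  # 2600 B -> 11 slots = 2816 B
print(f"wasted bytes for this page: {waste}")          # 216 bytes lost to rounding
```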

In return for compression, main memory latency will likely increase to allow the compression hardware to work. Main memory is already very, very much slower than the CPU, with significant latency. So adding more latency seems like a fundamentally 'bad idea'. The mismatch between CPU speed and memory speed has been getting worse (partly because of multiple cores sharing a memory bus) for many years. So a paper written in 2000 might not have fully considered this.
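
As a rough, purely illustrative calculation (my numbers, not the paper's), adding a few tens of nanoseconds of decompression to every cache-missing access makes the average memory stall noticeably worse:

```python
# Back-of-envelope illustration; every figure here is assumed, order-of-magnitude only.
dram_latency_ns = 100    # assumed cost of a last-level-cache miss going to DRAM
decompress_ns = 30       # assumed extra latency of inline decompression hardware
miss_rate = 0.02         # assumed fraction of accesses that miss the last-level cache

baseline = miss_rate * dram_latency_ns
with_compression = miss_rate * (dram_latency_ns + decompress_ns)
print(f"average memory stall per access: {baseline:.1f} ns -> {with_compression:.1f} ns "
      f"(+{(with_compression / baseline - 1) * 100:.0f}%)")
```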

IBM's Power-series CPUs can use compression on instructions; the compressed form is decoded back into the normal instruction patterns before the CPU executes them.

Edit 3: It makes much more sense to compress read-only data than read-write data. A large block of read-only data can be compressed once, which is cheap. Instruction code is a very good candidate because on most OSs, and in many languages, code is read-only or even execute-only (it can't be read as data by user programs). The case against this is that some CPU instruction set architectures (ISAs) are already densely encoded, so compression might not give as much benefit as it appears.

Most modern programming languages include compression functions in their standard libraries, so a programmer can choose to compress just the chunks of data that will benefit. The programmer also has enough knowledge to choose between lossy and lossless compression, which can yield bigger gains than a blanket lossless technique. Programmer-applied compression is actually bad news for a built-in scheme, because well-compressed data is very hard to compress further. So in exactly the situations where application-level (possibly lossy) compression works best, the built-in lossless hardware would gain the least.
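
For example, in Python the standard library's zlib module lets the programmer compress exactly the chunks that are worth it, and it also shows why already-compressed data defeats a second, blanket pass:

```python
import zlib

# A programmer-chosen chunk: highly repetitive application data compresses well.
chunk = b"sensor=0.0;" * 10_000
once = zlib.compress(chunk)
twice = zlib.compress(once)   # compressing already-compressed bytes gains ~nothing

print(len(chunk), len(once), len(twice))
# Typical result: the first pass shrinks the chunk enormously,
# while the second pass barely changes (or slightly grows) the size.
```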

Edit 3:
OS X "Mavericks" Compressed Memory (page 8) explains that OS X "Mavericks" compresses the data of inactive applications.

Linux also has memory compression mechanisms in the kernel. That article explains that the page is the unit of compression, so there are already in-kernel tables to track pages, and hence existing structures can be extended. Further, the underlying CPU hardware will fault ('interrupt') on access to a page that isn't resident, which is exactly the right moment to uncompress it.
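
For instance, Linux's zram block device exposes its compression statistics through sysfs. Assuming a zram device has already been configured as /dev/zram0, a minimal reader could look like this (the mm_stat layout is taken from the kernel's zram documentation):

```python
# Minimal sketch: report the compression ratio of an already-configured zram device.
# Assumes /dev/zram0 exists and that mm_stat follows the layout in the kernel's
# zram documentation (first two fields: original and compressed data sizes, bytes).

def zram_ratio(path="/sys/block/zram0/mm_stat"):
    with open(path) as f:
        fields = f.read().split()
    orig_bytes, compr_bytes = int(fields[0]), int(fields[1])
    return orig_bytes / compr_bytes if compr_bytes else float("inf")

if __name__ == "__main__":
    print(f"zram compression ratio: {zram_ratio():.2f}x")
```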

As pointed out in comments, IBM's AIX has Active Memory Expansion too.

{ Comment: If someone can find the Windows, BSD, etc. versions which identify in-RAM compression, I will add those too. }

Just to be clear, I am not saying the referenced paper's approach of using dedicated compression hardware is necessarily a bad idea.

I would expect doing memory compression on the CPU itself to be relatively 'expensive'. It means reading memory sequentially, which is practically the definition of a 'cache buster'. It is imaginable that an OS would want to stop the memory being compressed from evicting the whole cache while the compression runs. Hence doing the work outside the cache hierarchy seems like it might be useful.

One model might be a 'smart' DMA controller which could compress or decompress as it copies memory. That might be more straightforward, easier to integrate into an existing OS, and visible enough to the CPU and memory-controller hardware that it might have minimal impact.
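
To make that model concrete, here is a purely hypothetical sketch of what the programming interface to such an engine could look like; none of the names or fields correspond to real hardware:

```python
from dataclasses import dataclass

# Hypothetical descriptor for an imaginary compressing DMA engine; nothing here
# describes real hardware or a real driver API.
@dataclass
class DmaDescriptor:
    src_addr: int     # physical address of the (uncompressed) source pages
    dst_addr: int     # physical address of the destination buffer
    length: int       # number of bytes to move
    compress: bool    # True = compress while copying, False = decompress

def queue_page_out(page_phys_addr: int, backing_addr: int, page_size: int = 4096):
    """OS-side helper (imaginary): ask the engine to compress a cold page into a
    backing buffer while the CPU carries on with other work."""
    desc = DmaDescriptor(src_addr=page_phys_addr, dst_addr=backing_addr,
                         length=page_size, compress=True)
    # In a real system this would be written to a hardware descriptor ring;
    # here it is simply returned to show the shape of the interface.
    return desc
```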

Edit:
It's irrelevant that specific cases exist where lossless compression beats 2x; specific cases do not matter. Write 20,000 zero bytes to a file and run any lossless compressor over it, and the file will become tiny. That is not evidence that lossless compression achieves better than 2x in the general case, merely that it can in some specific cases.

Instead, we would need to demonstrate that lossless compression can achieve better than 2x in all cases, especially the worst cases and near-random data. That is the point. Once the compression hardware is built in, there is no choice, unless the scheme becomes even more complex; and with more complexity I would expect wasted space, an analogue of memory fragmentation, to get worse.
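
The two ends of that spectrum are easy to demonstrate with any lossless compressor, for example Python's zlib:

```python
import os
import zlib

zeros = bytes(20_000)             # the trivially compressible special case
random_data = os.urandom(20_000)  # a stand-in for the worst (near-random) case

print(len(zlib.compress(zeros)))        # a few dozen bytes: a huge ratio
print(len(zlib.compress(random_data)))  # ~20,000 bytes or slightly more: no gain
```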

Edit 2:
I have skimmed the paper. IMHO a significant weakness is that it doesn't weigh the cost of the extra electronics against alternative uses of that spend. I would be surprised if spending the same budget on other areas of a CPU (or system) did not also yield benefits. IMHO it is not enough to say 'add this extra stuff and this property of the system gets better'; it needs to be compared with one or more alternatives. For example, it might be cheaper to just buy more memory!

My $0.02