Video Compression – Fast, Lossless Compression of a Video Stream

algorithms, compression

I have a video coming from a stationary camera. Both the resolution and the FPS are quite high. The data I get is in Bayer format and uses 10 bits per pixel. As there's no 10-bit data type on my platform, the original data is stored in memory using 16-bit words. I want to implement some kind of lossless compression of the data before transmitting it over a network.

  • The camera does not move, so big parts of consecutive frames are
    nearly identical – but still not completely, due to the inevitable
    noise (denoising is not an option, as it is supposed to be lossless
    and shouldn't "lose" even the noise).
  • Because of high FPS, even the parts that change don't change much
    between any two consecutive frames.
  • However, it looks like the camera also shakes a little. Very little, but still, even the stationary objects are not completely so in the image space.
  • The compression has to be done on the fly, so I cannot gather a lot
    of frames and compress them all together, but I can look 1 frame back
    and use it as a reference.

Based on the above, my first thought was to bit-pack the data, so that those 6 redundant bits are not wasted on every word. However, I thought that if I use some entropy coding (e.g., Huffman), that redundancy would be taken into account automatically, so no extra packing is necessary. So I've done the following:

  • Took the binary difference between two consecutive frames. The
    original data range was 0~1023 (i.e., unsigned 10 bits). The
    difference data becomes signed and the range widens to -1023~1023,
    but the variance becomes much smaller than in the original data; in
    fact, most of the values are, not surprisingly, close to zero.
  • Applied Rice coding to the difference. From what I understand, it
    looks like a good choice for data sets consisting mostly of small
    numerical values.
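In Python terms, the Rice step looks roughly like this (a minimal, unoptimized sketch for non-negative inputs; the list of bits stands in for a real bit stream, and the function names are just for illustration):

```python
def rice_encode(values, k):
    """Rice-encode non-negative integers with parameter k.
    Each value n is split into a quotient q = n >> k, written in
    unary as q '1' bits plus a terminating '0', followed by the
    k-bit remainder (MSB first)."""
    bits = []
    for n in values:
        q, r = n >> k, n & ((1 << k) - 1)
        bits.extend([1] * q)                                   # unary quotient
        bits.append(0)                                         # terminator
        bits.extend((r >> i) & 1 for i in reversed(range(k)))  # remainder
    return bits

def rice_decode(bits, count, k):
    """Decode `count` values from a Rice-coded bit list."""
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == 1:          # read the unary quotient
            q += 1
            i += 1
        i += 1                       # skip the '0' terminator
        r = 0
        for _ in range(k):           # read the k-bit remainder
            r = (r << 1) | bits[i]
            i += 1
        out.append((q << k) | r)
    return out
```

With mostly-small difference values, most codes collapse to the terminator plus k remainder bits, which is where the size reduction comes from.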

This gives me about 60% reduction in size for 1280×720 frames, and my test system (Linux in VirtualBox on a single core) can do ~40 such compressions per second (without much optimization). Not that great, but reasonable, I guess (or is it?).

Are there better ways? Any common mistakes I made? Any general steps I missed? Higher resolution frames may be used later – should I expect better compression rates for bigger frame sizes?

UPD.:

  • I used this library for Rice encoding. The library is very slow (the author himself describes it as something for learning rather than for real use); for example, it reads and writes bits one by one in loops, which kills performance. Initially it only gave me ~20 FPS; after some very basic optimization it reached 40 FPS (as reported above), and with further optimization, 80 FPS. That is on a single i7 core without vectorization.
  • As for vectorization, unfortunately I couldn't think of a way to vectorize Rice coding. I don't even know if it is possible at all; I could not find any data on Rice codes, and what I could find about Huffman coding suggests that it is inherently sequential and can't be efficiently vectorized, which may apply to Rice codes as well as other variable-length codes.
  • I also tried a completely different approach: split the data into small pieces (e.g., 64 pixels apiece) and use simple zero suppression. We find the biggest number in a block, write the number of bits required to represent it at the beginning of the block (4 additional bits were required for that, in my case), then reduce all numbers in the block to that number of bits. I expected the compression ratio to be bad, but since the pieces are small, many of them have no noise spikes, so their binary difference can be reduced to something like 4~6 bits per value. In fact, it was only about 5% worse than Rice coding, while being about twice as fast (i.e., 160 FPS in my case). I tried vectorizing it, but I kinda suck at vectorization, so maybe because of that I could only achieve about a further 1.8× speed-up.
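To make the zero-suppression scheme concrete, here is a sketch that just computes the resulting size for non-negative input (block size and 4-bit header as in my setup; the function name is only illustrative):

```python
BLOCK = 64       # pixels per block
HEADER_BITS = 4  # per-block field holding the bit width

def compressed_bits(data):
    """Estimate the zero-suppressed size of a list of non-negative
    integers: each block stores a 4-bit width header, then every
    value truncated to the bits needed for the block's maximum."""
    total = 0
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        width = int(max(block)).bit_length()  # bits for the largest value
        total += HEADER_BITS + width * len(block)
    return total
```

A block of all zeros costs only the 4-bit header, while a block containing one noise spike pays for that spike across all 64 values, which is why small blocks matter.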

Because negative numbers don't have leading zeros, I applied zigzag encoding after binary difference and before Rice/zero suppression.
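For reference, the zigzag mapping is just a bit trick that interleaves negative and non-negative values so that small magnitudes get small codes:

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    Python's >> is an arithmetic shift, so n >> 31 is -1 for negative n
    and 0 for the non-negative values in our -1023~1023 range."""
    return (n << 1) ^ (n >> 31)

def unzigzag(z):
    """Inverse mapping, back to a signed value."""
    return (z >> 1) ^ -(z & 1)
```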

Best Answer

You've got temporal prediction, but no spatial. For better compression at the cost of speed, you should be able to use the pixels above and to the left of the current pixel in the current frame as predictors, as well as the pixel at the same location in the previous frame. The reason for looking only up and left is the same as the reason for looking only at the previous frame: you want to rely only on data you've already decoded, and limit how much of it you have to keep around.
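One possible way to combine the two (a sketch, not a tuned predictor: averaging the three neighbors is an assumption on my part, and real codecs use smarter predictors such as the JPEG-LS median predictor):

```python
import numpy as np

def predict_residuals(curr, prev):
    """Predict each pixel as the average of its left neighbor and top
    neighbor in the current frame (both already decoded) and the
    co-located pixel in the previous frame; return the residuals.
    Border pixels fall back to the previous-frame value."""
    h, w = curr.shape
    res = np.empty((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            left = curr[y, x - 1] if x > 0 else prev[y, x]
            above = curr[y - 1, x] if y > 0 else prev[y, x]
            pred = (int(left) + int(above) + int(prev[y, x])) // 3
            res[y, x] = int(curr[y, x]) - pred
    return res
```

The residuals then go through the same zigzag + entropy-coding stages as before; the decoder can reconstruct each pixel because it processes them in the same raster order.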

Rice codes are probably a good tradeoff between efficiency and speed, but a static Huffman code (precomputed by you on a sample of video data) might be more efficient and equally fast.
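A static table could be built offline from frequency counts gathered on sample footage; a minimal sketch of computing Huffman code lengths (the helper name is just for illustration, and the lengths would then define a canonical code shared by encoder and decoder):

```python
import heapq

def huffman_lengths(freqs):
    """Compute Huffman code lengths from a {symbol: frequency} dict
    by repeatedly merging the two least-frequent subtrees; every
    symbol inside a merged subtree gets one bit deeper.
    (Assumes at least two symbols.)"""
    heap = [(f, i, (s,)) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tiebreak = len(heap)                 # keeps tuple comparison well-defined
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1              # one level deeper in the tree
        heapq.heappush(heap, (f1 + f2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths
```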

As for speed, make sure that your code is getting vectorized — either by using the right compiler flags and code patterns to allow the compiler to auto-vectorize, or by hand-writing the code to use vector intrinsics or assembly.

Finally, is dropping down to 8 bits per pixel a possibility? Obviously that's leaving the realm of "lossless", but not only would it reduce the size of your compressed output, it would also, with vectorized code, possibly increase your throughput up to 2x.
