C# – Fastest PNG decoder for .NET

c# decode decoding performance png

Our web server needs to process many compositions of large images together before sending the results to web clients. This process is performance critical because the server can receive several thousands of requests per hour.

Right now our solution loads PNG files (around 1MB each) from the hard disk and sends them to the video card so that the composition is done on the GPU. We first tried loading our images with the PNG decoder exposed by the XNA API, but its performance was disappointing.

To find out whether the bottleneck was reading from the hard disk or decoding the PNG, we changed the code to load each file into a memory stream first and then pass that memory stream to the .NET PNG decoder. The performance difference between XNA and the System.Windows.Media.Imaging.PngBitmapDecoder class is not significant; both give roughly the same results.
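A minimal sketch of how such a measurement could be separated into a read phase and a decode phase. The `DecodeTiming` class and its `Measure` method are hypothetical names, not the code from the post, and the decoder is passed in as a delegate so the helper does not depend on any particular imaging library.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class DecodeTiming
{
    // Reads the whole file into memory first, then times the decode step alone.
    // `decode` is a caller-supplied delegate (an assumption of this sketch)
    // that consumes the in-memory stream.
    public static (TimeSpan Read, TimeSpan Decode) Measure(string path, Action<Stream> decode)
    {
        var readTimer = Stopwatch.StartNew();
        byte[] bytes = File.ReadAllBytes(path);
        readTimer.Stop();

        using (var memory = new MemoryStream(bytes, writable: false))
        {
            // Disk access is finished at this point, so only decoding is timed.
            var decodeTimer = Stopwatch.StartNew();
            decode(memory);
            decodeTimer.Stop();

            return (readTimer.Elapsed, decodeTimer.Elapsed);
        }
    }
}
```

With WPF referenced, the delegate could be something like `s => { var frame = new PngBitmapDecoder(s, BitmapCreateOptions.PreservePixelFormat, BitmapCacheOption.OnLoad).Frames[0]; }`, where `BitmapCacheOption.OnLoad` asks the decoder to do its work at load time rather than lazily.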

Our benchmarks show the following performance results:

  • Load images from disk: 37.76 ms (1%)
  • Decode PNGs: 2816.97 ms (77%)
  • Load images on Video Hardware: 196.67 ms (5%)
  • Composition: 87.80 ms (2%)
  • Get composition result from Video Hardware: 166.21 ms (5%)
  • Encode to PNG: 318.13 ms (9%)
  • Store to disk: 3.96 ms (0%)
  • Clean up: 53.00 ms (1%)

Total: 3680.50 ms (100%)

These results show that decoding the PNGs is by far the slowest step, at 77% of the total time.
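To put a number on what a faster decoder could buy, here is a rough estimate using the figures from the benchmark. It assumes every other stage stays exactly as measured, which is a simplification; `DecodeSpeedup` and `TotalWithFasterDecode` are names made up for this sketch.

```csharp
using System;

public static class DecodeSpeedup
{
    // Figures taken from the benchmark, in milliseconds.
    public const double TotalMs = 3680.50;
    public const double DecodeMs = 2816.97;

    // Estimated total time if only the decode stage becomes `factor` times faster
    // and all other stages are unchanged (an assumption of this sketch).
    public static double TotalWithFasterDecode(double factor)
    {
        return (TotalMs - DecodeMs) + DecodeMs / factor;
    }

    // Overall speedup of the whole request for a given decode speedup.
    public static double OverallSpeedup(double factor)
    {
        return TotalMs / TotalWithFasterDecode(factor);
    }
}
```

Under that assumption, a decoder twice as fast brings the total to about 2272 ms (roughly 1.6x overall), and even an infinitely fast decoder leaves the remaining 863.53 ms, so the overall gain is capped at roughly 4.3x.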

So we are wondering whether there is a faster PNG decoder we could use to reduce the decoding time. We also considered keeping the images uncompressed on the hard disk, but each image would then be 10MB instead of 1MB, and with several tens of thousands of images stored, keeping them all uncompressed is not feasible.

EDIT: More useful information:

  • The benchmark simulates loading 20 PNG images and compositing them together. This will roughly correspond to the kind of requests we will get in the production environment.
  • Each image used in the composition is 1600×1600 in size.
  • The solution will involve as many as 10 load-balanced servers like the one we are discussing here, so extra software development effort could be worth it for the savings in hardware costs.
  • Caching the decoded source images is something we are considering, but each composition will most likely use completely different source images, so the cache miss rate will be high and the performance gain low.
  • The benchmarks were done with a crappy video card, so we can expect the PNG decoding to be even more of a performance bottleneck using a decent video card.
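The 10MB uncompressed figure mentioned earlier is consistent with the 1600×1600 dimensions in the list, if we assume 4 bytes per pixel (32-bit RGBA). The post does not state the pixel format, so that part is an assumption, and `ImageFootprint` is a hypothetical helper written only to show the arithmetic.

```csharp
public static class ImageFootprint
{
    // Assumption: 32-bit RGBA, i.e. 4 bytes per pixel.
    public const int BytesPerPixel = 4;

    // Size in bytes of one decoded image.
    public static long UncompressedBytes(int width, int height)
    {
        return (long)width * height * BytesPerPixel;
    }

    // Decoded data that a single composition of `imageCount` images has to handle.
    public static long BytesPerComposition(int width, int height, int imageCount)
    {
        return UncompressedBytes(width, height) * imageCount;
    }
}
```

`ImageFootprint.UncompressedBytes(1600, 1600)` gives 10,240,000 bytes per image, and `ImageFootprint.BytesPerComposition(1600, 1600, 20)` gives 204,800,000 bytes (about 205 MB) of decoded pixels per request. That is also the amount a decoded-image cache would have to hold per composition to be of any use.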

Best Answer

There is another option: write your own GPU-based PNG decoder. You could use OpenCL to do the decoding fairly efficiently, and do your composition with OpenGL, which can share resources with OpenCL. It is also possible to interleave transfer and decoding for maximum throughput. If this is a route you can and want to pursue, I can provide more information.
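To illustrate what interleaving transfer and decoding means, here is a generic two-stage pipeline sketch in C#. It contains no OpenCL or OpenGL calls: the two stages are plain delegates that stand in for, say, "upload the compressed bytes" and "run the decode", and `TwoStagePipeline` is a hypothetical name. Error handling is deliberately minimal.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TwoStagePipeline
{
    // Runs `first` on a background task while `second` runs on the calling thread,
    // so item N+1 can be in the first stage while item N is in the second.
    // `capacity` bounds how far the first stage may run ahead.
    public static List<TOut> Run<TIn, TMid, TOut>(
        IEnumerable<TIn> inputs,
        Func<TIn, TMid> first,
        Func<TMid, TOut> second,
        int capacity = 2)
    {
        using (var queue = new BlockingCollection<TMid>(capacity))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (TIn input in inputs)
                        queue.Add(first(input)); // blocks while the queue is full
                }
                finally
                {
                    queue.CompleteAdding(); // always lets the consumer loop finish
                }
            });

            var results = new List<TOut>();
            foreach (TMid item in queue.GetConsumingEnumerable())
                results.Add(second(item)); // FIFO queue, so input order is kept

            producer.Wait(); // rethrows any exception from the first stage
            return results;
        }
    }
}
```

With 20 images per request, the ideal case is that the total time approaches that of the slower stage alone, plus one pass of the faster stage, instead of the sum of both stages for every image.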

Here are some resources related to GPU-based DEFLATE (and INFLATE).

  1. Accelerating Lossless compression with GPUs
  2. gpu-block-compression using CUDA on Google Code.
  3. Floating point data-compression at 75 Gb/s on a GPU - note that this doesn't use INFLATE/DEFLATE but a novel parallel compression/decompression scheme that is more GPU-friendly.

Hope this helps!