What is image hashing used for?


I sometimes hear this term and am wondering what it is used for.

Best Answer

A hash function takes data of arbitrary size and produces a value of a fixed, usually very small, size. There are many different types of hashes, but when we talk about image hashing, it is used either to:

  • find exact duplicates very quickly. Almost any hash function will work: instead of comparing whole images, you compare their hashes.
  • find similar images, which I explain below.
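
For exact duplicates, here is a minimal sketch using Python's standard `hashlib` module: group files by their SHA-256 digest, and any digest shared by several files marks byte-identical copies. The in-memory `files` dict and the `find_exact_duplicates` name are hypothetical; in practice you would read the bytes from disk.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group names whose raw bytes are identical.

    `files` is a hypothetical in-memory mapping from file name to raw bytes.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        # Equal bytes always produce equal digests, so the digest is the lookup key.
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(name)
    # Only digests shared by more than one file indicate duplicates.
    return [names for names in groups.values() if len(names) > 1]


files = {
    "a.png": b"fake image bytes 1",
    "b.png": b"fake image bytes 1",
    "c.png": b"fake image bytes 2",
}
print(find_exact_duplicates(files))  # [['a.png', 'b.png']]
```

This only catches byte-for-byte copies; a resized or re-encoded version of the same picture gets a completely different digest.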

Images that look identical to us can be very different if you just compare their raw bytes. This can be due to:

  • resizing
  • rotation
  • slightly different color gamma
  • different format
  • some minor noise, watermarks and artifacts

Even if two images differ in just one byte, their hashes can be very different (for cryptographic hashes like MD5 or SHA they will almost certainly be completely different).
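
A quick way to see this with Python's `hashlib`: flip a single bit in some made-up bytes (a stand-in, not a real image file) and compare the MD5 digests. With a cryptographic hash you should typically see roughly half of the 128 output bits change.

```python
import hashlib

original = bytes(range(256)) * 4   # stand-in for raw image bytes (hypothetical)
changed = bytearray(original)
changed[100] ^= 1                  # flip one bit in a single byte
modified = bytes(changed)

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(modified).hexdigest()
print(h1)
print(h2)

# XOR the two 128-bit digests and count how many bits differ.
diff_bits = bin(int(h1, 16) ^ int(h2, 16)).count("1")
print(diff_bits, "of 128 bits differ")
```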

So you need a hash function that produces similar (or even identical) hashes for similar images. One generic approach is locality-sensitive hashing (LSH). But since we know which kinds of differences occur between images, we can design more specialized hashes.
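
As an illustration of generic LSH, here is a sketch of one classic family, random-hyperplane hashing for cosine similarity, assuming each image has already been turned into a numeric feature vector (for example a flattened 8x8 greyscale thumbnail). Each bit records which side of a random hyperplane the vector falls on, so vectors pointing in similar directions share most bits. The `HyperplaneLSH` class, the seed and the 64-bit size are arbitrary choices for this example.

```python
import numpy as np


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors at a small angle get mostly equal bits."""

    def __init__(self, dim: int, n_bits: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of one random hyperplane through the origin.
        self.planes = rng.standard_normal((n_bits, dim))

    def signature(self, vector: np.ndarray) -> np.ndarray:
        # One boolean per hyperplane: which side of it the vector lies on.
        return (self.planes @ vector) >= 0


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of positions where two bit vectors differ."""
    return int(np.count_nonzero(a != b))


rng = np.random.default_rng(42)
img = rng.random(64)              # hypothetical 8x8 thumbnail, flattened
brighter = img * 2.0              # same direction, e.g. uniformly brighter (ignoring clipping)
unrelated = rng.random(64) - 0.5  # a vector pointing somewhere else

lsh = HyperplaneLSH(dim=64)
print(hamming(lsh.signature(img), lsh.signature(brighter)))   # 0: scaling keeps the angle
print(hamming(lsh.signature(img), lsh.signature(unrelated)))
```

Scaling a vector does not change its direction, so it keeps the same signature, while an unrelated vector typically differs in a large fraction of the bits.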

The best-known algorithms are:

  • a-hash. Average hashing is the simplest algorithm and uses only a few transformations: scale the image down, convert it to greyscale, calculate the mean, and binarize the greyscale pixels based on that mean. Then convert the binary image into an integer. The algorithm is so simple that you can implement it in an hour.
  • p-hash. Perceptual hashing uses a similar approach, but instead of averaging it relies on the discrete cosine transform (DCT), a popular transform in signal processing.
  • d-hash. Difference hashing uses the same approach as a-hash, but instead of average values it uses gradients (differences between adjacent pixels).
  • w-hash. Very similar to p-hash, but instead of the DCT it uses a wavelet transform.
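
To make the four descriptions concrete, here is a NumPy-only sketch, assuming the image is already loaded as a 2D greyscale array (or an RGB array passed through `to_grey`) at least 64x64 pixels. The crude area-average `resize`, the p-hash details (a 32x32 DCT reduced to its 8x8 low-frequency corner and thresholded at the median) and the simplified w-hash (Haar low-low band only, median threshold) are assumptions modelled on common practice, not the exact algorithms of any particular library. Similar images are then found by a small Hamming distance between hashes rather than exact equality.

```python
import numpy as np


def to_grey(rgb: np.ndarray) -> np.ndarray:
    """RGB array (H, W, 3) -> greyscale (H, W) using ITU-R BT.601 luma weights."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def resize(grey: np.ndarray, h: int, w: int) -> np.ndarray:
    """Crude area-average downscale; assumes `grey` is at least h x w."""
    rows = np.array_split(grey, h, axis=0)
    return np.array([[b.mean() for b in np.array_split(r, w, axis=1)] for r in rows])


def ahash(grey: np.ndarray, size: int = 8) -> np.ndarray:
    small = resize(grey, size, size)
    return (small > small.mean()).flatten()  # 1 where brighter than the mean


def dhash(grey: np.ndarray, size: int = 8) -> np.ndarray:
    small = resize(grey, size, size + 1)  # one extra column for the differences
    return (small[:, 1:] > small[:, :-1]).flatten()  # sign of the horizontal gradient


def dct_matrix(n: int) -> np.ndarray:
    """Unnormalised DCT-II basis: entry (k, i) is cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * i + 1) * k / (2 * n))


def phash(grey: np.ndarray, size: int = 8, highfreq_factor: int = 4) -> np.ndarray:
    n = size * highfreq_factor
    small = resize(grey, n, n)
    d = dct_matrix(n)
    coeffs = d @ small @ d.T    # 2D DCT: transform rows and columns
    low = coeffs[:size, :size]  # keep only the lowest frequencies
    return (low > np.median(low)).flatten()


def haar_ll(x: np.ndarray) -> np.ndarray:
    """One 2D Haar level, low-low band only (each value is a scaled 2x2 block sum)."""
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2


def whash(grey: np.ndarray, size: int = 8, start: int = 64) -> np.ndarray:
    # Simplified: keep the Haar LL band until it is size x size.
    # `start` must be `size` times a power of two.
    small = resize(grey, start, start)
    while small.shape[0] > size:
        small = haar_ll(small)
    return (small > np.median(small)).flatten()


def to_int(bits: np.ndarray) -> int:
    """Pack a bit vector into an integer, as in the a-hash description."""
    return int("".join("1" if b else "0" for b in bits), 2)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.count_nonzero(a != b))


rng = np.random.default_rng(0)
img = rng.random((120, 160)) * 255          # hypothetical greyscale image
tweaked = np.clip(img * 0.9 + 10, 0, 255)   # lower contrast, raised brightness
other = rng.random((120, 160)) * 255        # an unrelated image
for fn in (ahash, dhash, phash, whash):
    print(fn.__name__, hamming(fn(img), fn(tweaked)), hamming(fn(img), fn(other)))
```

Because each hash only compares pixel or coefficient values against each other, a uniform contrast and brightness change like `tweaked` should leave the hashes (nearly) unchanged, while the unrelated image lands much further away.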

By the way, if you use Python, all these hashes are already implemented in this library.