Image Processing – How to Implement K-Means Algorithm on RGB Images

algorithms, color, image processing

I want to apply the k-means algorithm to an RGB image data set, but I don't know how to proceed. Could you outline the basic step-by-step flow? I know the basics of image processing and OpenCV.

Best Answer

You can readily apply the k-means algorithm to an RGB image data set. Such a data set is in no way special, except that each data vector (one per pixel) is three-dimensional (R, G and B) and the values are bounded integers in the [0, 255] range.
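To make the "one data vector per pixel" idea concrete, here is a minimal sketch in plain Python. The helper names `image_to_pixels` and `pixels_to_image` are made up for illustration, and the image is assumed to be a list of rows of (r, g, b) tuples. With NumPy or OpenCV arrays the same flattening is usually done with `img.reshape(-1, 3)`.

```python
def image_to_pixels(image):
    """Flatten a row-major image (list of rows of (r, g, b)) into a flat list."""
    return [tuple(px) for row in image for px in row]

def pixels_to_image(pixels, width):
    """Inverse of image_to_pixels: regroup a flat list into rows of `width`."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

# A tiny 2x2 example image (hypothetical data).
image = [[(255, 0, 0), (250, 5, 5)],
         [(0, 0, 255), (5, 5, 250)]]
pixels = image_to_pixels(image)          # 4 data vectors of dimension 3
restored = pixels_to_image(pixels, 2)    # same layout as `image`
```

The clustering itself only ever sees the flat list. The width is needed only to put the result back into image shape afterwards.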

The standard k-means algorithm only needs two operations: computing the distance between two data points, and computing the mean of several data points.

For more information on the k-means algorithm, see for example here.

Naive RGB color distance: If you have two elements i and j with RGB values (ri, gi, bi) and (rj, gj, bj), respectively, then the distance d between image points i and j equals:

d = sqrt((ri-rj)^2+(gi-gj)^2+(bi-bj)^2)

The mean of several color points is then the mean of the R, G and B values taken separately.
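As a small sketch of these two operations (the function names `rgb_distance` and `mean_color` are illustrative, not part of any library):

```python
import math

def rgb_distance(c1, c2):
    """Euclidean distance between two (r, g, b) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def mean_color(colors):
    """Component-wise mean of a non-empty list of (r, g, b) tuples."""
    n = len(colors)
    return tuple(sum(c[k] for c in colors) / n for k in range(3))

d = rgb_distance((255, 0, 0), (0, 0, 255))   # sqrt(255^2 + 255^2), about 360.62
m = mean_color([(255, 0, 0), (0, 0, 255)])   # (127.5, 0.0, 127.5)
```

Note that the mean is generally not an integer triple. If you need a displayable color, round it at the very end rather than inside the k-means iterations.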

For an implementation of this in OpenCV, see the color quantization example in the OpenCV tutorial.

More realistic color distance: Unfortunately, Euclidean distance in the RGB color space is not a good model of perceived color difference (see the excellent comment by John Forkosh). The CIE-L*ab color space, together with an associated color distance model, seems to be better suited.

Step 1: Convert RGB to CIE-L*ab. (RGB is device dependent, so the values may first need to be transformed to an absolute RGB color space.)

Very probably the platform you are using already has an RGB to CIE-L*ab conversion (Matlab, Java, Python or C++ using OpenCV). Otherwise you would have to write your own (Help).
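If you do end up writing the conversion yourself, the following is a rough sketch, under two assumptions that are not stated in the answer above: the input is 8-bit sRGB, and the reference white is D65. The function name `srgb_to_lab` is hypothetical. In practice a library routine is preferable, for example OpenCV's `cv2.cvtColor` with `cv2.COLOR_BGR2LAB`. Be aware that OpenCV stores channels in BGR order and, for 8-bit images, rescales the Lab output into the 0..255 range.

```python
def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point assumed)."""
    # 1. Undo the sRGB gamma to get linear-light values in [0, 1].
    def to_linear(v):
        v = v / 255.0
        return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(v) for v in rgb)

    # 2. Linear sRGB -> CIE XYZ (standard sRGB / D65 matrix).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    # 3. Normalise by the D65 reference white, then apply the Lab nonlinearity.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

white = srgb_to_lab((255, 255, 255))   # close to (100, 0, 0)
black = srgb_to_lab((0, 0, 0))         # close to (0, 0, 0)
red = srgb_to_lab((255, 0, 0))         # roughly (53.2, 80.1, 67.2)
```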

Step 2: Calculate the distance in the k-means algorithm.

Delta E* from there seems simple to calculate:

If (l1, a1, b1) are the Lab values of data point 1 and (l2, a2, b2) are the Lab values of data point 2, then:

d = sqrt((l1 - l2)^2 + (a1 - a2)^2 + (b1 - b2)^2)

The mean of several colors in the CIE-Lab space would again be the mean of each Lab component separately.

You may also want to try other distances and see what works best.

Summary: The k-means algorithm doesn't need to be modified except for the distance and the mean calculation. For a better suited distance, the conversion to another color space (CIE-L*ab) and the computation of a distance in that color space are recommended.
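To illustrate the summary, here is a minimal sketch of the basic k-means loop with the distance and the mean passed in as functions, so the same code works on RGB or Lab triples. All names (`kmeans`, `euclidean`, `componentwise_mean`) and parameters (`iterations`, `seed`) are choices made for this example. It initialises the centers with k random data points and keeps an old center if its cluster becomes empty, which is only one of several possible strategies.

```python
import math
import random

def kmeans(points, k, distance, mean, iterations=20, seed=0):
    """Plain Lloyd-style k-means with a pluggable distance and mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial centers: k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda j: distance(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:           # nothing moved: converged
            break
        centers = new_centers
    return labels, centers

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def componentwise_mean(pts):
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(len(pts[0])))

# Toy "image" as a flat pixel list: three reddish and three bluish pixels.
pixels = [(250, 10, 10), (240, 20, 5), (255, 0, 0),
          (10, 10, 250), (5, 20, 240), (0, 0, 255)]
labels, centers = kmeans(pixels, k=2, distance=euclidean,
                         mean=componentwise_mean)
```

For color quantization, each pixel would then be replaced by `centers[labels[i]]`. To cluster in CIE-L*ab instead, convert the pixels first and pass the converted triples to the same function.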
