Java – How to Define the Bandwidth in Mean Shift Clustering

algorithmsclustercolorimage processingjava

I am making a program using Java todo colo quantization using mean shift clustering algorithm and the image is RGB with resolution of 512×512. I want to reduce the image file size by reducing the total color in the input image.

I have a problem with defining the bandwidth for calculating the euclidian squared distance in the mean shift algorithm.

How do you know the appropriate bandwidth for the data? Are there any formulation to define it?

Best Answer

The bandwidth is the distance/size scale of the kernel function, i.e. what the size of the “window” is across which you calculate the mean.

There is no bandwidth that works well for all purposes and all instances of the data. Instead, you will need to either

  • manually select an appropriate bandwith for your algorithm; or

  • use an algorithm that automatically adapts or estimates the bandwidth (though this implies some computational overhead).

    • The Python sklearn module offers an estimate_bandwith() function based on a nearest-neighbor analysis.
    • A wealth of research exists about this topic, e.g. Comaniciu, Ramesh, Meer (2001): The variable bandwidth mean shift and data-driven scale selection.

Any discussion of bandwidth or kernels first requires that you have already defined a distance metric. For image processing you will have to select a suitable colour space. RGB might not produce best results, depending on your goals.

Related Topic