Python – Sequential k-means clustering using scikit-learn

cluster-analysis, machine learning, python, scikit-learn

Is there a way to perform sequential k-means clustering using scikit-learn? I can't seem to find a proper way to add new data without re-fitting on all of the data.

Thank you

Best Answer

scikit-learn's KMeans class has a predict method that, given some (new) points, determines which of the clusters these points would belong to. Calling this method does not change the cluster centroids.
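To illustrate, here is a minimal sketch of assigning new points with predict. The synthetic data, the cluster count of 2, and the variable names are assumptions made up for the example; only KMeans, predict, and cluster_centers_ come from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic training data (an assumption for this example):
# two well-separated blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# Fit once on the data available so far.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)
centroids_before = kmeans.cluster_centers_.copy()

# New points arrive later; predict only assigns them to the nearest centroid.
X_new = np.array([[0.1, -0.2], [4.8, 5.3]])
labels_new = kmeans.predict(X_new)

# The fitted centroids are untouched by predict.
centroids_unchanged = np.array_equal(centroids_before, kmeans.cluster_centers_)
print(labels_new, centroids_unchanged)
```

This is enough if the clusters are considered final and you only need labels for incoming points.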

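If you do want the centroids to be updated as new data arrives, i.e. you want to do clustering in an online setting, use the MiniBatchKMeans estimator and its partial_fit method. Each call to partial_fit updates the existing centroids from the batch it is given, so earlier data does not need to be re-fitted.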
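A rough sketch of the online variant follows. The make_batch helper, the synthetic blobs, and the batch sizes are hypothetical and only there to simulate data arriving over time; MiniBatchKMeans, partial_fit, and cluster_centers_ are the actual scikit-learn names.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

def make_batch(n_per_cluster=50):
    """Hypothetical data source: one batch drawn from two blobs."""
    return np.vstack([
        rng.normal(loc=0.0, scale=0.5, size=(n_per_cluster, 2)),
        rng.normal(loc=5.0, scale=0.5, size=(n_per_cluster, 2)),
    ])

mbk = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)

# The first call initializes the centroids from this batch and does one update.
mbk.partial_fit(make_batch())
centroids_after_first = mbk.cluster_centers_.copy()

# Later batches adjust the same centroids; earlier batches are not revisited.
for _ in range(5):
    mbk.partial_fit(make_batch())

centroids_moved = not np.allclose(centroids_after_first, mbk.cluster_centers_)
print(mbk.cluster_centers_, centroids_moved)
```

Note that the number of clusters is fixed when the estimator is created, and the first batch passed to partial_fit needs at least n_clusters samples. You can still call predict on the same estimator at any point to label points without changing the centroids.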