Here's a generator that yields the chunks you want:
    def chunks(lst, n):
        """Yield successive n-sized chunks from lst."""
        for i in range(0, len(lst), n):
            yield lst[i:i + n]
    import pprint
    # Pass a list: in Python 3, slicing a range object returns range objects, not lists.
    pprint.pprint(list(chunks(list(range(10, 75)), 10)))
    [[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
     [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
     [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
     [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
     [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
     [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
     [70, 71, 72, 73, 74]]
If you're using Python 2, you should use xrange() instead of range():
    def chunks(lst, n):
        """Yield successive n-sized chunks from lst."""
        for i in xrange(0, len(lst), n):
            yield lst[i:i + n]
You can also use a list comprehension instead of writing a function, though it's a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3:
    [lst[i:i + n] for i in range(0, len(lst), n)]
Python 2 version:
    [lst[i:i + n] for i in xrange(0, len(lst), n)]
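As a quick sanity check, the sketch below (Python 3) runs the generator and the list comprehension on the same input and confirms they produce identical chunks. The variable names from_generator and from_comprehension are only illustrative, and the input is a list so that each slice is itself a list.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


lst = list(range(10, 75))  # 65 items, so the last chunk is shorter
n = 10

from_generator = list(chunks(lst, n))
from_comprehension = [lst[i:i + n] for i in range(0, len(lst), n)]

# Both approaches slice at the same offsets, so the results match.
assert from_generator == from_comprehension

print(len(from_generator))   # 7
print(from_generator[-1])    # [70, 71, 72, 73, 74]
```

The generator version is lazy, so it only builds one chunk at a time; the comprehension builds the whole list of chunks up front.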
Best Answer
scikit-learn's KMeans class has a predict method that, given some (new) points, determines which of the clusters these points would belong to. Calling this method does not change the cluster centroids.
If you do want the centroids to be changed by the addition of new data, i.e. you want to do clustering in an online setting, use the MiniBatchKMeans estimator and its partial_fit method.
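Here is a minimal sketch of both options, assuming scikit-learn and NumPy are installed. The synthetic two-blob data, the cluster count, and all variable names are made up for illustration; only KMeans.predict, MiniBatchKMeans.partial_fit, and cluster_centers_ come from the library.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

# Hypothetical data: two well-separated 2-D blobs plus two new points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
X_new = np.array([[0.1, -0.2], [5.2, 4.9]])

# Option 1: assign new points to existing clusters without refitting.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers_before = km.cluster_centers_.copy()
labels_new = km.predict(X_new)
centers_unchanged = np.array_equal(centers_before, km.cluster_centers_)

# Option 2: online clustering, where each batch updates the centroids.
mbk = MiniBatchKMeans(n_clusters=2, n_init=3, random_state=0)
mbk.partial_fit(X)                      # first batch initializes the centers
mbk_before = mbk.cluster_centers_.copy()
mbk.partial_fit(X_new + 1.0)            # a later batch nudges the centers
centers_moved = not np.allclose(mbk_before, mbk.cluster_centers_)

print(labels_new, centers_unchanged, centers_moved)
```

With predict the fitted model is read-only, so repeated calls give consistent assignments; with partial_fit the model keeps changing, so labels for the same point can differ over time.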