Python – Save classifier to disk in scikit-learn

classificationmachine learningpythonscikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data?

I have the following sample program from the scikit-learn website:

from sklearn import datasets
iris = datasets.load_iris()
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print "Number of mislabeled points : %d" % (iris.target != y_pred).sum()

Best Answer

Classifiers are just objects that can be pickled and dumped like any other. To continue your example:

import cPickle
# save the classifier
with open('my_dumped_classifier.pkl', 'wb') as fid:
    cPickle.dump(gnb, fid)    

# load it again
with open('my_dumped_classifier.pkl', 'rb') as fid:
    gnb_loaded = cPickle.load(fid)

Edit: if you are using a sklearn Pipeline in which you have custom transformers that cannot be serialized by pickle (nor by joblib), then using Neuraxle's custom ML Pipeline saving is a solution where you can define your own custom step savers on a per-step basis. The savers are called for each step if defined upon saving, and otherwise joblib is used as default for steps without a saver.

Related Topic