Python – How to Plot PR-Curve Over 10 folds of Cross Validation in Scikit-Learn

cross-validationmachine learningplotpythonscikit-learn

I'm running some supervised experiments for a binary prediction problem. I'm using 10-fold cross validation to evaluate performance in terms of mean average precision (average precision for each fold divided by the number of folds for cross validation – 10 in my case). I would like to plot PR-curves of the result of mean average precision over these 10 folds, however I'm not sure the best way to do this.

A previous question in the Cross Validated Stack Exchange site raised this same problem. A comment recommended working through this example on plotting ROC curves across folds of cross validation from the Scikit-Learn site, and tailoring it to average precision. Here is the relevant section of code I've modified to try this idea:

from scipy import interp
# Other packages/functions are imported, but not crucial to the question
max_ent = LogisticRegression()

mean_precision = 0.0
mean_recall = np.linspace(0,1,100)
mean_average_precision = []

for i in set(folds):
    y_scores = max_ent.fit(X_train, y_train).decision_function(X_test)
    precision, recall, _ = precision_recall_curve(y_test, y_scores)
    average_precision = average_precision_score(y_test, y_scores)
    mean_average_precision.append(average_precision)
    mean_precision += interp(mean_recall, recall, precision)

# After this line of code, inspecting the mean_precision array shows that 
# the majority of the elements equal 1. This is the part that is confusing me
# and is contributing to the incorrect plot.
mean_precision /= len(set(folds))
# This is what the actual MAP score should be
mean_average_precision = sum(mean_average_precision) / len(mean_average_precision)

# Code for plotting the mean average precision curve across folds
plt.plot(mean_recall, mean_precision)
plt.title('Mean AP Over 10 folds (area=%0.2f)' % (mean_average_precision))
plt.show()

The code runs, however in my case the mean average precision curve is incorrect. For some reason, the array I have assigned to store the mean_precision scores (mean_tpr variable in the ROC example) computes the first element to be near zero, and all other elements to be 1 after dividing by the number of folds. Below is a visualization of the mean_precision scores plotted against the mean_recall scores. As you can see, the plot jumps to 1 which is inaccurate.
enter image description here
So my hunch is something is going awry in the update of mean_precision (mean_precision += interp(mean_recall, recall, precision) ) at in each fold of cross-validation, but it's unclear how to fix this. Any guidance or help would be appreciated.

Best Answer

I had the same problem. Here is my solution: instead of averaging across the folds, I compute the precision_recall_curve across the results from all folds, after the loop. According to the discussion in https://stats.stackexchange.com/questions/34611/meanscores-vs-scoreconcatenation-in-cross-validation this is a generally preferable approach.

import matplotlib.pyplot as plt
import numpy
from sklearn.datasets import make_blobs
from sklearn.metrics import precision_recall_curve, auc
from sklearn.model_selection import KFold
from sklearn.svm import SVC

FOLDS = 5

X, y = make_blobs(n_samples=1000, n_features=2, centers=2, cluster_std=10.0,
    random_state=12345)

f, axes = plt.subplots(1, 2, figsize=(10, 5))

axes[0].scatter(X[y==0,0], X[y==0,1], color='blue', s=2, label='y=0')
axes[0].scatter(X[y!=0,0], X[y!=0,1], color='red', s=2, label='y=1')
axes[0].set_xlabel('X[:,0]')
axes[0].set_ylabel('X[:,1]')
axes[0].legend(loc='lower left', fontsize='small')

k_fold = KFold(n_splits=FOLDS, shuffle=True, random_state=12345)
predictor = SVC(kernel='linear', C=1.0, probability=True, random_state=12345)

y_real = []
y_proba = []
for i, (train_index, test_index) in enumerate(k_fold.split(X)):
    Xtrain, Xtest = X[train_index], X[test_index]
    ytrain, ytest = y[train_index], y[test_index]
    predictor.fit(Xtrain, ytrain)
    pred_proba = predictor.predict_proba(Xtest)
    precision, recall, _ = precision_recall_curve(ytest, pred_proba[:,1])
    lab = 'Fold %d AUC=%.4f' % (i+1, auc(recall, precision))
    axes[1].step(recall, precision, label=lab)
    y_real.append(ytest)
    y_proba.append(pred_proba[:,1])

y_real = numpy.concatenate(y_real)
y_proba = numpy.concatenate(y_proba)
precision, recall, _ = precision_recall_curve(y_real, y_proba)
lab = 'Overall AUC=%.4f' % (auc(recall, precision))
axes[1].step(recall, precision, label=lab, lw=2, color='black')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].legend(loc='lower left', fontsize='small')

f.tight_layout()
f.savefig('result.png')

Related Solutions

Python – How to iterate over rows in a DataFrame in Pandas

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

10 100
11 110
12 120

Python – Plotting Precision-Recall curve when using cross-validation in scikit-learn

Instead of recording the precision and recall values after each fold, store the predictions on the test samples after each fold. Next, collect all the test (i.e. out-of-bag) predictions and compute precision and recall.

 ## let test_samples[k] = test samples for the kth fold (list of list)
 ## let train_samples[k] = test samples for the kth fold (list of list)

 for k in range(0, k):
      model = train(parameters, train_samples[k])
      predictions_fold[k] = predict(model, test_samples[k])

 # collect predictions
 predictions_combined = [p for preds in predictions_fold for p in preds]

 ## let predictions = rearranged predictions s.t. they are in the original order

 ## use predictions and labels to compute lists of TP, FP, FN
 ## use TP, FP, FN to compute precisions and recalls for one run of k-fold cross-validation

Under a single, complete run of k-fold cross-validation, the predictor makes one and only one prediction for each sample. Given n samples, you should have n test predictions.

(Note: These predictions are different from training predictions, because the predictor makes the prediction for each sample without having been previously seen it.)

Unless you are using leave-one-out cross-validation, k-fold cross validation generally requires a random partitioning of the data. Ideally, you would do repeated (and stratified) k-fold cross validation. Combining precision-recall curves from different rounds, however, is not straight forward, since you cannot use simple linear interpolation between precision-recall points, unlike ROC (See Davis and Goadrich 2006).

I personally calculated AUC-PR using the Davis-Goadrich method for interpolation in PR space (followed by numerical integration) and compared the classifiers using the AUC-PR estimates from repeated stratified 10-fold cross validation.

For a nice plot, I showed a representative PR curve from one of the cross-validation rounds.

There are, of course, many other ways of assessing classifier performance, depending on the nature of your dataset.

For instance, if the proportion of (binary) labels in your dataset is not skewed (i.e. it is roughly 50-50), you could use the simpler ROC analysis with cross-validation:

Collect predictions from each fold and construct ROC curves (as before), collect all the TPR-FPR points (i.e. take the union of all TPR-FPR tuples), then plot the combined set of points with possible smoothing. Optionally, compute AUC-ROC using simple linear interpolation and the composite trapezoid method for numerical integration.

Best Answer

Related Solutions

Python – How to iterate over rows in a DataFrame in Pandas

Python – Plotting Precision-Recall curve when using cross-validation in scikit-learn

Related Topic