Python – sklearn LogisticRegression and changing the default threshold for classification

classificationpythonregressionscikit-learn

I am using LogisticRegression from the sklearn package, and have a quick question about classification. I built a ROC curve for my classifier, and it turns out that the optimal threshold for my training data is around 0.25. I'm assuming that the default threshold when creating predictions is 0.5. How can I change this default setting to find out what the accuracy is in my model when doing a 10-fold cross-validation? Basically, I want my model to predict a '1' for anyone greater than 0.25, not 0.5. I've been looking through all the documentation, and I can't seem to get anywhere.

Best Answer

I would like to give a practical answer

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score, precision_score

X, y = make_classification(
    n_classes=2, class_sep=1.5, weights=[0.9, 0.1],
    n_features=20, n_samples=1000, random_state=10
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = LogisticRegression(class_weight="balanced")
clf.fit(X_train, y_train)
THRESHOLD = 0.25
preds = np.where(clf.predict_proba(X_test)[:,1] > THRESHOLD, 1, 0)

pd.DataFrame(data=[accuracy_score(y_test, preds), recall_score(y_test, preds),
                   precision_score(y_test, preds), roc_auc_score(y_test, preds)], 
             index=["accuracy", "recall", "precision", "roc_auc_score"])

By changing the THRESHOLD to 0.25, one can find that recall and precision scores are decreasing. However, by removing the class_weight argument, the accuracy increases but the recall score falls down. Refer to the @accepted answer

Related Solutions

Python – What does ** (double star/asterisk) and * (star/asterisk) do for parameters

The *args and **kwargs is a common idiom to allow arbitrary number of arguments to functions as described in the section more on defining functions in the Python documentation.

The *args will give you all function parameters as a tuple:

def foo(*args):
    for a in args:
        print(a)        

foo(1)
# 1

foo(1,2,3)
# 1
# 2
# 3

The **kwargs will give you all keyword arguments except for those corresponding to a formal parameter as a dictionary.

def bar(**kwargs):
    for a in kwargs:
        print(a, kwargs[a])  

bar(name='one', age=27)
# name one
# age 27

Both idioms can be mixed with normal arguments to allow a set of fixed and some variable arguments:

def foo(kind, *args, **kwargs):
   pass

It is also possible to use this the other way around:

def foo(a, b, c):
    print(a, b, c)

obj = {'b':10, 'c':'lee'}

foo(100,**obj)
# 100 10 lee

Another usage of the *l idiom is to unpack argument lists when calling a function.

def foo(bar, lee):
    print(bar, lee)

l = [1,2]

foo(*l)
# 1 2

In Python 3 it is possible to use *l on the left side of an assignment (Extended Iterable Unpacking), though it gives a list instead of a tuple in this context:

first, *rest = [1,2,3,4]
first, *l, last = [1,2,3,4]

Also Python 3 adds new semantic (refer PEP 3102):

def func(arg1, arg2, arg3, *, kwarg1, kwarg2):
    pass

Such function accepts only 3 positional arguments, and everything after * can only be passed as keyword arguments.

Note:

A Python dict, semantically used for keyword argument passing, are arbitrarily ordered. However, in Python 3.6, keyword arguments are guaranteed to remember insertion order.
"The order of elements in **kwargs now corresponds to the order in which keyword arguments were passed to the function." - What’s New In Python 3.6
In fact, all dicts in CPython 3.6 will remember insertion order as an implementation detail, this becomes standard in Python 3.7.

Python – the difference between Python’s list methods append and extend

append: Appends object at the end.

x = [1, 2, 3]
x.append([4, 5])
print(x)

gives you: [1, 2, 3, [4, 5]]

extend: Extends list by appending elements from the iterable.

x = [1, 2, 3]
x.extend([4, 5])
print(x)

gives you: [1, 2, 3, 4, 5]

Best Answer

Related Solutions

Python – What does ** (double star/asterisk) and * (star/asterisk) do for parameters

Note:

Python – the difference between Python’s list methods append and extend

Related Topic