I am using Support Vector Regression as an estimator in GridSearchCV. But I want to change the error function: instead of using the default (R-squared: coefficient of determination), I would like to define my own custom error function.
I tried to make one with make_scorer, but it didn't work.
I read the documentation and found that it's possible to create custom estimators, but I don't need to remake the entire estimator – only the error/scoring function.
I think I can do it by defining a callable as a scorer, like it says in the docs.
But I don't know how to use an estimator: in my case SVR. Would I have to switch to a classifier (such as SVC)? And how would I use it?
My custom error function is as follows:
import numpy as np

def my_custom_loss_func(X_train_scaled, Y_train_scaled):
    error, M = 0, 0
    for i in range(0, len(Y_train_scaled)):
        z = (Y_train_scaled[i] - M)
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
            error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
        if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
            error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
        if X_train_scaled[i] > M and Y_train_scaled[i] < M:
            error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
        error += error_i
    return error
The variable M isn't null/zero. I've just set it to zero for simplicity.
Would anyone be able to show an example application of this custom scoring function? Thanks for your help!
Best Answer
Jamie has a fleshed-out example, but here's an example using make_scorer straight from the scikit-learn documentation:
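A minimal sketch of that kind of make_scorer example, adapted rather than quoted: the toy data, the DummyRegressor stand-in, and the names loss_scorer and score_scorer are illustrative assumptions. It shows how greater_is_better controls the sign of the value the scorer returns.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer


def my_custom_loss_func(y_true, y_pred):
    """Log of (1 + largest absolute error); smaller means a better fit."""
    diff = np.abs(np.asarray(y_true) - np.asarray(y_pred)).max()
    return np.log1p(diff)


# greater_is_better=False makes the scorer return the negated loss, so that
# "higher is better" still holds from the point of view of a search.
loss_scorer = make_scorer(my_custom_loss_func, greater_is_better=False)
score_scorer = make_scorer(my_custom_loss_func, greater_is_better=True)

# Toy data (assumption): a model that always predicts 0.0.
X = np.array([[1.0], [1.0]])
y = np.array([0.0, 1.0])
model = DummyRegressor(strategy="constant", constant=0.0).fit(X, y)

# A scorer is called as scorer(estimator, X, y), not as scorer(y_true, y_pred).
loss_value = loss_scorer(model, X, y)
score_value = score_scorer(model, X, y)
print(loss_value, score_value)
```

The two printed values have the same magnitude and opposite signs, which is all that greater_is_better changes.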
When defining a custom scorer via sklearn.metrics.make_scorer, the convention is that custom functions ending in _score return a value to maximize, while functions ending in _loss or _error return a value to minimize. You can use this functionality by setting the greater_is_better parameter of make_scorer: it should be True for scorers where higher values are better, and False for scorers where lower values are better. GridSearchCV can then optimize in the appropriate direction.
You can then convert your function into a scorer as follows:
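One way the conversion might look, as a sketch under stated assumptions: the function below is a vectorised restatement of the question's my_custom_loss_func, with the first argument treated as the true targets and the second as the predictions (make_scorer calls the function as func(y_true, y_pred)). Samples that match none of the three branches are assumed to contribute zero, and greater_is_better=False is a guess based on the "loss" name, so flip it if larger values of your function mean a better fit.

```python
import numpy as np
from sklearn.metrics import make_scorer


def my_custom_loss_func(y_true, y_pred, M=0.0):
    """Vectorised sketch of the question's error function (M is its threshold)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    z = y_pred - M
    gap = np.abs(y_pred - y_true)

    both_above = (y_true > M) & (y_pred > M)
    over = both_above & (y_true > y_pred)      # first branch of the original
    under = both_above & (y_true < y_pred)     # second branch
    crossed = (y_true > M) & (y_pred < M)      # third branch

    # Assumption: samples matching no branch contribute 0 to the total.
    error = np.zeros_like(gap)
    error[over] = gap[over] ** (2 * np.exp(z[over]))
    error[under] = -(gap[under] ** (2 * np.exp(z[under])))
    error[crossed] = -(gap[crossed] ** (2 * np.exp(-z[crossed])))
    return float(error.sum())


# Assumption: the function is a loss, so lower is better and the scorer negates it.
custom_scorer = make_scorer(my_custom_loss_func, greater_is_better=False)
```

Extra keyword arguments given to make_scorer are forwarded to the function, so a non-zero threshold could be supplied as make_scorer(my_custom_loss_func, greater_is_better=False, M=...) with your own value.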
And then pass custom_scorer into GridSearchCV as you would any other scoring function: clf = GridSearchCV(estimator, param_grid, scoring=custom_scorer).
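To make that last step concrete for the SVR case, here is a small end-to-end sketch. The synthetic data, the parameter grid, and the placeholder metric max_abs_error are assumptions chosen for illustration; substitute your own function. Note that the estimator stays an SVR: a custom scorer does not require switching to a classifier such as SVC.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR


def max_abs_error(y_true, y_pred):
    """Placeholder metric (assumption): largest absolute residual."""
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Lower error is better, so let the scorer negate it.
custom_scorer = make_scorer(max_abs_error, greater_is_better=False)

# Synthetic regression data, only for demonstration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)

param_grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, scoring=custom_scorer, cv=3)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # negated error, so zero or below
print(best_params, best_score)
```

Because the scorer negates the error, best_score_ is reported as a negative number; the underlying error is its absolute value.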