Python – How to avoid float values in regression models

linear-regression numpy python regression scikit-learn

I am trying to predict wine quality (which ranges from 1 to 10) using regression models such as LinearRegression, SGDRegressor, Ridge, and Lasso.

Dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv

Independent variables: volatile acidity, residual sugar, free sulfur dioxide, total sulfur dioxide, alcohol
Dependent variable: quality

Linear model

from sklearn import linear_model

regr = linear_model.LinearRegression(n_jobs=3)
regr.fit(x_train, y_train)
predicted = regr.predict(x_test)
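A minimal, self-contained sketch of the workflow above. It assumes the file is semicolon-separated with a header row (as the UCI wine-quality CSVs are) and uses a 75/25 split via `train_test_split`, which is an assumption since the question does not say how `x_train`/`x_test` were built. The helper names `load_xy` and `fit_linear` are hypothetical. The point it illustrates: `predict` returns floats even when the target column holds integers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Feature columns named in the question; "quality" is the target.
FEATURES = ["volatile acidity", "residual sugar", "free sulfur dioxide",
            "total sulfur dioxide", "alcohol"]
TARGET = "quality"


def load_xy(source):
    """Read a semicolon-separated wine-quality CSV (path, URL, or file-like object)."""
    df = pd.read_csv(source, sep=";")
    return df[FEATURES].to_numpy(), df[TARGET].to_numpy()


def fit_linear(x, y, seed=0):
    """Split the data, fit LinearRegression, and return (model, x_test, y_test, predictions)."""
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25, random_state=seed)
    regr = LinearRegression()
    regr.fit(x_train, y_train)
    # predict() yields continuous float64 values, not integer grades.
    return regr, x_test, y_test, regr.predict(x_test)
```

For example, `x, y = load_xy(url)` with the dataset link above, followed by `regr, x_test, y_test, predicted = fit_linear(x, y)`.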

Predicted values from LinearRegression:
array([ 5.33560542, 5.47347404, 6.09337194, …, 5.67566813,
5.43609198, 6.08189 ])

The predicted values are floats instead of integer grades (1, 2, 3, …, 10). I tried rounding the predictions with NumPy:

predicted = np.round(regr.predict(x_test))

but my accuracy went down with this attempt.
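If the "accuracy" here is `regr.score`, note that for regressors it returns R², a continuous goodness-of-fit measure; rounding shifts every prediction by up to 0.5, which can lower it. A minimal sketch, assuming integer grades are what you ultimately need: snap predictions to the nearest valid grade (the 1 to 10 bounds come from the question) and report continuous and discrete metrics side by side. `to_grades` and `compare` are hypothetical helper names.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, r2_score


def to_grades(pred, low=1, high=10):
    """Round continuous predictions to the nearest integer grade within [low, high].

    np.rint rounds exact halves to the nearest even integer (5.5 -> 6, 6.5 -> 6).
    """
    return np.clip(np.rint(pred), low, high).astype(int)


def compare(y_true, pred):
    """Report continuous metrics on raw predictions next to metrics on rounded grades."""
    grades = to_grades(pred)
    return {
        "r2_raw": r2_score(y_true, pred),
        "r2_rounded": r2_score(y_true, grades),
        "mae_raw": mean_absolute_error(y_true, pred),
        "mae_rounded": mean_absolute_error(y_true, grades),
        # Exact-match accuracy only makes sense once predictions are discrete.
        "exact_match": accuracy_score(y_true, grades),
    }
```

For example, `compare(y_test, regr.predict(x_test))` on the split from the question shows how much each metric moves when the predictions are rounded.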

SGDRegressor model

import numpy as np
from sklearn import linear_model

np.random.seed(0)
clf = linear_model.SGDRegressor()
clf.fit(x_train, y_train)
predicted = np.floor(clf.predict(x_test))

Predicted output values from SGDRegressor:

array([ -2.77685458e+12,   3.26826414e+12,   4.18655713e+11, ...,
     4.72375220e+12,  -7.08866307e+11,   3.95571514e+12])

Here I am unable to convert the output values into integers.
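Magnitudes around 1e12 suggest the SGD updates diverged, so this is not really a conversion problem. `SGDRegressor` is sensitive to feature scale (the sulfur dioxide columns are typically in the tens or hundreds, while volatile acidity is below 1), and scikit-learn's documentation recommends standardizing inputs for SGD. A sketch, assuming standardization is the fix needed here, that chains `StandardScaler` and `SGDRegressor` in a pipeline; `make_sgd` is a hypothetical helper, and the estimator's own `random_state` parameter stands in for the global `np.random.seed` call.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_sgd(seed=0):
    """SGDRegressor preceded by per-feature standardization (zero mean, unit variance).

    The scaler is fitted on the training data only and reapplied at predict time,
    so x_test is transformed with the training statistics.
    """
    return make_pipeline(StandardScaler(), SGDRegressor(random_state=seed))
```

Usage mirrors the question: `model = make_sgd()`, `model.fit(x_train, y_train)`, then `predicted = model.predict(x_test)`.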

Could someone please tell me the best way to predict wine quality with these regression models?

Best Answer

You are doing regression, so the output is continuous by nature.

Note that your mini-project on predicting wine quality is not a classification problem. The response variable y, the wine quality, has an intrinsic order, meaning a score of 6 is strictly better than a score of 5. It is NOT a categorical variable, where different numbers merely label groups that cannot be compared.
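To make the ordering point concrete, here is a small illustration with made-up grades (not from the dataset): a classification-style metric such as exact-match accuracy scores a near miss and a far miss identically, while an order-aware metric such as mean absolute error distinguishes them.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error

y_true = np.array([6, 6, 6])
near_miss = np.array([5, 5, 5])  # off by one grade
far_miss = np.array([1, 1, 1])   # off by five grades

# Exact-match accuracy treats grades as unordered labels: both are simply "wrong".
acc_near = accuracy_score(y_true, near_miss)  # 0.0
acc_far = accuracy_score(y_true, far_miss)    # 0.0

# Mean absolute error uses the order: predicting 5 for a 6 costs less than predicting 1.
mae_near = mean_absolute_error(y_true, near_miss)  # 1.0
mae_far = mean_absolute_error(y_true, far_miss)    # 5.0
```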