R – predict() returns nothing for type = "class" but works fine with type = "raw"


Training data is read in from two files: one with only the independent variables (df.train) and one with only the actual corresponding class values (df.churn). These values are -1 and 1 only. I then remove all-NA columns and remove duplicate columns if any are found.

I assemble the two sets of data into a single data frame with independent and class values, and run naiveBayes() without any errors.

Using the model produced by naiveBayes(), I run predict() and note that the output with type = "raw" looks like reasonable data: in most cases the probabilities are relatively close to 0 or 1. I show the first 6 rows below.

I'm looking for the actual predicted class values to feed into prediction(), with a view to getting an ROC plot and an AUC value. When I run predict() again with type = "class", I get basically nothing at all.
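Side note on the ROC goal: an ROC curve is traced by sweeping a threshold over a continuous score, so the usual input is the type = "raw" probability of the positive class (the "1" column), not hard class labels, which yield only a single point. The prediction() mentioned here is presumably ROCR's, which accepts such scores. As a dependency-free check, a minimal base-R sketch of AUC via the rank-sum (Mann-Whitney) identity follows; auc_rank(), the scores and the labels are all hypothetical.

```r
# AUC via the rank-sum (Mann-Whitney) identity: the probability that a
# randomly chosen positive case gets a higher score than a random negative one
auc_rank <- function(scores, labels, positive = 1) {
  is_pos <- labels == positive
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  r <- rank(scores)  # tied scores get the average rank, i.e. half credit
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores standing in for the "1" column of a type = "raw" matrix,
# paired with made-up -1/1 labels
scores <- c(0.0003, 0.9999991, 3.5e-11, 0.064, 0.0145, 0.0002)
labels <- c(-1, 1, -1, -1, 1, -1)
auc_rank(scores, labels)  # 0.875
```

Using ranks rather than a fixed cutoff is what makes this a threshold-free summary: every possible threshold is implicitly considered.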

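    library(e1071)  # provides naiveBayes() and its predict() method

    # Independent variables and the matching -1/1 class labels, in separate files
    df.train <- read.csv('~/projects/kdd_analysis/data/train_table.csv', header=TRUE, sep=',')
    df.churn <- read.csv('~/projects/kdd_analysis/data/sm_churn_labels.csv', header=TRUE, sep=',')
    # Drop columns that are entirely NA, then drop duplicated columns
    df.train <- df.train[,colSums(is.na(df.train))<nrow(df.train)]
    df.train <- df.train[!duplicated(lapply(df.train,c))]
    # Combine predictors and labels; V1 is the class column from df.churn
    df.train_C <- cbind(df.train, df.churn)
    mod_C <- naiveBayes(V1~., df.train_C, laplace=0.01)
    pre_C <- predict(mod_C, df.train, type="raw", threshold=0.001)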
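The two cleaning steps can be checked on a small data frame. A minimal sketch with a hypothetical toy data frame (not the KDD data); the drop = FALSE argument is an addition that keeps the result a data frame even if only one column survives.

```r
# Hypothetical toy data: column b is all NA, column d duplicates column a
toy <- data.frame(a = c(1, 2, 3), b = c(NA, NA, NA),
                  c = c(4, 5, 6), d = c(1, 2, 3))

# Keep only columns with at least one non-NA value
toy <- toy[, colSums(is.na(toy)) < nrow(toy), drop = FALSE]

# Treat the data frame as a list of column vectors and drop any column
# whose contents repeat an earlier one
toy <- toy[!duplicated(lapply(toy, c))]
names(toy)  # "a" "c"
```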

I'm running predict() against the training data intentionally, because I thought that would be interesting. The values out of predict() below seem 'reasonable' to me; that is, they at least don't look like complete nonsense. I have not compared them to the actuals yet, and would expect to use the explicit class values from predict() to do that.

    head(pre_C)
           -1            1
    [1,] 9.996934e-01 3.066321e-04
    [2,] 9.005501e-07 9.999991e-01
    [3,] 1.000000e+00 3.468739e-11
    [4,] 9.362914e-01 6.370858e-02
    [5,] 9.854649e-01 1.453510e-02
    [6,] 9.997680e-01 2.320003e-04

So this is predict() run again against the identical model. I don't understand how it can return nothing:

    > pre_C <- predict(mod_C, df.train ,type="class", threshold=0.001)
    > pre_C
    factor(0)
    Levels:

Best Answer

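The solution is to coerce the column of class values to a factor:

    df.train_C$V1 <- factor(df.train_C$V1)

then fit the model and run predict() as before. I changed nothing else, and this one modification fixed the issue. Courtesy of Andy Liaw on r-help.

This appears to be because naiveBayes() takes the labels it returns for type = "class" from levels() of the response, and a numeric -1/1 column has no levels, so there is nothing to map the most probable class back to and the result is an empty factor(0). The type = "raw" probabilities are unaffected because their column names come from the class counts instead.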
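A minimal base-R sketch of why the factor matters, assuming the class labels are looked up from levels() of the response. pick_class() is a hypothetical stand-in for that lookup, not e1071's own code, and probs is a made-up matrix shaped like the type = "raw" output.

```r
# Toy probability matrix shaped like predict(..., type = "raw") output:
# one row per case, one column per class (values are made up)
probs <- matrix(c(0.9997,    0.0003,
                  0.0000009, 0.9999991,
                  0.9363,    0.0637),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("-1", "1")))

y_numeric <- c(-1, 1, -1)
y_factor  <- factor(y_numeric)
levels(y_numeric)  # NULL: a numeric vector carries no levels
levels(y_factor)   # "-1" "1"

# Hypothetical helper: pick each row's most probable column and label it
# from a separately stored set of class levels
pick_class <- function(probs, lv) {
  factor(lv[apply(probs, 1, which.max)], levels = lv)
}

pick_class(probs, levels(y_numeric))  # factor(0) with no levels
pick_class(probs, levels(y_factor))   # -1 1 -1, levels -1 1
```

With no stored levels, indexing NULL yields NULL, which is exactly the empty factor(0) seen in the question; with a factor response the same lookup produces real labels.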

Related Solutions

R – using predict with a list of lm() objects

Here's my attempt:

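    library(plyr)  # ddply() and adply()

    # "Naughty" version: refers to piece, a variable that appears to exist only
    # inside plyr's own looping code, not to anything passed in explicitly
    predNaughty <- ddply(newData, "state", transform,
      value=predict(modelList[[paste(piece$state[1])]], newdata=piece))
    head(predNaughty)
    #   year state    value
    # 1   50    50 5176.326
    # 2   51    50 5274.907
    # 3   52    50 5373.487
    # 4   53    50 5472.068
    # 5   54    50 5570.649
    # 6   55    50 5669.229

    # Approved version: an explicit function receives each per-state chunk as x;
    # paste() turns the numeric state into a name for indexing modelList
    predDiggsApproved <- ddply(newData, "state", function(x)
      transform(x, value=predict(modelList[[paste(x$state[1])]], newdata=x)))
    head(predDiggsApproved)
    #   year state    value
    # 1   50    50 5176.326
    # 2   51    50 5274.907
    # 3   52    50 5373.487
    # 4   53    50 5472.068
    # 5   54    50 5570.649
    # 6   55    50 5669.229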

JD Long edit

I was inspired enough to work out an adply() option:

    # adply() with margin 1 calls the function once per row of newData
    pred3 <- adply(newData, 1, function(x)
        predict(modelList[[paste(x$state)]], newdata=x))
    head(pred3)
    #   year state        1
    # 1   50    50 5176.326
    # 2   51    50 5274.907
    # 3   52    50 5373.487
    # 4   53    50 5472.068
    # 5   54    50 5570.649
    # 6   55    50 5669.229
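For comparison, the same per-group prediction can be done in base R with split() and lapply(). A minimal sketch: the training data, years and values are hypothetical, and only the names modelList and newData mirror the answer. Here state is stored as character, so it can index the list by name without paste().

```r
# Hypothetical training data: two states, each with its own linear trend
train <- data.frame(
  state = rep(c("50", "51"), each = 5),
  year  = rep(1:5, times = 2),
  value = c(10, 12, 14, 16, 18, 5, 4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# One lm() per state, in a list whose names are the state codes
modelList <- lapply(split(train, train$state),
                    function(d) lm(value ~ year, data = d))

# Years to predict for each state
newData <- data.frame(state = c("50", "50", "51"), year = c(6, 7, 6),
                      stringsAsFactors = FALSE)

# Split newData by state, predict each chunk with its own model, reassemble
pieces <- lapply(split(newData, newData$state), function(piece) {
  piece$value <- unname(predict(modelList[[piece$state[1]]], newdata = piece))
  piece
})
pred <- do.call(rbind, pieces)
rownames(pred) <- NULL
pred
```

Since the toy trends are exact lines (8 + 2*year and 6 - year), pred$value comes out as 20, 22 and, up to floating-point rounding, 0.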

R – Predict() – Maybe I’m not understanding it

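First, you want to use

    model <- lm(Total ~ Coupon, data=df)

not model <- lm(df$Total ~ df$Coupon, data=df). With df$ inside the formula, the model is tied to those exact columns of df, so predict() cannot substitute the values you supply in newdata.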
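A minimal sketch of the difference, using a hypothetical df (the question's own data is not shown here): the bare-name formula predicts one value per row of new.df, while the df$ formula ignores new.df and returns one fitted value per row of df, with a warning about the row-count mismatch.

```r
# Hypothetical data standing in for the question's df
df <- data.frame(Total  = c(100, 200, 300, 400),
                 Coupon = c(11, 19, 31, 39))
new.df <- data.frame(Coupon = c(25, 50))

# Bare column names: predict() looks up Coupon in newdata
good <- lm(Total ~ Coupon, data = df)
p_good <- predict(good, new.df)   # one prediction per row of new.df

# df$... names: the variable is the fixed vector df$Coupon, which newdata
# cannot replace, so predict() falls back to df's 4 rows and warns
bad <- lm(df$Total ~ df$Coupon, data = df)
p_bad <- suppressWarnings(predict(bad, new.df))

length(p_good)  # 2
length(p_bad)   # 4
```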

Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.

Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, i.e. new values of Coupon, not Total.

Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:

    model <- lm(Coupon ~ Total, data=df)
    new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
    predict(model, new.df)
