R – meaning of ddply error: ‘names’ attribute [9] must be the same length as the vector [1]

plyrr

I'm going through Machine Learning for Hackers, and I am stuck at this line:

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

Which generates the following error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

This is a traceback():

> traceback()
11: FUN(1:5[[1L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   }(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

The priority.train object is a data frame, and here is more info:

> mode(priority.train)
[1] "list"
> names(priority.train)
[1] "Date"       "From.EMail" "Subject"    "Message"    "Path"      
> sapply(priority.train, mode)
       Date  From.EMail     Subject     Message        Path 
     "list" "character" "character" "character" "character" 
> sapply(priority.train, class)
$Date
[1] "POSIXlt" "POSIXt" 

$From.EMail
[1] "character"

$Subject
[1] "character"

$Message
[1] "character"

$Path
[1] "character"

> length(priority.train)
[1] 5
> nrow(priority.train)
[1] 1250
> ncol(priority.train)
[1] 5
> str(priority.train)
'data.frame':   1250 obs. of  5 variables:
 $ Date      : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
 $ From.EMail: chr  "removed@removed.ca" "removed@removed.net" "removed@removed.ca" "removed@removed.net" ...
 $ Subject   : chr  "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
 $ Message   : chr  "    \n Hello,\n   \n         I just installed redhat 7.2 and I think I have everything \nworking properly.  Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file.  Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file.  Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n>  I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
 $ Path      : chr  "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
Warning messages:
1: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform
2: In encodeString(object, quote = "\"", na.encode = FALSE) :
  it is not known that wchar_t is Unicode on this platform

I would post a sample, but the content is a bit long and I don't think the content is relevant here.

The same error also happens here:

> ddply(priority.train, .(Subject))
Error in attributes(out) <- attributes(col) : 
  'names' attribute [9] must be the same length as the vector [1]

Does anyone have a clue on what's going on here? The error seems to be generated by a different object than priority.train, because its names attribute apparently has 9 elements.

I'd appreciate any help. Thanks!

Problem solved

I've found the problem thanks to @user1317221_G's tip of using the dput function. The problem is with the Date field, which is at this point a list that contains 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To solve the problem I've simply converted the dates into character vectors, used ddply then converted the dates back to Date:

> tmp <- priority.train$Date
> priority.train$Date <- as.character(priority.train$Date)
> from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
> priority.train$Date <- tmp
> rm(tmp)

Best Answer

I fixed this problem I was having by converting format from POSIXlt to POSIXct as Hadley suggests above - one line of code:

    mydata$datetime<-strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S") # original conversion from datetime string : > class(mydata$datetime) [1] "POSIXlt" "POSIXt" 
    mydata$datetime<-as.POSIXct(mydata$datetime) # convert to POSIXct to use in data frames / ddply
Related Topic