R – ddply error: Error in attributes(out) <- attributes(col) : 'names' attribute must be the same length as the vector

plyr, r

I am trying to apply ddply on a large data.frame (38000 rows / 10 variables), but I am stuck with an error:

ddply(uncertainty.long, .(Species), "nrow")

returns the error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [38000] must be the same length as the vector [3800]
> traceback()
11: FUN(1:10[[5L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: (function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   })(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(uncertainty.long, .(Species), "nrow")

Some more details about my data.frame:

> head(uncertainty.long)
                Stack Variable PARun Model             Species    value year scenario   GCM                    sp
1        sync_current    Total   PA1   GLM Arctosafulvolineata 100.0000   NA     <NA>  <NA> Arctosa\nfulvolineata
2 sync_cgcm2_B2A_2020    Total   PA1   GLM Arctosafulvolineata 134.6840 2020      B2A cgcm2 Arctosa\nfulvolineata
3 sync_cgcm2_B2A_2050    Total   PA1   GLM Arctosafulvolineata 153.7617 2050      B2A cgcm2 Arctosa\nfulvolineata
4 sync_cgcm2_B2A_2080    Total   PA1   GLM Arctosafulvolineata 195.7176 2080      B2A cgcm2 Arctosa\nfulvolineata
5   sync_mk2_B2A_2020    Total   PA1   GLM Arctosafulvolineata 172.2967 2020      B2A   mk2 Arctosa\nfulvolineata
6   sync_mk2_B2A_2050    Total   PA1   GLM Arctosafulvolineata 198.9391 2050      B2A   mk2 Arctosa\nfulvolineata
> str(uncertainty.long)
'data.frame':   38000 obs. of  10 variables:
 $ Stack   : Factor w/ 19 levels "sync_cgcm2_B2A_2020",..: 7 1 2 3 14 15 16 11 12 13 ...
 $ Variable: Factor w/ 5 levels "Lost","NetChange",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ PARun   : Factor w/ 5 levels "PA1","PA2","PA3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Model   : Factor w/ 8 levels "CTA","FDA","GAM",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...
 $ value   : num  100 135 154 196 172 ...
 $ year    : num  NA 2020 2050 2080 2020 2050 2080 2020 2050 2080 ...
 $ scenario: chr  NA "B2A" "B2A" "B2A" ...
 $ GCM     : chr  NA "cgcm2" "cgcm2" "cgcm2" ...
 $ sp      : chr  "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" ...

This is my sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
 [1] parallel  splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.2.2      Hmisc_3.12-2        Formula_1.1-1       RCurl_1.95-4.1      bitops_1.0-6        biomod2_3.0.3       pROC_1.5.4          plyr_1.8           
 [9] rpart_4.1-3         randomForest_4.6-7  mda_0.4-4           class_7.3-9         gbm_2.1             survival_2.37-4     nnet_7.3-7          rasterVis_0.21     
[17] hexbin_1.26.2       latticeExtra_0.6-26 RColorBrewer_1.0-5  lattice_0.20-23     abind_1.4-0         raster_2.1-49       sp_1.0-13           ggplot2_0.9.3.1    

loaded via a namespace (and not attached):
 [1] cluster_1.14.4   colorspace_1.2-2 dichromat_2.0-0  digest_0.6.3     gtable_0.1.2     labeling_0.2     MASS_7.3-29      munsell_0.4.2    proto_0.3-10     scales_0.2.3    
[11] stringr_0.6.2    tools_3.0.1      zoo_1.7-10      

I tried to reproduce it with fewer columns (only 2), but that did not change anything.
However, if I reduce the number of rows, it works as long as the grouping variable "Species" contains only one distinct value:

> small.df <- uncertainty.long[1:3800, ]
> unique(small.df$Species)
[1] Arctosafulvolineata
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis 
> ddply(small.df, .(Species), "nrow")
              Species nrow
1 Arctosafulvolineata 3800

But if I add one more row:

> small.df <- uncertainty.long[1:3801, ]
> unique(small.df$Species)
[1] Arctosafulvolineata Argyronetaaquatica 
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> small.df[3800:3801, ]
                    Stack Variable PARun  Model             Species     value year scenario    GCM                    sp
3800 sync_hadcm3_A1B_2080     Lost   PA5 MAXENT Arctosafulvolineata -54.90872 2080      A1B hadcm3 Arctosa\nfulvolineata
3801         sync_current    Total   PA1    GLM  Argyronetaaquatica 100.00000   NA     <NA>   <NA>  Argyroneta\naquatica
> ddply(small.df, .(Species), "nrow")
Error in attributes(out) <- attributes(col) : 
  'names' attribute [3801] must be the same length as the vector [3800]

I have found others with a similar problem: https://stackoverflow.com/a/14162351/2788395.

However, their workaround (reinstalling plyr 1.7 instead of 1.8) did not work for me.
Does anyone have an idea of the problem and/or how to solve it?

Thanks!

Best Answer

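The issue was the "names" attribute on the "Species" column. It has one entry per row of the full data.frame, so when plyr subsets the column and copies its attributes onto each piece (the `attributes(out) <- attributes(col)` call in the error), the names no longer match the length of the piece: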

$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...
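
To check whether other columns are affected in the same way, something like the following sketch may help. It uses toy data rather than my actual data.frame, and `toy`, `sp` and `has_names` are made-up names for illustration only:

```r
# Toy data (not the original uncertainty.long): a factor that carries names.
sp <- factor(rep(c("a", "b"), each = 3))
names(sp) <- rep(c("1", "2"), each = 3)

toy <- data.frame(value = 1:6)
toy$Species <- sp  # assigning with `$<-` keeps the "names" attribute on the column

# Flag every column whose underlying vector still has names attached.
has_names <- vapply(toy, function(col) !is.null(names(col)), logical(1))
print(has_names)
```

Any column flagged `TRUE` here is a candidate for the same error.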

I removed them with the following code and ddply worked:

> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
               Species nrow
1  Arctosafulvolineata 3800
2   Argyronetaaquatica 3800
3  Dolomedesplantarius 3800
4   Enoplognathamordax 3800
5      Iciussubinermis 3800
6       Neonvalentulus 3800
7    Pardosabifasciata 3800
8     Pardosaoreophila 3800
9     Piratauliginosus 3800
10 Trochosaspinipalpis 3800
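
If several columns carry stray names, a more general variant is to strip them from every column at once. This is only a sketch on toy data; `strip_col_names` is a hypothetical helper of my own, not something provided by plyr:

```r
# Hypothetical helper: drop the "names" attribute from every column.
strip_col_names <- function(df) {
  df[] <- lapply(df, unname)  # `df[] <-` keeps the data.frame class and column names
  df
}

# Toy data standing in for uncertainty.long.
sp <- factor(rep(c("a", "b"), each = 3))
names(sp) <- rep(c("1", "2"), each = 3)
toy <- data.frame(value = 1:6)
toy$Species <- sp

clean <- strip_col_names(toy)
str(clean)  # the Species column should no longer list a "names" attribute
```

With plyr loaded, the same `ddply(clean, .(Species), "nrow")` call can then be run on the cleaned data.frame.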