R – Quickly split a large vector into chunks in R

Tags: performance, r, vector

My question is extremely closely related to this one:

Split a vector into chunks in R

I'm trying to split a large vector into chunks of a known size, and my current approach is slow. That question covers the case where the vector splits evenly into chunks.

A quick solution for when a suitable grouping factor exists is here:

Split dataframe into equal parts based on length of the dataframe

I would like to handle the case where no (large enough) factor exists, since I want fairly large chunks.

My example for a vector much smaller than the one in my real life application:

d <- 1:6510321
# Sloooow
chunks <- split(d, ceiling(seq_along(d)/2000))
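Much of the cost of the `split` call above comes from turning millions of `ceiling` values into a grouping factor. A minimal base-R sketch that avoids the factor entirely by indexing over precomputed chunk starts (the function name `chunk_base` is my own, not from the question):

```r
# Sketch: chunk by indexing over precomputed start positions (base R only).
# Assumes n > 0; the last chunk simply holds whatever remainder is left.
chunk_base <- function(d, n) {
  starts <- seq(1, length(d), by = n)  # first index of each chunk
  lapply(starts, function(s) d[s:min(s + n - 1, length(d))])
}

str(chunk_base(1:7, 3))
# three chunks: 1:3, 4:6, and the remainder 7
```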

Best Answer

Using llply from the plyr package, I was able to reduce the time.

chunks <- function(d, n) {
    chunks <- split(d, ceiling(seq_along(d) / n))
    names(chunks) <- NULL
    return(chunks)
}

require(plyr)
plyrChunks <- function(d, n) {
    is <- seq(from = 1, to = length(d), by = ceiling(n))
    if (tail(is, 1) != length(d)) {
        is <- c(is, length(d))
    }
    chunks <- llply(head(seq_along(is), -1),
                    function(i) {
                        start <- is[i]
                        end <- is[i + 1] - 1
                        d[start:end]
                    })
    lc <- length(chunks)
    chunks[[lc]] <- c(chunks[[lc]], tail(d, 1))
    return(chunks)
}

# testing
d <- 1:6510321
n <- 2000

system.time(chks <- chunks(d, n))
#    user  system elapsed 
#   5.472   0.000   5.472 

system.time(plyrChks <- plyrChunks(d, n))
#    user  system elapsed 
#   0.068   0.000   0.065 

identical(chks, plyrChks)
# [1] TRUE
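Beyond `identical`, a quick sanity check is that every chunk except possibly the last has exactly `n` elements and that concatenating the chunks reproduces the input. A minimal sketch, re-deriving the chunks with the `split` approach from the question:

```r
# Sketch: verify chunk sizes and round-trip reconstruction.
d <- 1:6510321
n <- 2000
chks <- unname(split(d, ceiling(seq_along(d) / n)))

sizes <- vapply(chks, length, integer(1))
stopifnot(all(head(sizes, -1) == n),                    # all full chunks have n elements
          identical(unlist(chks, use.names = FALSE), d))  # concatenation gives d back
```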

You can speed things up even more using the .parallel parameter of llply, or add a progress bar using the .progress parameter.
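A minimal sketch of the .parallel option, assuming the doParallel package is installed as the foreach backend (any registered backend works; the vector length here is chosen to divide evenly by n to keep the index arithmetic simple):

```r
# Sketch: parallel chunk extraction via llply's .parallel flag.
library(plyr)
library(doParallel)

registerDoParallel(cores = 2)  # register a backend so .parallel = TRUE works

d <- 1:100000
n <- 2000
is <- seq(1, length(d) + 1, by = n)  # chunk boundaries; n divides length(d) here
chunks <- llply(head(seq_along(is), -1),
                function(i) d[is[i]:(is[i + 1] - 1)],
                .parallel = TRUE)
# For long sequential runs, use .progress = "text" instead for a progress bar.
```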