R – Error with custom aggregate function for a cast() call in R reshape2


I want to use R to summarize numerical data in a table whose row IDs (the gene column) are not unique into a result table with one row per unique ID, combining the duplicated rows with a custom function. The rule is: use the mean of the values if the ratio of the maximum to the minimum value is < 1.5, otherwise use the median. Because the table is very large, I am trying to use the melt() and dcast() functions from the reshape2 package.

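# load reshape2 (provides melt() and dcast())
library(reshape2)
# example table with non-unique row-names (gene IDs)
tab <- data.frame(gene=rep(letters[1:3], each=3), s1=runif(9), s2=runif(9))
# melt, keeping the first column (gene) as the id variable
tab.melt <- melt(tab, id.vars=1)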
# function to summarize with logic: mean if max/min < 1.5, else median
summarize <- function(x){ifelse(max(x)/min(x)<1.5, mean(x), median(x))}
# cast with summarized values
dcast(tab.melt, gene~variable, summarize)

The last line of the code above produces this error:

Error in vapply(indices, fun, .default) : 
  values must be type 'logical',
 but FUN(X[[1]]) result is type 'double'
In addition: Warning messages:
1: In max(x) : no non-missing arguments to max; returning -Inf
2: In min(x) : no non-missing arguments to min; returning Inf

What am I doing wrong? Note that if the summarize function just returns min(x) or max(x), there is no error, although the warnings about 'no non-missing arguments' still appear. Thank you for any suggestions.

(The actual table I want to work with is a 200×10000 one.)

Best Answer

Short answer: supply a value for the fill argument, e.g. acast(tab.melt, gene~variable, summarize, fill=0). dcast() accepts fill in the same way.
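A minimal, self-contained sketch of the short answer, assuming reshape2 is installed. It reuses the question's data and summarize() unchanged; set.seed(1) is added only to make the example data reproducible.

```r
library(reshape2)

set.seed(1)  # added only so the random example data is reproducible
tab <- data.frame(gene = rep(letters[1:3], each = 3), s1 = runif(9), s2 = runif(9))
tab.melt <- melt(tab, id.vars = 1)

# The question's rule, unchanged: mean if max/min < 1.5, else median
summarize <- function(x) ifelse(max(x) / min(x) < 1.5, mean(x), median(x))

# fill = 0 is a double, so the result template is a double instead of
# the logical NA produced by calling summarize() on an empty vector
wide <- dcast(tab.melt, gene ~ variable, summarize, fill = 0)

# acast() takes the same argument and returns a matrix instead of a data frame
mat <- acast(tab.melt, gene ~ variable, summarize, fill = 0)
```

Note that fill is also used for gene/variable combinations with no rows at all; every combination has rows here, so no cell actually receives the 0. If empty cells should be NA instead, use fill = NA_real_ (a double NA); fill = NA is logical and would recreate the same template problem.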

Long answer: your function appears to be wrapped as follows before being passed to vapply() inside plyr's vaggregate() function (dcast() calls reshape2's internal cast(), which calls vaggregate(), which calls vapply()):

fun <- function(i) {
    if (length(i) == 0) 
        return(.default)
    .fun(.value[i], ...)
}

To decide what .default should be, this code is executed:

if (is.null(.default)) {
    .default <- .fun(.value[0])
}

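i.e. an empty vector, .value[0] (here numeric(0)), is passed to your function. For numeric(0), max(x) returns -Inf and min(x) returns Inf (this call is where the two warnings come from), so max(x)/min(x) is NaN. NaN itself is a double, but the comparison NaN < 1.5 yields NA, which is logical, and ifelse() with an NA test simply returns that logical NA. So .default ends up as a logical NA, and when vapply is executed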
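To see that probe call in isolation, here is a small base-R sketch (no packages needed) that evaluates the question's summarize() on numeric(0) step by step. The intermediate names empty, ratio, test and probe are only for illustration, and suppressWarnings() just hides the two warnings quoted in the question.

```r
# The question's function, unchanged
summarize <- function(x) ifelse(max(x) / min(x) < 1.5, mean(x), median(x))

empty <- numeric(0)  # what .value[0] is for a numeric value column

ratio <- suppressWarnings(max(empty) / min(empty))
typeof(ratio)        # "double": -Inf / Inf is NaN, which is still a double

test <- ratio < 1.5
typeof(test)         # "logical": comparing NaN gives NA

probe <- suppressWarnings(summarize(empty))
typeof(probe)        # "logical": ifelse() with an NA test returns NA

typeof(summarize(c(1, 2, 3)))  # "double" for a real, non-empty group
```

The mismatch between the last two types is exactly what vapply() complains about.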

vapply(indices, fun, .default)

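the default value is logical, and vapply uses it as the template (FUN.VALUE) for every result, so the call fails as soon as your function returns a double for a real group. Returning min(x) or max(x) alone works because Inf and -Inf are doubles, so the template matches. Supplying fill avoids the problem because cast() passes fill to vaggregate() as .default, so the empty-vector probe is skipped and, with fill=0, the template is a double.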
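The sketch below first reproduces the template mismatch with a bare vapply() call, then shows one alternative to fill, assuming reshape2 is installed: make the aggregation function type-stable so the empty-vector probe already returns a double. summarize_num is a hypothetical name, and returning NA_real_ for empty input is an illustrative choice, not something reshape2 requires.

```r
library(reshape2)

# vapply() uses its third argument as a type template: a logical NA template
# cannot hold a double result, which is the same failure as in the question
res <- tryCatch(vapply(list(c(1, 2)), function(v) mean(v), NA),
                error = function(e) conditionMessage(e))
print(res)

# Hypothetical type-stable variant of the question's function: for empty
# input it returns a double NA, so vaggregate()'s probe yields a double template
summarize_num <- function(x) {
  if (length(x) == 0) return(NA_real_)
  ifelse(max(x) / min(x) < 1.5, mean(x), median(x))
}

set.seed(1)  # only for reproducible example data
tab <- data.frame(gene = rep(letters[1:3], each = 3), s1 = runif(9), s2 = runif(9))
tab.melt <- melt(tab, id.vars = 1)

wide <- dcast(tab.melt, gene ~ variable, summarize_num)  # no fill needed
```

A side effect of the length guard is that max() and min() are never called on the empty probe, so the 'no non-missing arguments' warnings also disappear.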
