R – data.table: Using with=False and transforming function/summary function

data.tablegroup-summarieslapplyrsummarization

I want to summarise several variables in data.table, output in wide format, output possibly as a list per variable. Since several other approaches did not work, I tried to do an outer lapply, giving the names of the variables as character vectors. I wanted to pass these in, using with=FALSE.

carsx=as.data.table(cars)
lapply( list(speed="speed",dist= "dist"), #error object 'ansvals' not found
    function(x)  carsx[,list(mean(x), min(x), max(x) ), with=FALSE ] ) 

Since this does not work, I tried the more simple approach without lapply.

carsx[,list(mean("speed"), min("speed"), max("speed") ), with=FALSE ] #error object 'ansvals' not found

This does not work either. Is there any way to do something like this? Is this behaviour of 'with' wanted? (I am aware that ?data.table mentions with only to select columns, but in my case it would be useful to be able to transform them as well)

When with=FALSE, j is a vector of names or positions to select, similar to a data.frame. with=FALSE is often useful in data.table to select columns dynamically.

EDIT
My aim is to get a summary per group in wide format, for different variables.
I tried to extend the following, which works only for one variable, for a list of variables.

carsx[,list(mean(speed), min(speed), max(speed) ) ,by=(dist>50)

Lamentably SO doesnt let me post my other question. There I described that I want an output similiar to:

lapply( list(speed="speed",dist= "dist"),
        function(x) do.call("as.data.frame", aggregate(cars[,x], list(class=cars$dist>50), FUN=summary) ) )

Expected Output would be something like:

$speed 
         V1       V2 V3
1: FALSE 12.96970  4 20
2:  TRUE 20.11765 14 25

$dist
         V1       V2 V3
1: FALSE 12.96970  4 20
2:  TRUE 20.11765 14 25

Best Answer

You can specify the columns with the .SDcols parameter:

carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), 
      .SDcols = c("speed", "dist")]
#    speed   dist
# 1:  15.4  42.98
# 2:   4.0   2.00
# 3:  25.0 120.00

carsx[ , lapply(.SD, function(x) c(mean(x), min(x), max(x))), 
      .SDcols = "speed"]
#    speed
# 1:  15.4
# 2:   4.0
# 3:  25.0
Related Topic