I'm trying to write a function that accepts a data.frame (`x`) and a `column` from it. The function performs some calculations on `x` and later returns another data.frame. I'm stuck on the best-practice method for passing the column name to the function.
The two minimal examples `fun1` and `fun2` below produce the desired result, being able to perform operations on `x$column`, using `max()` as an example. However, both rely on the seemingly (at least to me) inelegant
- call to `substitute()` and possibly `eval()`
- need to pass the column name as a character vector.
fun1 <- function(x, column){
  do.call("max", list(substitute(x[a], list(a = column))))
}
fun2 <- function(x, column){
  max(eval((substitute(x[a], list(a = column)))))
}
df <- data.frame(B = rnorm(10))
fun1(df, "B")
fun2(df, "B")
I would like to be able to call the function as `fun(df, B)`, for example. Other options I have considered but have not tried:
- Pass `column` as an integer of the column number. I think this would avoid `substitute()`. Ideally, the function could accept either.
- `with(x, get(column))`, but, even if it works, I think this would still require `substitute`
- Make use of `formula()` and `match.call()`, neither of which I have much experience with.
Subquestion: Is `do.call()` preferred over `eval()`?
Best Answer
You can just use the column name directly:
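A minimal sketch of this approach, assuming a small sample data frame (the columns `A`, `B`, and `C` are made up for illustration) and a column name passed as a character string:

```r
# Hypothetical sample data; the column names are assumptions for illustration.
df <- data.frame(A = 1:10, B = 2:11, C = 3:12)

# `column` is a character vector holding one or more column names.
fun1 <- function(x, column) {
  max(x[, column])
}

fun1(df, "B")          # 11
fun1(df, c("B", "A"))  # 11, the maximum across both columns
```

Because `x[, column]` takes a character vector, the same function handles one column or several.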
There's no need to use substitute, eval, etc.
You can even pass the desired function as a parameter:
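As a rough illustration, the function to apply can be an ordinary argument. The name `apply_to_column` and the sample data are hypothetical, chosen only for this sketch:

```r
# Hypothetical sample data; the column names are assumptions for illustration.
df <- data.frame(A = 1:10, B = 2:11, C = 3:12)

# `fn` is any function that accepts the selected column, such as max or mean.
apply_to_column <- function(x, column, fn) {
  fn(x[, column])
}

apply_to_column(df, "B", max)   # 11
apply_to_column(df, "B", mean)  # 6.5
```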
Alternatively, using `[[` also works for selecting a single column at a time:
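A sketch of the `[[` variant, again with made-up sample data and a hypothetical function name `col_max`. Since `[[` accepts either a column name or an integer position, this version also covers the "accept either" wish from the question:

```r
# Hypothetical sample data; the column names are assumptions for illustration.
df <- data.frame(A = 1:10, B = 2:11, C = 3:12)

# `column` is a single column name or a single integer position.
col_max <- function(x, column) {
  max(x[[column]])
}

col_max(df, "B")  # 11
col_max(df, 2)    # 11, the same column selected by position
```

Unlike `x[, column]`, `[[` always returns the column itself as a vector, so it suits the case where exactly one column is expected.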