R – Using data.table package inside the own package

data.tabler

I am trying to use the data.table package inside my own package. MWE is as follows:

I create a function, test.fun, that simply creates a small data.table object, and then sums the "Val" column grouping by the "A" column. The code is

test.fun<-function ()
{
    library(data.table)
    testdata<-data.table(A=rep(seq(1,5), 5), Val=rnorm(25))
    setkey(testdata, A)
    res<-testdata[,{list(Ct=length(Val),Total=sum(Val),Avg=mean(Val))},"A"]
    return(res)
}

When I create this function in a regular R session, and then run the function, it works as expected.

> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
> res
     A Ct      Total        Avg
[1,] 1  5 -0.5326444 -0.1065289
[2,] 2  5 -4.0832062 -0.8166412
[3,] 3  5  0.9458251  0.1891650
[4,] 4  5  2.0474791  0.4094958
[5,] 5  5  2.3609443  0.4721889

When I put this function into a package, install the package, load the package, and then run the function, I get an error message.

> library(testpackage)
> res<-test.fun()
data.table 1.8.0  For help type: help("data.table")
Error in `[.data.frame`(x, i, j) : object 'Val' not found

Can anybody explain to me why this is happening and what I can do to fix it. Any help is very much appreciated.

Best Answer

Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.

Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and, that namespace Import'ing or Depend'ing on data.table. This is how data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.

This is also why data.table inheritance didn't used to be compatible with namespaceless packages, and why upon user request we had to ask authors of such packages to add a namespace to their package to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away :

CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.

Related Topic