The data.table package in R provides the option which, documented as: "TRUE returns the integer row numbers of x that i matches to."
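A minimal sketch of which = TRUE on the toy table from the example, assuming the data.table package is installed. The call returns row positions in DT rather than the matching rows themselves.

```r
library(data.table)

# Toy table from the question: y = c(1, 3, 6) is recycled over the 9 rows
DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6))

# which = TRUE returns the integer row numbers of DT that i matches,
# instead of the matching rows themselves
rows_b <- DT[x == "b", which = TRUE]
print(rows_b)  # [1] 4 5 6
```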
However, I see no way of obtaining, within j, the integer row numbers of x within the groups established using by.
For example, given:
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6))
I would like to know the row indices into DT for each value of y.
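To make the desired result concrete, here is a base-R sketch (not a data.table feature) that computes the row indices for each y group with split(). It produces the indices the question asks for, but without using data.table's grouping machinery.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6))

# Split the row numbers 1..nrow(DT) by the value of y in each row.
# y == 1 -> rows 1 4 7; y == 3 -> rows 2 5 8; y == 6 -> rows 3 6 9
idx <- split(seq_len(nrow(DT)), DT$y)
str(idx)
```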
This matters to me because I am using a data.table in parallel with another data structure (ADS), on which I intend to perform groupwise computations based on the efficiently computed groupings of the data.table.
For example, assuming ADS is a vector with a value for each row in DT:
ADS<-sample(100,nrow(DT))
As a workaround, I can compute the groupwise mean of ADS, grouped by DT$y, if I first add a sequence column to the data.table:
DT[,seqNum:=seq_len(nrow(DT))]
DT[,mean(ADS[seqNum]),by=y]
This gives the result I want, at the cost of adding a new column.
I realize that in this example I can get the same answer using tapply:
tapply(ADS,DT$y,mean)
However, I then lose the performance benefit of data.table's efficient grouping (especially when the by columns are indexed).
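A quick check, assuming data.table is installed, that the seqNum workaround and tapply() agree. The set.seed(42) call is an arbitrary addition, used only so the run is reproducible.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6))
set.seed(42)  # arbitrary seed, only for reproducibility
ADS <- sample(100, nrow(DT))

# Workaround from the question: add a helper column of row numbers
DT[, seqNum := seq_len(nrow(DT))]
dt_means <- DT[, mean(ADS[seqNum]), by = y]

# Base-R equivalent; tapply orders groups by sorted y
tap_means <- tapply(ADS, DT$y, mean)

# by = y keeps first-appearance order (1, 3, 6), which here equals sorted order
stopifnot(isTRUE(all.equal(dt_means$V1, as.vector(tap_means))))
```

Note that by = y returns groups in order of first appearance, whereas keyby = y (like tapply) sorts them; the comparison lines up here only because the two orders coincide.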
Perhaps there is some syntax I am overlooking?
Perhaps this is an easy feature to add to data.table and I should request it (wink, wink)?
Proposed syntax: optionally set .which to the group indices, allowing one to write:
DT[,mean(ADS[.which]),by=y,which=TRUE]
Best Answer
Available since data.table 1.8.3, you can use .I in the j of a data.table to get the row indices by groups...
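A minimal sketch of the .I approach, assuming data.table 1.8.3 or later is installed; the output name meanADS and the seed are illustrative choices, not part of the original answer. It lists the row indices per group, then computes the groupwise mean of ADS without adding a seqNum column.

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6))
set.seed(42)  # arbitrary seed, only for reproducibility
ADS <- sample(100, nrow(DT))

# Inside j, .I is the vector of row numbers of DT that belong to the current group
idx_long <- DT[, .I, by = y]
print(idx_long)

# Index the parallel vector with .I directly; no helper column is needed
grp_means <- DT[, .(meanADS = mean(ADS[.I])), by = y]
print(grp_means)
```

Because .I indexes the original table, the same pattern works for any vector aligned row-for-row with DT.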