R – How to remove columns from a data.frame

dataframer

Not so much 'How do you…?' but more 'How do YOU…?'

If you have a file someone gives you with 200 columns, and you want to reduce it to the few ones you need for analysis, how do you go about it? Does one solution offer benefits over another?

Assuming we have a data frame with columns col1, col2 through col200. If you only wanted 1-100 and then 125-135 and 150-200, you could:

dat$col101 <- NULL
dat$col102 <- NULL # etc

or

dat <- dat[,c("col1","col2",...)]

or

dat <- dat[,c(1:100,125:135,...)] # shortest probably but I don't like this

or

dat <- dat[,!names(dat) %in% c("dat101","dat102",...)]

Anything else I'm missing? I know this is sightly subjective but it's one of those nitty gritty things where you might dive in and start doing it one way and fall into a habit when there are far more efficient ways out there. Much like this question about which.

EDIT:

Or, is there an easy way to create a workable vector of column names? name(dat) doesn't print them with commas in between, which you need in the code examples above, so if you print out the names in that way you have spaces everywhere and have to manually put in commas… Is there a command that will give you "col1","col2","col3",… as your output so you can easily grab what you want?

Best Answer

I use data.table's := operator to delete columns instantly regardless of the size of the table.

DT[, coltodelete := NULL]

or

DT[, c("col1","col20") := NULL]

or

DT[, (125:135) := NULL]

or

DT[, (variableHoldingNamesOrNumbers) := NULL]

Any solution using <- or subset will copy the whole table. data.table's := operator merely modifies the internal vector of pointers to the columns, in place. That operation is therefore (almost) instant.