R – Better way to filter a data frame with dplyr using OR


I have a data frame in R with columns subject1 and subject2 (which contain Library of Congress subject headings). I'd like to filter the data frame by testing whether the subjects match an approved list. Say, for example, that I have this data frame.

data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion")
)

And suppose this is the list of approved subjects.

condition <- c("History", "Religion")

What I want to do is filter by either subject1 or subject2:

library(dplyr)
subset <- filter(data, subject1 %in% condition | subject2 %in% condition)

That returns rows 1, 2, and 4 from the original data frame, as desired.

Is that the best way to filter on multiple fields using OR rather than AND logic? It seems like there must be a better, more idiomatic way, but I don't know what it is.

To put the question more generically: if I combine subject1 and subject2, is there a way to test, for each row, whether any of its values matches a value in another vector? I'd like to write something like:

subset <- filter(data, c(subject1, subject2) %in% condition)
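As written, that line fails in current dplyr: `c(subject1, subject2)` has eight elements while the data frame has four rows, and `filter()` expects one logical value per row. A minimal sketch of the row-wise version, assuming dplyr 1.0.4 or later (the release that added `if_any()`), keeps a row when at least one of the listed columns is in `condition`:

```r
library(dplyr)

data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion")
)
condition <- c("History", "Religion")

# if_any() applies the lambda to each selected column and returns TRUE for a
# row when at least one column is TRUE, i.e. a row-wise OR across columns.
# Assumes dplyr >= 1.0.4, where if_any() is available.
subset <- filter(data, if_any(c(subject1, subject2), ~ .x %in% condition))
subset
```

The column selection accepts any tidyselect helper, so `if_any(starts_with("subject"), ~ .x %in% condition)` would pick up a subject3 column without editing the call.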

Best Answer

I'm not sure whether this approach is better. At least you don't have to write the column names:

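library(dplyr)
# sapply() applies %in% to each column, giving a logical matrix (one column per subject);
# filter() needs a logical vector, so rowSums(...) > 0 keeps rows with at least one match
filter(data, rowSums(sapply(data, "%in%", condition)) > 0)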
#             subject1  subject2
# 1            History Chemistry
# 2            Biology  Religion
# 3 Digital Humanities  Religion
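One caveat: `sapply(data, ...)` checks every column, so if the data frame had any other column whose values happened to be in `condition`, those rows would be kept too. A sketch that keeps the same rowSums idea but only checks the subject columns, assuming they share the prefix "subject" (the `note` column below is hypothetical, added only to show the difference):

```r
library(dplyr)

# Hypothetical extra column "note": its value "History" in row 3 would count
# as a match if every column were checked
data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion"),
  note = c("", "", "History", "")
)
condition <- c("History", "Religion")

# Restrict the check to columns whose names start with "subject";
# sapply() returns a logical matrix and rowSums() > 0 is a row-wise OR
subject_cols <- select(data, starts_with("subject"))
subset <- filter(data, rowSums(sapply(subject_cols, "%in%", condition)) > 0)
subset
```

Row 3 is dropped here because only subject1 and subject2 are tested; with `sapply(data, ...)` it would have been kept.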