Remove duplicated rows using dplyr


I have a data.frame like this:

set.seed(123)
df <- data.frame(x = sample(0:1, 10, replace = TRUE), y = sample(0:1, 10, replace = TRUE), z = 1:10)
> df
   x y  z
1  0 1  1
2  1 0  2
3  0 1  3
4  1 1  4
5  1 0  5
6  0 1  6
7  1 0  7
8  1 0  8
9  1 0  9
10 0 1 10

I would like to remove duplicate rows based on the first two columns. Expected output:

df[!duplicated(df[,1:2]),]
  x y z
1 0 1 1
2 1 0 2
4 1 1 4

I am specifically looking for a solution using the dplyr package.

Best Answer

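Here is a solution using dplyr >= 0.5. distinct() with .keep_all = TRUE keeps the first row for each unique (x, y) combination and retains all other columns.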

library(dplyr)
set.seed(123)
df <- data.frame(
  x = sample(0:1, 10, replace = TRUE),
  y = sample(0:1, 10, replace = TRUE),
  z = 1:10
)

> df %>% distinct(x, y, .keep_all = TRUE)
    x y z
  1 0 1 1
  2 1 0 2
  3 1 1 4
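
Because sample() can return different values across R versions even with the same seed, the check below rebuilds the data frame from the values printed in the question instead of re-sampling. It is a minimal sketch; the names df_fixed, base_result and dplyr_result are illustrative and do not appear in the original post.

```r
library(dplyr)

# Data copied from the printed data frame in the question (no random sampling).
df_fixed <- data.frame(
  x = c(0, 1, 0, 1, 1, 0, 1, 1, 1, 0),
  y = c(1, 0, 1, 1, 0, 1, 0, 0, 0, 1),
  z = 1:10
)

# Base R: keep rows whose (x, y) pair has not been seen before.
base_result <- df_fixed[!duplicated(df_fixed[, 1:2]), ]

# dplyr: keep the first row per (x, y) pair, retaining all other columns.
dplyr_result <- df_fixed %>% distinct(x, y, .keep_all = TRUE)

# Base R keeps the original row names (1, 2, 4) while distinct() renumbers
# them, so reset the row names before comparing the two results.
rownames(base_result) <- NULL
print(all.equal(base_result, dplyr_result))
```

Dropping .keep_all = TRUE would return only the x and y columns, which is why the argument matters when the remaining columns are needed.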

R – Side-by-side plots with ggplot2

Any ggplots side-by-side (or n plots on a grid)

The function grid.arrange() in the gridExtra package will combine multiple plots; this is how you put two side by side.

library(gridExtra)
library(ggplot2)  # provides qplot()
plot1 <- qplot(1)
plot2 <- qplot(1)
grid.arrange(plot1, plot2, ncol=2)

This is useful when the two plots are not based on the same data, for example if you want to plot different variables without using reshape().
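
For the "n plots on a grid" case, one possible sketch is to collect the plots in a list and pass it through the grobs argument. The plots here use the built-in mtcars data set purely for illustration; p1, p2, p3, plots and layout are hypothetical names.

```r
library(ggplot2)
library(gridExtra)

# Three unrelated plots built from the built-in mtcars data set.
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

plots <- list(p1, p2, p3)

# arrangeGrob() builds the layout without drawing it;
# grid.arrange() builds the same layout and draws it on the current device.
layout <- arrangeGrob(grobs = plots, ncol = 2)
grid.arrange(grobs = plots, ncol = 2)
```

With three plots and ncol = 2, the grid gets two rows and the last cell stays empty.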

grid.arrange() draws the combined plot on the current graphics device as a side effect. To write it to a file instead, open a file device first (such as pdf() or png()), e.g.

pdf("foo.pdf")
grid.arrange(plot1, plot2)
dev.off()

or use arrangeGrob() in combination with ggsave():

ggsave("foo.pdf", arrangeGrob(plot1, plot2))

This is the ggplot2 equivalent of making two distinct base-graphics plots with par(mfrow = c(1, 2)). It not only saves time reshaping data, it is also necessary when you want two dissimilar plots.

Appendix: Using Facets

Facets are helpful for making similar plots for different groups. Many of the answers below point this out, but I want to highlight the approach with examples equivalent to the plots above.

mydata <- data.frame(myGroup = c('a', 'b'), myX = c(1,1))

qplot(data = mydata, 
    x = myX, 
    facets = ~myGroup)

ggplot(data = mydata) + 
    geom_bar(aes(myX)) + 
    facet_wrap(~myGroup)

Update

The plot_grid() function in the cowplot package is worth checking out as an alternative to grid.arrange(). See the answer by @claus-wilke below and this vignette for an equivalent approach; plot_grid() also allows finer control over plot location and size, as described in this vignette.
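
A minimal sketch of the cowplot approach, assuming the cowplot package is installed. The plots and the output file name are illustrative only.

```r
library(ggplot2)
library(cowplot)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# Two plots side by side; the second panel is twice as wide as the first,
# and each panel gets a label.
combined <- plot_grid(p1, p2, ncol = 2, rel_widths = c(1, 2), labels = c("A", "B"))

# plot_grid() returns a ggplot object, so it can be passed to ggsave().
out_file <- file.path(tempdir(), "combined.pdf")
ggsave(out_file, combined, width = 8, height = 4)
```

The rel_widths and rel_heights arguments are what give the finer control over panel size mentioned above.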

R – How to trim leading and trailing white space

As of R 3.2.0, a new function is available for removing leading and trailing whitespace:

trimws()

See: Remove Leading/Trailing Whitespace
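
A short usage sketch follows; the input strings are made up for illustration. The which argument selects the side to trim and defaults to "both".

```r
x <- "  some text  "

both  <- trimws(x)                   # "some text"
left  <- trimws(x, which = "left")   # "some text  "
right <- trimws(x, which = "right")  # "  some text"

# trimws() is vectorised, so it works on whole character vectors.
trimmed <- trimws(c(" a", "b ", " c "))  # "a" "b" "c"
```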
