R – Merge (join) data frames – too many rows in result


I have two data frames (df1 and df2) that I want to join using the merge function.

df1 has 3903 rows and df2 has 351 rows.

I want to left join df2 to df1 by a common column (column1).

My code is like below:

dfjoin <- merge(df1, df2, by = "column1", all.x = TRUE)

So I expect dfjoin to have 3903 rows, the same as df1. However, it returns 4010 rows.

Why does it return more rows than expected? I would be very glad for any help. Thanks a lot.

Best Answer

This is most likely because the values of column1 in df2 are not unique, so the join is not a 1-1 mapping: a single value of column1 may appear in more than one row of df2. merge returns one output row for every matching pair, so a df1 row whose key appears k times in df2 produces k rows instead of one. You can check this with table(df2$column1). If any value of column1 has a count > 1, this is the reason.
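As a minimal sketch of that check, the following uses small made-up data frames (the columns x and y and all values are hypothetical, not taken from the question) to show how a repeated key in df2 inflates the row count, and one possible way to deduplicate before merging. Whether keeping only the first df2 row per key is appropriate depends on your data.

```r
# Toy data (hypothetical): the key "b" appears twice in df2
df1 <- data.frame(column1 = c("a", "b", "c"), x = 1:3)
df2 <- data.frame(column1 = c("a", "b", "b"), y = c(10, 20, 21))

# Count how often each key occurs in df2 and keep the repeated ones
key_counts <- table(df2$column1)
dup_keys <- names(key_counts[key_counts > 1])
print(dup_keys)

# Left join: the single "b" row of df1 matches two rows of df2
dfjoin <- merge(df1, df2, by = "column1", all.x = TRUE)
n_joined <- nrow(dfjoin)   # 4 rows, although df1 has only 3

# One possible fix (an assumption about what you want):
# keep only the first df2 row for each key, then merge again
df2_unique <- df2[!duplicated(df2$column1), ]
dfjoin_unique <- merge(df1, df2_unique, by = "column1", all.x = TRUE)
n_unique <- nrow(dfjoin_unique)   # 3 rows, same as df1
```

On the real data, inspecting the df2 rows whose column1 is in dup_keys should show which records account for the extra 107 rows.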

As an alternative, if you are more comfortable with SQL, there is a very nice library called sqldf that lets you run SQL-like queries on your data frames.
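A rough sketch of the same left join with sqldf, again on made-up data (the columns x and y are hypothetical) and assuming the sqldf package is installed. Note that a SQL LEFT JOIN behaves the same way as merge here: duplicated keys in df2 still produce extra rows, so this changes the syntax, not the result.

```r
# Toy data (hypothetical): the key "b" appears twice in df2
df1 <- data.frame(column1 = c("a", "b", "c"), x = 1:3)
df2 <- data.frame(column1 = c("a", "b", "b"), y = c(10, 20, 21))

# Run only if sqldf is available (assumption: it may not be installed)
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # Data frames are referenced by name inside the query
  dfjoin_sql <- sqldf(
    "SELECT df1.column1, df1.x, df2.y
       FROM df1
       LEFT JOIN df2 ON df1.column1 = df2.column1"
  )
  print(nrow(dfjoin_sql))
}
```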
