I'm using dplyrs left join function in order to match two dataframes.
I have a panel data set A which consists of 4708 rows and 2 columns ID and Name:
ID Name
1 Option1
1 Option2
1 Option3
2 Option2
2 Option3
3 Option1
3 Option4
My dataset B consists of single definitions and categories for each name column (86 rows):
Name Definition Category
Option1 Def1 1
Option2 Def2 1
Option3 Def2 2
Option4 Def3 2
So in the end I need following data set C which links the columns of B to A:
ID Name Definition Category
1 Option1 Def1 1
1 Option2 Def2 1
1 Option3 Def2 2
2 Option2 Def2 1
2 Option3 Def2 2
3 Option1 Def1 1
3 Option4 Def3 2
I used a left_join command in dplyr to do this:
Data C <- left_join(A,B, by="name")
However, for some reason I got 5355 rows instead of the original 4708, so rows were some added. My understanding was that left_join simply assigns the definitions & categories of B to data set A.
Why do I get more rows ? Or are there any other ways to get the desired data frame C?
Best Answer
With
left_join(A, B)
new rows will be added wherever there are multiple rows inB
for which the key columns (same-name columns by default) match the same, single row inA
. For example: