Python – Join two DataFrames on one key column / ERROR: ‘columns overlap but no suffix specified’

join;pandaspythonsqlsyntax-error

I have two tables: sales table & product table and these two tables share the 'PART NUMBER' column.
The 'PART NUMBER' column in the sales table is not unique, but it is unique in the product table. (see image below of a snapshot of the sales table & product table)

enter image description here

enter image description here

I was trying to add the equivalent 'Description' to each 'PART NUMBER' on the sales table, and I followed the examples from the pandas website
my code

sales.join(part_table, on='PART NUMBER')

But I got this error:

ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')

Can someone explain what this error means and how to solve it?

Many thanks!

Best Answer

I think you want to do a merge rather than a join:

sales.merge(part_table)

Here's an example dataframe:

In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])

In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')

In [14]: dfa.merge(dfb)
Out[14]:
   A  B  C
0  1  2  a
1  3  4  b
2  3  4  c

It's unclear from the docs if this is intentational (I thought that on would be used as the column) but following the exceptions message if you add suffixs we can see what's going on:

In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
   A_a  C  A_b   B
0    1  a    3   4
1    3  b  NaN NaN
2    3  c  NaN NaN

In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
   A_a  C  A_b   B
0    1  a    1   2
1    3  b    3   4
2    3  c  NaN NaN

It's ignoring the on kwarg and just doing the join.