Python – How to use the split function on every row in a DataFrame

dataframe python string

I want to count the number of times a word appears in each review string.

I read the CSV file into a pandas DataFrame using the line below:

import pandas as pd
reviews = pd.read_csv("amazon_baby.csv")

The following code works when I apply it to a single review:

print(reviews["review"][1])
a = reviews["review"][1].split("disappointed")  # pieces between occurrences
print(a)
b = len(a)  # number of pieces = occurrences + 1
print(b)

The output of the lines above was:

it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2
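For a single string, the count works because splitting on a separator that occurs k times yields k + 1 pieces, hence the "- 1". A minimal sketch (Python 3), using a shortened copy of the review above, shows that the built-in str.count gives the same number; the names review, split_count and direct_count are illustrative only. Note that both approaches match substrings, not whole words.

```python
# Count occurrences of a word in one string by splitting on it.
# Splitting on a separator that occurs k times yields k + 1 pieces.
review = ("it came early and was not disappointed. i love planet wise bags "
          "and now my wipe holder.")

pieces = review.split("disappointed")
split_count = len(pieces) - 1  # pieces minus one = occurrences

# str.count counts the same non-overlapping substring matches directly
direct_count = review.count("disappointed")

print(split_count)
print(direct_count)
```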

When I apply the same logic to the entire DataFrame using the line below, I receive an error:

reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1

Error message:

Traceback (most recent call last):
  File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
    reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
  File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'
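The error arises because reviews["review"] is a pandas Series, and split is a method of Python strings, not of Series. As an aside that is not part of the original question, pandas exposes string methods element-wise through the .str accessor. The sketch below assumes a hypothetical two-row frame standing in for amazon_baby.csv.

```python
import pandas as pd

# Hypothetical two-row stand-in for the "review" column of amazon_baby.csv
reviews = pd.DataFrame({"review": [
    "it came early and was not disappointed.",
    "disappointed, very disappointed",
]})

col = reviews["review"]  # a pandas Series, not a str
print(hasattr(col, "split"))  # False: the cause of the AttributeError

# String methods are reached through the .str accessor, one element at a time
pieces = col.str.split("disappointed")  # Series of lists
counts = pieces.str.len() - 1  # occurrences per row
print(counts.tolist())
```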

Best Answer

You're trying to call split on the entire review column of the DataFrame, which is the Series mentioned in the error message. split is a method of Python strings, not of Series, so it has to be applied to each review individually. You can do that by calling apply on the DataFrame with axis=1, which passes each row to the function:

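# x is one row of the DataFrame; x["review"] is that row's review string
f = lambda x: len(x["review"].split("disappointed")) - 1
# axis=1 passes rows (not columns) to f
reviews["disappointed"] = reviews.apply(f, axis=1)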
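To try the answer without the CSV file, here is a minimal sketch on a hypothetical three-row in-memory frame. Besides the row-wise apply from the answer, it shows two alternatives that are not part of the original answer: Series.apply on the single column (each element is already a string) and the vectorized Series.str.count, which treats its argument as a regular expression (harmless for a plain word like "disappointed").

```python
import pandas as pd

# Hypothetical in-memory stand-in for amazon_baby.csv (only a "review" column)
reviews = pd.DataFrame({"review": [
    "it came early and was not disappointed.",
    "disappointed, very disappointed",
    "great product",
]})

# Row-wise apply, as in the answer: x is a row, x["review"] is a str
f = lambda x: len(x["review"].split("disappointed")) - 1
reviews["disappointed"] = reviews.apply(f, axis=1)

# Alternative: apply to the one column, so each s is already the review string
reviews["disappointed_col"] = reviews["review"].apply(
    lambda s: len(s.split("disappointed")) - 1)

# Alternative: vectorized count; the pattern is interpreted as a regex
reviews["disappointed_vec"] = reviews["review"].str.count("disappointed")

print(reviews[["disappointed", "disappointed_col", "disappointed_vec"]])
```

If the review column may contain missing values (NaN), the split-based versions raise an AttributeError on those rows because NaN is a float, whereas .str.count returns NaN for them.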