Python – ipython pandas TypeError: read_csv() got an unexpected keyword argument ‘delim_whitespace’

ipython, pandas, python, python-2.7

While working through the ipython.org notebook "INTRODUCTION TO PYTHON FOR DATA MINING", I ran the following code:

```
import pandas as pd

data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original",
                   delim_whitespace=True, header=None,
                   names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
                          'model', 'origin', 'car_name'])
```

yields the following error:

```
TypeError: read_csv() got an unexpected keyword argument 'delim_whitespace'
```

Unfortunately, the dataset file itself is not really CSV, and I don't know why they used read_csv() to load it.

A typical line of the data looks like this:

```
14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"
```

The environment is Python 2.7 on Debian stable with IPython 0.13.
After searching here, I realize it's most likely a version problem:
the delim_whitespace argument may only exist in a later version of the pandas library than the one available through the APT package manager.

I tried several workarounds, without success.

First, I tried to upgrade pandas by building it from the latest source, but I found I would end up with a cascade of dependency builds whose versions also needed upgrading, which could end up breaking the environment. For example, I had to install Cython, which then turned out to be too old in the APT repository as well, so I would have had to rebuild Cython, plus other libraries and modules, and so on.
Then, after looking at the API a bit, I tried other arguments.
Using delimiter=' ' in the call to read_csv() caused
it to break up the strings inside quotes into several columns:
```
ValueError: Expecting 9 columns, got 13 in row 0
```
I also tried the read_csv() argument quotechar='"', as documented in the API, but again it was not recognized (unexpected keyword argument).
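Splitting the sample line both ways with Python's standard library shows why a single-space delimiter miscounts the columns. This is a sketch outside pandas: `shlex.split` is used only as a stand-in for "split on runs of whitespace while keeping double-quoted strings together", and it assumes the sample line is representative of the whole file.

```python
import csv
import shlex

# Sample record from the auto-mpg file, as shown in the question.
line = '14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"'

# Single-space delimiter: every additional space in a run yields an empty field.
single_space = next(csv.reader([line], delimiter=' ', quotechar='"'))

# Runs of whitespace as one delimiter, with the quoted car name kept whole.
whitespace_runs = shlex.split(line)

print(len(single_space))   # far more than the 9 expected columns
print(whitespace_runs)     # the 9 real fields
```

Dropping the empty strings from `single_space` leaves the same 9 fields, which is why a "one or more whitespace characters" delimiter is what this file needs.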

Finally, I tried a different way to load the file:

```
data = DataFrame()
data.from_csv(url)
```

I got:

```
Out[18]: 
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
Empty DataFrame

In [19]: print(data.shape)
(0, 9)
```

Alternatively, passing the sep argument to from_csv():

```
In [20]: data.from_csv(url,sep=' ')
```

yields the error:

```
ValueError: Expecting 31 columns, got 35 in row 1
In [21]: print(data.shape)
(0, 9)
```

Another attempt, with the same negative result:

```
In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])

In [33]: data.from_csv(url,sep=', \t')
Out[33]: 
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
Empty DataFrame

In [34]: data.head()
Out[34]: 
Empty DataFrame
```
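One likely reason `data.shape` never changes in these from_csv attempts is that `DataFrame.from_csv` is a classmethod. It builds and returns a new DataFrame rather than filling the object it is called on, so `data.from_csv(url)` throws its result away unless the result is assigned (e.g. `data = DataFrame.from_csv(url)`). Separately, the returned frame itself appears empty because, without a matching separator, each whole line seems to have ended up as an index label. The sketch below shows the classmethod pattern with a hypothetical `Table` class rather than pandas (`from_csv` was deprecated in favour of `read_csv` and removed in pandas 1.0).

```python
class Table(object):
    """Hypothetical stand-in for a DataFrame-like class (not pandas)."""

    def __init__(self, rows=None):
        self.rows = list(rows) if rows is not None else []

    @classmethod
    def from_lines(cls, lines):
        # Like DataFrame.from_csv: parses its input and returns a NEW object.
        return cls(line.split() for line in lines)


# Sample values taken from the output shown above.
lines = ["15.0   8.   350.0", "31.0   4.   119.0"]

data = Table()
data.from_lines(lines)          # returned Table is thrown away
print(len(data.rows))           # still 0: data was never modified

data = Table.from_lines(lines)  # keep the returned object instead
print(len(data.rows))
```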

I also tried ipython3 instead,
but it cannot load matplotlib, because there is no Python 3 matplotlib package for my
system.

Any help with this problem would be greatly appreciated.

Best Answer

Oddly, the delim_whitespace parameter appears in the pandas documentation in the method summary but not in the parameter list. Try replacing it with delimiter=r'\s+' (a regular expression matching one or more whitespace characters), which is equivalent to what I assume the authors meant.
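A minimal sketch of that fix, assuming a reasonably recent pandas and using two sample records inlined through `io.StringIO` instead of downloading the UCI file. In `read_csv`, `delimiter` is an alias for `sep`; pandas releases from 2.2 onward deprecate `delim_whitespace` and recommend `sep=r"\s+"` in its place, so this form should also keep working on newer versions.

```python
import io

import pandas as pd

# Two records in the auto-mpg "data-original" layout: whitespace-separated,
# car name in double quotes. Both lines come from the question.
raw = io.StringIO(
    '14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"\n'
    '31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"\n'
)

columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model', 'origin', 'car_name']

# r'\s+' treats any run of spaces/tabs as one separator; quoted names stay whole.
data = pd.read_csv(raw, delimiter=r'\s+', header=None, names=columns)
print(data.shape)
print(data['car_name'].tolist())
```

With the real file, the `io.StringIO` object would simply be replaced by the URL from the question.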

CSV does refer to comma-separated values, but it's often used to refer to general delimited-text formats. TSV (tab-separated values) is another variant; in this case it's basically whitespace-separated values.
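To make the point concrete: Python's standard `csv` module reads comma- and tab-separated text with the same reader, changing only the `delimiter` argument. The records below are illustrative, built from the auto-mpg sample in the question.

```python
import csv
import io

# The same header and record, written as CSV and as TSV.
comma_text = 'mpg,cylinders,car_name\n14.0,8,chevrolet impala\n'
tab_text = 'mpg\tcylinders\tcar_name\n14.0\t8\tchevrolet impala\n'

comma_rows = list(csv.reader(io.StringIO(comma_text)))              # default delimiter ','
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter='\t'))  # tab-separated

# Both parse to identical rows; only the delimiter differs.
print(comma_rows == tab_rows)
```

A single-character delimiter cannot express "any run of spaces", which is why whitespace-separated files like auto-mpg need the `\s+` pattern instead.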

Related Solutions

Python – class method generates “TypeError: … got multiple values for keyword argument …”

The problem is that the first argument passed to an instance method in Python is always the instance on which the method is called (a reference to it, not a copy), conventionally named self. If the class is declared thus:

```
from __future__ import print_function  # same print() behaviour on Python 2 and 3

class foo(object):
    def foodo(self, thing=None, thong='not underwear'):
        print(thing if thing else "nothing")
        print('a thong is', thong)
```

it behaves as expected.

Explanation:

Without self as the first parameter, when myfoo.foodo(thing="something") is executed, the foodo method is called with the arguments (myfoo, thing="something"). The instance myfoo is bound to thing, because thing is the first declared parameter, but Python also tries to bind "something" to thing from the keyword argument, hence the TypeError.
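The broken class itself is not shown on this page, so the sketch below reconstructs it as an assumption: the same foodo signature, minus self. It is written for Python 3, whose message reads "got multiple values for argument" rather than Python 2's "got multiple values for keyword argument".

```python
class foo(object):
    # Assumed reconstruction of the broken version: note the missing `self`.
    def foodo(thing=None, thong='not underwear'):
        print(thing if thing else "nothing")
        print('a thong is', thong)


myfoo = foo()
message = ""
try:
    # Python passes myfoo positionally (binding it to thing) AND thing="something".
    myfoo.foodo(thing="something")
except TypeError as exc:
    message = str(exc)
    print(message)  # reports multiple values for the parameter 'thing'
```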

To demonstrate, try running this with the original code (the version of foodo without self):

```
myfoo = foo()
myfoo.foodo("something")
print("")
print(myfoo)
```

You'll see output like:

```
<__main__.foo object at 0x321c290>
a thong is something

<__main__.foo object at 0x321c290>
```

You can see that thing has been assigned a reference to the instance myfoo of the class foo, while the positional "something" ended up in thong. The Python documentation explains in more detail how function arguments are bound to parameters.

C++ – Using GCC’s C++0x mode in production

IMHO, TR1 support and auto are safe to use. auto was one of the first features to be included in the standard and is a relatively small change to the language, so I would have no problem using it.

I would be a bit more hesitant about using initializer lists. On some other forums (e.g., comp.lang.c++.moderated) there are questions about their behaviour, and it's possible that they may change closer to the release of the standard.
