Python – ipython pandas TypeError: read_csv() got an unexpected keyword argument 'delim_whitespace'

ipython, pandas, python, python-2.7

While working through the ipython.org notebook "INTRODUCTION TO PYTHON FOR DATA MINING", I ran into a problem.

The following code:

import pandas as pd

data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original",
               delim_whitespace=True, header=None,
               names=['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
                      'model', 'origin', 'car_name'])

yields the following error:

 TypeError: read_csv() got an unexpected keyword argument 'delim_whitespace'

Unfortunately, the dataset file itself is not really CSV, and I don't know why the notebook uses read_csv() to load it.

A typical line of the data looks like this:

 14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"
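To see why the choice of separator matters for a line like this, here is a small stdlib-only sketch (Python 3) that tokenizes the sample line three ways. It only illustrates tokenization; `shlex` is used as a convenient quote-aware splitter and is not what pandas uses internally.

```python
import shlex

# The sample line from auto-mpg.data-original, as quoted above.
LINE = '14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"'

# Splitting on a single space: every extra space in a run yields an empty field.
single_space = LINE.split(' ')
# Splitting on runs of whitespace: no empty fields, but the quoted name is cut in two.
any_whitespace = LINE.split()
# Shell-style splitting: whitespace runs separate fields and quotes group words.
shell_style = shlex.split(LINE)

print(len(single_space), len(any_whitespace), len(shell_style))
print(shell_style[-1])
```

Only the quote-aware split produces the nine fields that the `names` list expects, which is what a whitespace separator that also honours quotes has to achieve.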

The environment is Python 2.7 on Debian stable with IPython 0.13.
After searching here, I realize it's most likely a version problem:
the delim_whitespace argument may exist only in a later version of pandas than the one available through the APT package manager.
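One way to check the version theory is to print the installed pandas version and ask whether the signature of `read_csv` lists the keyword at all. This sketch assumes Python 3 (`inspect.signature` does not exist in Python 2.7, where printing `pd.__version__` is the practical check), and `read_csv_accepts` is a hypothetical helper name, not a pandas function. Since pandas 2.2 deprecated `delim_whitespace`, a release that has since dropped the keyword would also report False.

```python
import inspect

import pandas as pd


def read_csv_accepts(keyword):
    """Hypothetical helper: True if pd.read_csv declares `keyword` as a parameter."""
    return keyword in inspect.signature(pd.read_csv).parameters


print("pandas", pd.__version__)
print("delim_whitespace supported:", read_csv_accepts("delim_whitespace"))
```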

I tried several workarounds, without success.

  • First, I tried to upgrade pandas by building it from the latest source, but I found I would end up with a cascade of dependency builds whose versions also needed upgrading, which could break the environment. For example, I had to install Cython, and then the build reported that the Cython
    version from the APT package manager was also too old, so I would have to rebuild Cython, plus other libraries/modules, and so on.

  • Then, after looking at the API a bit, I tried other arguments:
    using delimiter = ' ' in the call to read_csv() caused
    it to break up the strings inside quotes into several columns:

    ValueError: Expecting 9 columns, got 13 in row 0
    
  • I tried the read_csv() argument quotechar='"', as documented in the API, but again it was not recognized (unexpected keyword argument).

  • Finally, I tried a different way to load the file:

    data = DataFrame()
    
    data.from_csv(url)
    

    I got:

    Out[18]: 
    <class 'pandas.core.frame.DataFrame'>
    Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
    Empty DataFrame
    
    In [19]: print(data.shape)
    (0, 9)
    
  • Alternatively, with the sep argument to from_csv():

    In [20]: data.from_csv(url,sep=' ')
    

    yields the error:

    ValueError: Expecting 31 columns, got 35 in row 1
    In [21]: print(data.shape)
    (0, 9)
    
  • I also tried the following, with the same negative result:

    In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])
    
    In [33]: data.from_csv(url,sep=', \t')
    Out[33]: 
    <class 'pandas.core.frame.DataFrame'>
    Index: 405 entries, 15.0   8.   350.0      165.0      3693.      11.5   70.  1."buick skylark 320" to 31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"
    Empty DataFrame
    
    In [34]: data.head()
    Out[34]: 
    Empty DataFrame
    
  • I tried using ipython3 instead,
    but it cannot find/load matplotlib, as there is no matplotlib package for Python 3 on my
    system.
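The "Empty DataFrame" results from the from_csv attempts are consistent with that method's defaults. `DataFrame.from_csv` (removed in pandas 1.0) was a classmethod with `sep=','` and `index_col=0`: on a file with no commas, each whole line is parsed as a single field, which is then moved into the index, leaving zero data columns. Being a classmethod, it also returned a new frame rather than filling in `data`. Since from_csv no longer exists in current pandas, this sketch reproduces the effect with `read_csv` and the same two settings, on rows copied from the output above; `header=None` is an assumption made here so both rows stay as data (from_csv itself defaulted to `header=0`).

```python
import io

import pandas as pd

# Two rows copied from the from_csv output above; the file is
# whitespace-separated and contains no commas at all.
SAMPLE = (
    '15.0   8.   350.0      165.0      3693.      11.5   70.  1.    "buick skylark 320"\n'
    '31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"\n'
)

# Same separator and index column as the old from_csv defaults
# (sep=',', index_col=0); header=None keeps both rows as data.
frame = pd.read_csv(io.StringIO(SAMPLE), sep=',', header=None, index_col=0)

# Each whole line ends up as an index label, and no data columns remain.
print(frame.shape)
print(frame.index[0])
```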

Any help with this problem would be greatly appreciated.

Best Answer

Oddly, the delim_whitespace parameter appears in the pandas documentation in the method summary but not in the parameters list. Try replacing it with delimiter = r'\s+', which is equivalent to what I assume the authors meant.
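A minimal sketch of that suggestion, assuming a reasonably recent pandas: two lines in the file's format (taken from the question) are read from an in-memory buffer instead of the UCI URL. With `r'\s+'` as the separator, any run of spaces or tabs counts as one separator, and in current pandas the quoted car name still comes through as a single field. Newer pandas (2.2 onwards) deprecates `delim_whitespace` in favour of this same `r'\s+'` form.

```python
import io

import pandas as pd

COLUMNS = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model', 'origin', 'car_name']

# Two lines in the same layout as auto-mpg.data-original (taken from the question).
SAMPLE = (
    '14.0   8.   454.0      220.0      4354.       9.0   70.  1.    "chevrolet impala"\n'
    '31.0   4.   119.0      82.00      2720.      19.4   82.  1.   "chevy s-10"\n'
)

# delimiter is an alias of sep; r'\s+' treats each whitespace run as one separator.
data = pd.read_csv(io.StringIO(SAMPLE), delimiter=r'\s+', header=None, names=COLUMNS)
print(data[['mpg', 'horsepower', 'car_name']])
```

For the real dataset, pass the URL from the question in place of `io.StringIO(SAMPLE)`.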

CSV does refer to comma-separated values, but it's often used to refer to general delimited-text formats. TSV (tab-separated values) is another variant; in this case it's basically whitespace-separated values.
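To illustrate that comma-, tab- and whitespace-separated files differ only in the separator, this sketch reads the same two hypothetical records (column names borrowed from the question) in all three layouts and checks that the resulting frames match.

```python
import io

import pandas as pd

NAMES = ['mpg', 'cylinders', 'car_name']

# The same two records written with three different separators.
COMMA = '14.0,8.,"chevrolet impala"\n31.0,4.,"chevy s-10"\n'
TAB = '14.0\t8.\t"chevrolet impala"\n31.0\t4.\t"chevy s-10"\n'
SPACES = '14.0   8.   "chevrolet impala"\n31.0   4.   "chevy s-10"\n'


def load(text, sep):
    """Parse one of the sample strings with the given separator."""
    return pd.read_csv(io.StringIO(text), sep=sep, header=None, names=NAMES)


from_comma = load(COMMA, ',')
from_tab = load(TAB, '\t')
from_spaces = load(SPACES, r'\s+')

print(from_comma.equals(from_tab), from_comma.equals(from_spaces))
```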