Python – Closest equivalent of a factor variable in Python Pandas

pandas, python, r

What is the closest equivalent to an R Factor variable in Python pandas?

Best Answer

This question is about a year old, but since it is still open, here is an update: pandas has introduced a categorical dtype, which behaves very much like factors in R. See this link for more information:

http://pandas-docs.github.io/pandas-docs-travis/categorical.html

The following snippet, reproduced from the link above, shows how to create a "factor" variable in pandas:

In [1]: import pandas as pd

In [2]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [3]: s
Out[3]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a < b < c]
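
To make the comparison with R factors more concrete, here is a minimal sketch of the pieces that roughly correspond to levels(), the underlying integer codes, and an ordered factor. The variable names and the "low"/"medium"/"high" data are made up for illustration and are not from the linked documentation.

```python
import pandas as pd

# Unordered categorical, roughly factor(c("a", "b", "c", "a")) in R
s = pd.Series(["a", "b", "c", "a"], dtype="category")

# The categories play the role of R's levels()
levels = s.cat.categories.tolist()   # ["a", "b", "c"]

# Integer codes behind each value; zero-based here, one-based in R
codes = s.cat.codes.tolist()         # [0, 1, 2, 0]

# Ordered categorical, roughly factor(..., levels = ..., ordered = TRUE).
# The example values below are hypothetical.
sizes = pd.Series(
    pd.Categorical(
        ["low", "high", "medium", "low"],
        categories=["low", "medium", "high"],
        ordered=True,
    )
)

# Comparisons and min/max follow the declared category order,
# not alphabetical order
largest = sizes.max()
above_low = (sizes > "low").tolist()
```

Passing an explicit categories list with ordered=True is what gives the ordering its meaning; without it, the categories are inferred from the data and order comparisons against a value raise an error.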