Python – Closest equivalent of a factor variable in Python Pandas


What is the closest equivalent to an R Factor variable in Python pandas?

Best Answer

This question seems to be from a year back, but since it is still open, here's an update. pandas has introduced a categorical dtype (added in version 0.15.0) that works very similarly to factors in R. Please see this link for more information:

Here is a snippet from the link above showing how to create a "factor" variable in pandas:

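In [1]: import pandas as pd

In [2]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [3]: s
Out[3]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a < b < c]

The < signs in the last line mark an ordered categorical. Early pandas releases made a categorical ordered by default when its values could be sorted; current releases create unordered categories unless you pass ordered=True, so the same code now lists the categories without the < signs.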
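To make the comparison with R more concrete, here is a minimal sketch, assuming a recent pandas release, of how a few common factor operations map onto the .cat accessor. The R calls in the comments are only for comparison, and the Series names (sizes, t) and sample values are illustrative.

```python
import pandas as pd

# An unordered "factor": pandas infers the categories (R: levels) from the data.
s = pd.Series(["a", "b", "c", "a"], dtype="category")

# R: levels(f)  ->  pandas: s.cat.categories
print(list(s.cat.categories))   # ['a', 'b', 'c']

# R: as.integer(f) gives 1-based codes; pandas codes are 0-based.
print(s.cat.codes.tolist())     # [0, 1, 2, 0]

# R: factor(x, levels = c("low", "mid", "high"), ordered = TRUE)
sizes = pd.Series(
    pd.Categorical(
        ["mid", "low", "high", "mid"],
        categories=["low", "mid", "high"],
        ordered=True,
    )
)
print(sizes.cat.ordered)        # True
print(sizes.max())              # high (follows the category order, not the alphabet)
print((sizes > "low").tolist()) # [True, False, True, True]

# Values outside the declared categories become missing (like NA in R);
# their code is -1.
t = pd.Series(pd.Categorical(["a", "z"], categories=["a", "b"]))
print(t.isna().tolist())        # [False, True]
print(t.cat.codes.tolist())     # [0, -1]

# R: droplevels(f)  ->  pandas: remove_unused_categories()
print(list(t.cat.remove_unused_categories().cat.categories))  # ['a']
```

One difference worth keeping in mind: R factor codes start at 1, while pandas codes start at 0 and reserve -1 for missing values.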