(This is a self-answered post to help others shorten their answers to plotly questions by not having to explain how plotly best handles data of long and wide format)
I'd like to build a plotly figure based on a pandas dataframe in as few lines as possible. I know you can do that using plotly.express, but this fails for what I would call a standard pandas dataframe: an index describing row order, and column names describing the name of each value in the dataframe.
Sample dataframe:
            a           b           c
0  100.000000  100.000000  100.000000
1   98.493705   99.421400  101.651437
2   96.067026   98.992487  102.917373
3   95.200286   98.313601  102.822664
4   96.691675   97.674699  102.378682
An attempt:
fig = px.line(x=df.index, y=df.columns)
This raises an error:
ValueError: All arguments should have the same length. The length of argument y is 3, whereas the length of previous arguments ['x'] is 100
Best Answer
Here you've tried to use a pandas dataframe of a wide format as a source for px.line, but plotly.express is designed to be used with dataframes of a long format, often referred to as tidy data (please take a look at that; no one explains it better than Wickham). Many, particularly those injured by years of battling with Excel, often find it easier to organize data in a wide format. So what's the difference?
Wide format:
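- data is presented with each variable in its own column
- missing values are often represented by np.nan
- works best with plotly.graph_objects (go)
- lines are often added to a figure using fig.add_traces()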
Example:
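As a small illustration, the first rows of the sample dataframe from the question can be written out in wide format like this. The id column is an assumption added here so that the row order lives in a regular column; it is not part of the original sample.

```python
import pandas as pd

# First five rows of the sample data from the question, in wide format:
# one column per variable (a, b, c), one row per observation.
df_wide = pd.DataFrame({
    "a": [100.000000, 98.493705, 96.067026, 95.200286, 96.691675],
    "b": [100.000000, 99.421400, 98.992487, 98.313601, 97.674699],
    "c": [100.000000, 101.651437, 102.917373, 102.822664, 102.378682],
})

# Assumption: expose the index as an explicit 'id' column for later plotting.
df_wide["id"] = df_wide.index

print(df_wide)
```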
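Long format:
- data is presented with one column holding all the values and another column naming the variable each value belongs to
- works best with plotly.express (px)
Example: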
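For comparison, here is a sketch of the first two observations of the same sample data written directly in long format. The column names id, variable and value are assumptions chosen to match what pandas produces by default when melting.

```python
import pandas as pd

# Long format: every row is a single value.
# 'variable' says which series the value belongs to, 'value' holds the number.
df_long = pd.DataFrame({
    "id":       [0, 0, 0, 1, 1, 1],
    "variable": ["a", "b", "c", "a", "b", "c"],
    "value":    [100.000000, 100.000000, 100.000000,
                 98.493705, 99.421400, 101.651437],
})

print(df_long)
```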
How to go from wide to long?
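One way to do the conversion is pd.melt. The sketch below assumes the wide dataframe has an id column (added here from the index) next to the value columns a, b and c.

```python
import pandas as pd

# Three rows of the sample data in wide format.
df = pd.DataFrame({
    "a": [100.0, 98.493705, 96.067026],
    "b": [100.0, 99.421400, 98.992487],
    "c": [100.0, 101.651437, 102.917373],
})
df["id"] = df.index  # assumption: keep the row order as a regular column

# melt keeps 'id' as the identifier and stacks a, b and c into two columns:
# 'variable' (the former column name) and 'value' (the number itself).
df_long = pd.melt(df, id_vars="id", value_vars=["a", "b", "c"])

print(df_long)
```

The result has one row per (id, variable) pair, so three columns of three rows become nine rows.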
The two snippets below will produce the very same plot:
How to use px to plot long data?
How to use go to plot wide data?
By the looks of it, go is more complicated and offers perhaps more flexibility? Well, yes. And no. You can easily build a figure using px and add any go object you'd like!
Complete go snippet:
Complete px snippet: