Right now I'm importing a fairly large CSV
as a dataframe every time I run the script. Is there a good solution for keeping that dataframe constantly available in between runs so I don't have to spend all that time waiting for the script to run?
Python – How to reversibly store and load a Pandas dataframe to/from disk
Tags: dataframe, pandas, python
Best Answer
The easiest way is to pickle it using `to_pickle`; then you can load it back using `read_pickle`.
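A minimal sketch of the round trip (the file name and sample data here are placeholders for your own):

```python
import pandas as pd

# Stand-in for the DataFrame you currently build from the large CSV
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

df.to_pickle("my_df.pkl")           # serialize the DataFrame to disk once
df2 = pd.read_pickle("my_df.pkl")   # restore it instantly on later runs

print(df2.equals(df))
```

Subsequent runs of your script can skip the CSV parse entirely and call `read_pickle` instead, which is typically much faster than re-parsing the CSV.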
Note: before 0.11.1, `save` and `load` were the only way to do this (they are now deprecated in favor of `to_pickle` and `read_pickle`, respectively).

Another popular choice is HDF5 (PyTables), which offers very fast access times for large datasets.
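A sketch of the HDF5 approach via `pandas.HDFStore` (assumes the PyTables package is installed; the file name and key are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"A": range(5)})   # stand-in data

store = pd.HDFStore("store.h5")      # requires PyTables ("tables" package)
store["df"] = df                     # save the DataFrame under key "df"
df_loaded = store["df"]              # load it back
store.close()
```

HDF5 also supports querying subsets of a table on disk without loading the whole thing, which pickle cannot do.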
More advanced strategies are discussed in the cookbook.
Since 0.13 there's also msgpack, which may be better for interoperability, as a faster alternative to JSON, or if you have Python object/text-heavy data (see this question).