Python: How to write a list to file and then pull it back into memory (dict represented as a string convert to dict) later

pickle python

More specific dupe of 875228—Simple data storing in Python.

I have a rather large dict (6 GB) and I need to do some processing on it. I'm trying out several document clustering methods, so I need to have the whole thing in memory at once. I have other functions to run on this data, but the contents will not change.

Currently, every time I think of new functions I have to write them and then regenerate the dict. I'm looking for a way to write this dict to a file so that I can load it into memory instead of recalculating all its values.

To oversimplify, it looks something like:
{((('word','list'),(1,2),(1,3)),(…)):0.0, ….}

I feel that Python must have a better way than me looping through some string looking for : and ( and trying to parse it into a dictionary.

Best Answer

Why not use Python's pickle? Python has a great serialization module called pickle, and it is very easy to use.

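import cPickle  # Python 2; on Python 3 use "import pickle" instead
with open('save.p', 'wb') as f:  # binary mode; the file is closed automatically
    cPickle.dump(obj, f, cPickle.HIGHEST_PROTOCOL)  # compact binary protocol
with open('save.p', 'rb') as f:
    obj = cPickle.load(f)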
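As a rough sketch of the round trip on Python 3 (where the module is simply pickle), the following saves and reloads a small dict shaped like the one in the question. The sample data and the temporary file path are illustrative assumptions, not the asker's actual data.

```python
import os
import pickle  # on Python 2, cPickle offers the same dump/load interface
import tempfile

# Hypothetical stand-in for the large dict described in the question.
data = {
    (("word", "list"), (1, 2), (1, 3)): 0.0,
    (("other", "words"), (4, 5)): 0.5,
}

# Illustrative location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "save.p")

# Write the dict once, using the most compact binary protocol available.
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Later (for example in a new session), load it back instead of recomputing.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

The tuple keys come back as tuples, so the restored dict compares equal to the original.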

There are two disadvantages with pickle:

  • It's not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
  • The format is not human readable.
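To illustrate the first point, here is a minimal and deliberately harmless sketch of how pickled data can make the loader call a function chosen by whoever produced the data. The class name Payload is hypothetical, and the built-in len stands in for what could be any importable callable.

```python
import pickle


class Payload(object):
    # __reduce__ tells pickle how to rebuild the object on load:
    # "call this callable with these arguments". Here the callable is
    # the harmless built-in len, used purely for demonstration.
    def __reduce__(self):
        return (len, ("abc",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload; it calls len("abc") and returns
# whatever that call returns.
result = pickle.loads(blob)
```

Because the callable is named inside the pickled bytes, loading a file is only as safe as its source. For a file you generated yourself, as in the question, this is not a concern.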

If you are using Python 2.6 or later, there is a built-in module called json. It is as easy to use as pickle:

import json
encoded = json.dumps(obj)
obj = json.loads(encoded)

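The JSON format is human readable and very similar to the string representation of a dictionary in Python, and it doesn't have the security issues that pickle has. It might be slower than cPickle, though. Note that JSON object keys must be strings, so a dict keyed by tuples, like the one in the question, would need its keys converted before encoding.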
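For a dict with tuple keys, json.dumps raises a TypeError, so some conversion step is needed in both directions. One possible approach, sketched below, stores the dict as a list of [key, value] pairs and turns the lists back into tuples on load. The helper names to_jsonable, to_tuple and from_jsonable, the sample data, and the file path are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the dict described in the question.
data = {
    (("word", "list"), (1, 2), (1, 3)): 0.0,
}


def to_jsonable(d):
    # Store the dict as [key, value] pairs; tuples are written as JSON arrays.
    return [[key, value] for key, value in d.items()]


def to_tuple(obj):
    # JSON has no tuple type, so arrays load as lists; convert them back
    # into (nested) tuples so they can be used as dict keys again.
    if isinstance(obj, list):
        return tuple(to_tuple(item) for item in obj)
    return obj


def from_jsonable(pairs):
    return {to_tuple(key): value for key, value in pairs}


# Illustrative location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "save.json")

with open(path, "w") as f:
    json.dump(to_jsonable(data), f)

with open(path) as f:
    restored = from_jsonable(json.load(f))
```

This only works when the keys contain nothing but tuples, strings and numbers. With pickle no such conversion is needed, which is one reason it may be the simpler choice for this data.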