Sample code:
>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print(json_string)
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"
The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I’d rather not use XML).
Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?
Best Answer
Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually.
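A minimal sketch of that, reusing the sample string from the question:

```python
import json

# ensure_ascii=False keeps the characters verbatim; the encode() call
# then turns the resulting str into UTF-8 bytes
json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode("utf8")
print(json_string.decode("utf8"))  # "ברי צקלה"
```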
If you are writing to a file, just use json.dump() and leave it to the file object to encode.
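For instance, on Python 3 (the filename here is an assumption):

```python
import json

# The file object, opened with an explicit encoding, does the encoding work
with open("dump.json", "w", encoding="utf8") as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)
```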
Caveats for Python 2

For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file.
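A sketch of that (filename assumed; on Python 3, io.open() is simply an alias for the built-in open(), so the same snippet runs there too):

```python
import io
import json

# io.open() returns a file object that encodes unicode text for you
# as you write, on Python 2 as well as Python 3
with io.open("dump.json", "w", encoding="utf8") as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
```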
Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects, so on Python 2 the serialized text may need to be coerced to unicode before it is written.
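A sketch of that workaround (the isinstance check stands in for coercing the result to unicode, so the snippet also runs unchanged on Python 3; filename assumed):

```python
import io
import json

with io.open("dump.json", "w", encoding="utf8") as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # On Python 2 the dumped text can come back as a UTF-8 str;
    # coerce it to unicode text before handing it to the file object,
    # which only accepts unicode
    if isinstance(data, bytes):
        data = data.decode("utf8")
    json_file.write(data)
```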
In Python 2, when using byte strings (type str) encoded to UTF-8, make sure to also set the encoding keyword.
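For example (Python 2 only; json.dumps() on Python 3 does not accept an encoding keyword):

```
>>> import json
>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
>>> s = json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> print json.loads(s)['1']
ברי צקלה
```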