Python – Writing Unicode text to a text file

character-encoding, python, python-2.x, unicode

I'm pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a WordPress page).

It has some non-ASCII symbols. How can I convert these safely to symbols that can be used in HTML source?

Currently I'm converting everything to Unicode on the way in, joining it all together in a Python string, then doing:

import codecs
f = codecs.open('out.txt', mode="w", encoding="iso-8859-1")
f.write(all_html.encode("iso-8859-1", "replace"))

The last line raises an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position
12286: ordinal not in range(128)
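
The likely cause: codecs.open returns a wrapped stream that does the encoding itself, so its write() expects unicode objects. Passing an already-encoded byte string makes Python implicitly decode it back with the default ascii codec first, which fails on the first non-ASCII byte (0xa0). A minimal sketch of the corrected call, assuming all_html is a unicode object as above:

import codecs
f = codecs.open('out.txt', mode="w", encoding="iso-8859-1")
f.write(all_html)  # pass unicode; the stream encodes it to ISO-8859-1
f.close()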

Partial solution:

This Python runs without an error:

row = [unicode(x.strip()) if x is not None else u'' for x in row]
all_html = row[0] + "<br/>" + row[1]
f = open('out.txt', 'w')
f.write(all_html.encode("utf-8"))
f.close()

But then if I open the actual text file, I see lots of symbols like:

Qur’an 

Maybe I need to write to something other than a text file?

Best Answer

Deal exclusively with unicode objects as much as possible by decoding things to unicode objects when you first get them and encoding them as necessary on the way out.
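
A minimal sketch of that boundary pattern (the function names and the utf-8 source encoding are assumptions for illustration):

def read_cell(raw_bytes):
    # Boundary in: decode bytes to a unicode object as soon as you get them.
    return raw_bytes.decode('utf-8')

def write_page(text, path):
    # Boundary out: keep unicode internally; encode only at the point of output.
    f = open(path, 'wb')
    f.write(text.encode('utf-8'))
    f.close()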

If your string is actually a unicode object, you'll need to encode it to a byte string in some encoding (UTF-8 here) before writing it to a file:

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
f = open('test', 'w')
f.write(foo.encode('utf8'))
f.close()
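
If you'd rather let the file object do the encoding, io.open (in the standard library from Python 2.6) accepts unicode directly:

import io
f = io.open('test', 'w', encoding='utf8')
f.write(foo)
f.close()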

When you read that file again, you'll get UTF-8-encoded bytes back, which you can decode to a unicode object:

f = open('test', 'r')
print f.read().decode('utf8')
f.close()
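
About the Qur’an symbols from the partial solution: the UTF-8 file is most likely fine; that pattern is what UTF-8 bytes look like when a viewer decodes them as a legacy 8-bit encoding such as Windows-1252. Open the file as UTF-8 (or declare UTF-8 in the page) and it will display correctly.

For the HTML use case in the question, a sketch of one alternative (not part of the answer above) is the xmlcharrefreplace error handler, which turns every character the target encoding can't represent into a numeric HTML entity:

all_html = u'Qur\u2019an'  # hypothetical sample data
f = open('out.html', 'w')
f.write(all_html.encode('ascii', 'xmlcharrefreplace'))  # writes Qur&#8217;an
f.close()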