Python – Figuring out unicode: ‘ascii’ codec can’t decode

I currently use Sublime Text 2 and run my Python code there.
When I try to run this code, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
6: ordinal not in range(128)

# -*- coding: utf-8 -*-  
s = unicode('abcdefö') 
print s

I have been reading the Python documentation on Unicode, and as far as I understand, this should work. Or is it the console that's not working?

Edit: Using s = u'abcdefö' as the string produces almost the same result. The result I get is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
position 6: ordinal not in range(128)

Best Answer

What happens is that unicode('abcdefö') tries to decode the encoded byte string to unicode at runtime. The coding: utf-8 line only tells Python that the source file is encoded in UTF-8. By the time the script runs, it has been compiled and the string has been stored as an encoded byte string. With no encoding argument, Python decodes the string using ascii by default. As the string is actually UTF-8 encoded, this fails: ö is stored as the two bytes 0xc3 0xb6, and 0xc3, the byte named in the error message, is outside the ASCII range.
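The failure can be reproduced without depending on the editor or the source encoding. The following is a minimal sketch, assuming the literal is spelled out as its UTF-8 bytes and that the ASCII decode is requested explicitly (Python 2's unicode() does it implicitly). The names raw and error are only for illustration.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes that the source literal 'abcdefö' is stored as.
raw = b'abcdef\xc3\xb6'

error = None
try:
    # Python 2's unicode(raw) performs this decode implicitly,
    # using the default 'ascii' codec.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    # Reports byte 0xc3 at position 6, as in the question.
    print(exc)
```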

You can write s = u'abcdefö', which tells the compiler to decode the string with the encoding declared for the file and store it as unicode. s = unicode('abcdefö', 'utf8') or s = 'abcdefö'.decode('utf8') does the same thing at runtime.
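As a rough check that the runtime decode and the unicode literal give the same result, here is a small sketch. It assumes the byte string is written with explicit escapes so it does not depend on how the file is saved. The variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# UTF-8 encoded bytes for 'abcdefö' (ö is the two bytes 0xc3 0xb6).
raw = b'abcdef\xc3\xb6'

# Decoding at runtime with the correct codec.
decoded = raw.decode('utf8')

# The same text as a unicode literal (\xf6 is the code point of ö).
literal = u'abcdef\xf6'

print(decoded == literal)
print(len(raw))      # number of bytes
print(len(decoded))  # number of characters
```

The byte string is one element longer than the decoded text because ö takes two bytes in UTF-8.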

However, this does not necessarily mean that you can print s now. First, the internal unicode string has to be encoded in a character set that stdout (the console/editor/IDE) can actually display. Sadly, Python often fails to figure out the right character set and falls back to ascii, so you get a UnicodeEncodeError when the string contains non-ascii characters, as in the edit to the question. The Python Wiki describes a few ways to set up stdout properly.
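One possible workaround, not necessarily one of those the wiki recommends, is to encode the text yourself before it reaches stdout. The sketch below uses a hypothetical helper, encode_for_output. The UTF-8 fallback and the 'replace' error handler are assumptions chosen for illustration.

```python
# -*- coding: utf-8 -*-
def encode_for_output(text, encoding=None, fallback='utf-8'):
    """Encode text for an output stream (hypothetical helper).

    Characters the target encoding cannot represent are replaced
    with '?' instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding or fallback, 'replace')

text = u'abcdef\xf6'

# An ascii-only console cannot show ö, so it is substituted.
print(repr(encode_for_output(text, 'ascii')))

# No known stream encoding: the assumed UTF-8 fallback is used.
print(repr(encode_for_output(text, None)))
```

In a real script, the encoding argument would typically come from sys.stdout.encoding, which can be None or 'ascii' when Python cannot detect what the console supports.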