Python – UnicodeDecodeError when using socket.gethostname() result

pythonunicode

Some of my users report that the following code may raise a UnicodeDecodeError when the hostname contains non-ascii characters (however I haven't been able to replicate this on my Windows Vista machine):

    self.path = path
    self.lock_file = os.path.abspath(path) + ".lock"
    self.hostname = socket.gethostname()
    self.pid = os.getpid()
    dirname = os.path.dirname(self.lock_file)
    self.unique_name = os.path.join(dirname, "%s.%s" % (self.hostname, self.pid))

The last part of the traceback is:

    File "taskcoachlib\thirdparty\lockfile\lockfile.pyo", line 537, in FileLock
    File "taskcoachlib\thirdparty\lockfile\lockfile.pyo", line 296, in __init__
    File "taskcoachlib\thirdparty\lockfile\lockfile.pyo", line 175, in __init__
    File "ntpath.pyo", line 102, in join
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 7: ordinal not in range(128)

Any ideas on why and how to prevent it?

(The exception occurs with Python 2.5 on Windows XP)

Best Answer

I don't think gethostname() is necessarily giving you a unicode object. It could be the directory name of lockfile. Regardless, one of them is a standard string with a non-ASCII (higher than 127) char in it and the other is a unicode string.

The problem is that the join function in the ntpath module (the module Python uses for os.path on Windows) attempts join the arguments given. This causes Python to try to convert the normal string parts to unicode. In your case the non-unicode string appears to have a non-ASCII character. This can't be reliably converted to unicode, so Python raises the exception.

A simpler way to trigger the problem:

>> from ntpath import join
>> join(u'abc', '\xff')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)

/home/msmits/<ipython console> in <module>()

/usr/lib64/python2.6/ntpath.pyc in join(a, *p)
    106                     path += b
    107                 else:
--> 108                     path += "\\" + b
    109             else:
    110                 # path is not empty and does not end with a backslash,

The traceback shows the problem line in ntpath.py.

You could work around this by using converting the args to join() to standard strings first as other answers suggest. Alternatively you could convert everything to unicode first. If a specific encoding is given to decode() high bytes can be converted to unicode.

For example:

>> '\xff'.decode('latin-1')
u'\xff'

Related Solutions

Python – Calling a function of a module by using its name (a string)

Assuming module foo with method bar:

import foo
method_to_call = getattr(foo, 'bar')
result = method_to_call()

You could shorten lines 2 and 3 to:

result = getattr(foo, 'bar')()

if that makes more sense for your use case.

You can use getattr in this fashion on class instance bound methods, module-level methods, class methods... the list goes on.

Python – Using global variables in a function

You can use a global variable within other functions by declaring it as global within each function that assigns a value to it:

globvar = 0

def set_globvar_to_one():
    global globvar    # Needed to modify global copy of globvar
    globvar = 1

def print_globvar():
    print(globvar)     # No need for global declaration to read value of globvar

set_globvar_to_one()
print_globvar()       # Prints 1

Since global variables have a long history of introducing bugs (in every programming language), Python wants to make sure that you understand the risks by forcing you to explicitly use the global keyword.

See other answers if you want to share a global variable across modules.

Best Answer

Related Solutions

Python – Calling a function of a module by using its name (a string)

Python – Using global variables in a function

Related Topic