Python – How references to variables are resolved in Python


This message is a bit long with many examples, but I hope it
will help me and others to better grasp the full story of variables
and attribute lookup in Python 2.7.

I am using the terms of PEP 227
(http://www.python.org/dev/peps/pep-0227/) for code blocks (such as
modules, class definition, function definitions, etc.) and
variable bindings (such as assignments, argument declarations, class
and function declaration, for loops, etc.)

I am using the term variables for names that can be used without a
dot, and attributes for names that need to be qualified with an object
name (such as obj.x for the attribute x of object obj).

There are three scopes in Python for all code blocks except functions:

  • Local
  • Global
  • Builtin

There are four scopes in Python for functions only (according to
PEP 227):

  • Local
  • Enclosing functions
  • Global
  • Builtin

The rule for binding a variable in a block and for finding it again is
quite simple:

  • any binding of a variable to an object in a block makes this variable
    local to this block, unless the variable is declared global (in that
    case the variable belongs to the global scope)
  • a reference to a variable is looked up using the rule LGB (local,
    global, builtin) for all blocks except functions
  • a reference to a variable is looked up using the rule LEGB (local,
    enclosing, global, builtin) for functions only.

Let me now take examples validating this rule, and showing many
special cases. For each example, I will give my understanding. Please
correct me if I am wrong. For the last example, I don't understand the
outcome.

example 1:

x = "x in module"
class A():
    print "A: "  + x                    #x in module
    x = "x in class A"
    print locals()
    class B():
        print "B: " + x                 #x in module
        x = "x in class B"
        print locals()
        def f(self):
            print "f: " + x             #x in module
            self.x = "self.x in f"
            print x, self.x
            print locals()

>>>A.B().f()
A: x in module
{'x': 'x in class A', '__module__': '__main__'}
B: x in module
{'x': 'x in class B', '__module__': '__main__'}
f: x in module
x in module self.x in f
{'self': <__main__.B instance at 0x00000000026FC9C8>}

There is no nested scope for classes (rule LGB), and a function in
a class cannot access the attributes of the class without using a
qualified name (self.x in this example). This is well described in
PEP 227.

example 2:

z = "z in module"
def f():
    z = "z in f()"
    class C():
        z = "z in C"
        def g(self):
            print z
            print C.z
    C().g()
f()
>>> 
z in f()
z in C

Here variables in functions are looked up using the LEGB rule, but if
a class is in the path, the class attributes are skipped. Here again,
this is what PEP 227 explains.

example 3:

var = 0
def func():
    print var
    var = 1
>>> func()

Traceback (most recent call last):
  File "<pyshell#102>", line 1, in <module>
func()
  File "C:/Users/aa/Desktop/test2.py", line 25, in func
print var
UnboundLocalError: local variable 'var' referenced before assignment

With a dynamic language such as Python, we expect everything to be
resolved dynamically. But this is not the case for functions. Local
variables are determined at compile time. PEP 227 and
http://docs.python.org/2.7/reference/executionmodel.html describe this
behavior this way:

"If a name binding operation occurs anywhere within a code block, all
uses of the name within the block are treated as references to the
current block."

example 4:

x = "x in module"
class A():
    print "A: " + x
    x = "x in A"
    print "A: " + x
    print locals()
    del x
    print locals()
    print "A: " + x
>>> 
A: x in module
A: x in A
{'x': 'x in A', '__module__': '__main__'}
{'__module__': '__main__'}
A: x in module

But we see here that this statement in PEP 227, "If a name binding
operation occurs anywhere within a code block, all uses of the name
within the block are treated as references to the current block.", is
wrong when the code block is a class. Moreover, for classes, it seems
that local name binding is not made at compile time, but during
execution using the class namespace. In that respect,
PEP 227 and the execution model in the Python doc are misleading and,
in some parts, wrong.

example 5:

x = 'x in module'
def  f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            x = x
            print x
        return MyClass
    myfunc()
f2()
>>> 
x in module

My understanding of this code is the following. The instruction x = x
first looks up the object that the right-hand x of the expression refers
to. In that case, the object is looked up locally in the class, then,
following the rule LGB, it is looked up in the global scope, where it is
the string 'x in module'. Then a local attribute x of MyClass is
created in the class dictionary and bound to that string object.

example 6:

Now here is an example I cannot explain.
It is very close to example 5, I am just changing the local MyClass
attribute from x to y.

x = 'x in module'
def  f2():
    x = 'x in f2'
    def myfunc():
        x = 'x in myfunc'
        class MyClass(object):
            y = x
            print y
        return MyClass
    myfunc()
f2()
>>>
x in myfunc

Why, in that case, is the x reference in MyClass looked up in the
innermost function?

Best Answer

In an ideal world, you'd be right and some of the inconsistencies you found would be wrong. However, CPython has optimized some scenarios, specifically function locals. These optimizations, together with how the compiler and evaluation loop interact and historical precedent, lead to the confusion.

Python translates code to bytecode, which is then interpreted by an interpreter loop. The 'regular' opcode for accessing a name is LOAD_NAME, which looks up a variable name as you would in a dictionary. LOAD_NAME will first look up a name as a local, and if that fails, looks for a global (and then a builtin). LOAD_NAME raises a NameError exception when the name is not found.
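The lookup order can be observed without reading bytecode. The following is only a rough sketch: it compiles a module-style snippet (where CPython emits LOAD_NAME) and runs it against separate local and global dictionaries. The names `greeting`, `result`, `local_ns` and `global_ns` are made up for the illustration, and the tuple-style `exec` call is written so that it should run on both Python 2.7 and Python 3.

```python
# Module-style code: the reference to `greeting` is a plain name lookup.
snippet = compile("result = greeting", "<demo>", "exec")

global_ns = {"greeting": "from globals"}   # hypothetical global namespace

# 1. The name exists in the local dictionary: the local wins.
local_ns = {"greeting": "from locals"}
exec(snippet, global_ns, local_ns)
local_hit = local_ns["result"]

# 2. The name is missing locally: the lookup falls back to the globals.
local_ns = {}
exec(snippet, global_ns, local_ns)
global_hit = local_ns["result"]

# 3. The name is missing everywhere: NameError.
try:
    exec(snippet, {}, {})
    missing = "no error"
except NameError:
    missing = "NameError"

print(local_hit)
print(global_hit)
print(missing)
```

The same compiled snippet gives a different result depending only on which dictionaries it runs against, which is what makes this kind of lookup fully dynamic.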

For nested scopes, looking up names outside of the current scope is implemented using closures; if a name is not assigned to but is available in an enclosing (not global) scope, then such values are handled as a closure. This is needed because a parent scope can hold different values for a given name at different times; two calls to a parent function can lead to different closure values. So Python has LOAD_CLOSURE, MAKE_CLOSURE and LOAD_DEREF opcodes for that situation; the first two opcodes are used in loading and creating a closure for a nested scope, and LOAD_DEREF will load the closed-over value when the nested scope needs it.
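As a small illustration of why closures are needed, the sketch below calls the same parent function twice and inspects the resulting closure cells. `make_greeter` and `greet` are hypothetical names; `__closure__`, `cell_contents` and `co_freevars` are the standard introspection attributes (available from Python 2.6 on).

```python
def make_greeter(name):
    # `name` is local to make_greeter and only read in greet,
    # so greet gets it through a closure cell.
    def greet():
        return "hello, " + name
    return greet

greet_ann = make_greeter("Ann")
greet_bob = make_greeter("Bob")

# The compiler recorded `name` as a free variable of greet.
free_names = greet_ann.__code__.co_freevars

# Each call to make_greeter produced its own cell with its own value.
ann_cell = greet_ann.__closure__[0].cell_contents
bob_cell = greet_bob.__closure__[0].cell_contents

print(free_names)
print(ann_cell)
print(bob_cell)
```

Both functions share one code object, yet each carries a different value for `name`, so the value cannot live in the code itself or in a single shared dictionary.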

Now, LOAD_NAME is relatively slow; it will consult two dictionaries, which means it has to hash the key first and run a few equality tests (if the name wasn't interned). If the name isn't local, then it has to do this again for a global. For functions, which can potentially be called tens of thousands of times, this can get tedious fast. So function locals have special opcodes. Loading a local name is implemented by LOAD_FAST, which looks up local variables by index in a special local names array. This is much faster, but it does require that the compiler first determines whether a name is local rather than global. To still be able to look up global names, another opcode, LOAD_GLOBAL, is used. The compiler explicitly optimizes for this case to generate the special opcodes. LOAD_FAST will raise an UnboundLocalError exception when there is not yet a value for the name.
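The compile-time split between local and non-local names is visible on a function's code object. This is a minimal sketch with invented names (`threshold`, `clamp`, `limit`); `co_varnames` holds the names that get slots in the locals array, and `co_names` holds the names that are looked up as globals or builtins.

```python
threshold = 10  # module-level name, read inside the function

def clamp(value):
    limit = threshold          # `limit` is assigned here, so it is a local
    return min(value, limit)   # `min` is found among the builtins

# Arguments and assigned names get indexed slots in the locals array.
local_slots = clamp.__code__.co_varnames

# Names that are only read are resolved as globals (then builtins) at run time.
other_names = clamp.__code__.co_names

print(local_slots)
print(other_names)
```

Note that the classification is fixed before the function ever runs: nothing is executed here except the `def` statement itself.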

Class definition bodies on the other hand, although they are treated much like a function, do not get this optimization step. Class definitions are not meant to be called all that often; most modules create classes once, when imported. Class scopes don't count when nesting either, so the rules are simpler. As a result, class definition bodies do not act like functions when you start mixing scopes up a little.

So, for non-function scopes, LOAD_NAME and LOAD_DEREF are used for locals and globals, and for closures, respectively. For functions, LOAD_FAST, LOAD_GLOBAL and LOAD_DEREF are used instead.

Note that class bodies are executed as soon as Python executes the class line! So in example 1, class B inside class A is executed as soon as class A is executed, which is when you import the module. In example 2, C is not executed until f() is called, not before.
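To make the timing concrete, here is a small sketch that records when each class body runs. The `events` list and the class and function names are made up for the illustration.

```python
events = []

class Outer(object):
    events.append("Outer body runs")
    class Inner(object):
        # Runs as part of executing Outer's body.
        events.append("Inner body runs")

def factory():
    class Local(object):
        # Runs only when factory() is called.
        events.append("Local body runs")
    return Local

events.append("definitions done")
factory()

print(events)
```

The two nested class bodies are recorded before "definitions done", while the class inside `factory` only shows up after the call.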

Let's walk through your examples:

  1. You have nested a class A.B in a class A. Class bodies do not form nested scopes, so even though the A.B class body is executed when class A is executed, the compiler will use LOAD_NAME to look up x. A.B().f() is a function (bound to the B() instance as a method), so it uses LOAD_GLOBAL to load x. We'll ignore attribute access here, that's a very well defined name pattern.

  2. Here f().C.z is at class scope, so the function f().C().g() will skip the C scope and look at the f() scope instead, using LOAD_DEREF.

  3. Here var was determined to be a local by the compiler because you assign to it within the scope. Functions are optimized, so LOAD_FAST is used to look up the local and an exception is thrown.

  4. Now things get a little weird. class A is executed at class scope, so LOAD_NAME is being used. A.x was deleted from the locals dictionary for the scope, so the second access to x results in the global x being found instead; LOAD_NAME looked for a local first and didn't find it there, falling back to the global lookup.

    Yes, this appears inconsistent with the documentation. Python-the-language and CPython-the-implementation are clashing a little here. You are, however, pushing the boundaries of what is possible and practical in a dynamic language; checking if x should have been a local in LOAD_NAME would be possible but takes precious execution time for a corner case that most developers will never run into.

  5. Now you are confusing the compiler. You used x = x in the class scope, and thus you are setting a local from a name outside of the scope. The compiler finds x is a local here (you assign to it), so it never considers that it could also be a scoped name. The compiler uses LOAD_NAME for all references to x in this scope, because this is not an optimized function body.

    When executing the class definition, x = x first requires you to look up x, so it uses LOAD_NAME to do so. No x is defined, LOAD_NAME doesn't find a local, so the global x is found. The resulting value is stored as a local, which happens to be named x as well. print x uses LOAD_NAME again, and now finds the new local x value.

  6. Here you did not confuse the compiler. You are creating a local y, x is not local, so the compiler recognizes it as a scoped name from parent function f2().myfunc(). x is looked up with LOAD_DEREF from the closure, and stored in y.

You could see the confusion between 5 and 6 as a bug, albeit one that is not worth fixing in my opinion. It was certainly filed as such; see issue 532860 in the Python bug tracker, which has been there for over 10 years now.

The compiler could check for a scoped name x even when x is also a local, for that first assignment in example 5. Or LOAD_NAME could check if the name is meant to be a local, really, and throw an UnboundLocalError if no local was found, at the expense of more performance. Had this been in a function scope, LOAD_FAST would have been used for example 5, and an UnboundLocalError would be thrown immediately.
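For comparison, here is roughly what the function-scope variant of example 5 looks like. This is a sketch rather than the asker's code: the class body is replaced by a plain `x = x` inside `myfunc`, and the exception is caught so the outcome can be inspected. The exact wording of the error message varies between Python versions, so only the exception type is relied on.

```python
x = 'x in module'

def f2():
    x = 'x in f2'
    def myfunc():
        try:
            # `x` is assigned in this function, so the compiler treats every
            # reference to it here as a local; the right-hand side is read
            # before any value has been stored.
            x = x
        except UnboundLocalError as exc:
            return "UnboundLocalError: " + str(exc)
        return x
    return myfunc()

outcome = f2()
print(outcome)
```

Neither the enclosing 'x in f2' nor the global 'x in module' is consulted here, in contrast to the class body in example 5, which silently falls back to the global.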

However, as the referenced bug shows, the behaviour is retained for historical reasons. There probably is code out there today that would break if this bug were fixed.