Python – Correctly Declaring Instance Attributes in init

python

According to Pycharm (and thus assume according by PEP) i should state all instance attributes directly inside __init__. In my case it does not seem suitable to me. I have something like:

class Handler(object):
    def __init__(self):
        self.start_index = # parse
        # Do stuff

    def parse(self):
        parse_header()
        # Do stuff
        parse_body()

    def parse_header():
        raise NotImplementedError

    def parse_body():
        raise NotImplementedError


class StringHandler(Handler):
    def parse_header():
        self.string_index = # parse
        self.string_count = # parse
        self.style_index = # parse
        self.style_count = # parse
        # More here

    def parse_body():
        self.strings = # parse
        self.styles = # parse
        # More here


class IntHandler(Handler):
    def parse_header():
        self.int_index = # parse
        self.int_count = # parse
        # More here

    def parse_body():
        self.ints = # parse
        # More here

Given that there are objects (quite a lot of them) that all derive from Handler class, the structure of each object it quite obvious for anyone who reads the code. To me it seems like unnecessary cluttering to declare __init__ just for sake of calling parent __init__ and declare couple of parameters as None. Usually parse-ing functions just parse whatever is needed and store that to attributes, thus adding such __init__ would sometimes almost double the size of the class (code-wise).

edit: I searched for this problem, but I did not found suitable answer, for example while searching stackexcahnge, the questions were either too broad to get useful answer ( How would you know if you've written readable and easily maintainable code? ) or asking different question ( Is it a good practice to declare instance variables as None in a class in Python? )

Best Answer

Yes, you should assign all attributes of your object in the __init__ method.

The most important reason to do this is tool support. Many Python IDEs can see the assignments in the __init__ and use those attributes for type-based autocompletion. Sphinx-doc can list those attributes in your reference documentation.

There's also a more fundamental reason: Simple is good, so don't create complicated objects that change their shape during execution. Given a StringHandler instance I may or may not be able to read the handler.strings attribute. If I try that before handler.parse() has been executed, I will get an AttributeError. Afterwards I would get the value. To avoid exceptions, consumers of your class would have to getattr(handler, 'strings', None) with a default value which is quite tedious.

In general, constructors should create fully-initialized objects that are immediately usable. In some cases it is necessary to create mutable objects that have an empty, default state, and later transition to a state where they contain a value. However, this should be the exception and is rarely a good design. See also Are init() methods a code smell? and Start Method vs. Setting up everything in constructor which discuss this issue for Java.

In your case the parse() method is such a two-phase initialization method that must be called before the object can really be used. This raises the questions:

What is the point of these objects, if they're just a container for whatever values the parse() function produces? As you are using them, you might as well use a dict instead of a class.
Is an empty Handler meaningful in any way?
Why is the initialization in parse() separate from the other initialization?
Or are you using those attributes only for communication between the parse_x functions, not for external consumers?

Possible alternative designs:

The handlers specify how a field should be parsed, but do not actually contain the value. The parse method returns the value. Because the handlers do not contain any data, it may not be necessary to use classes for this – simple functions could are sufficient.
The handlers represent a parsed value. There is no “empty” state. The parsing either happens in the constructor, or in separate functions. Because these objects are just dumb structs, you could define them via namedtuple() when parsing is separate.

Related Solutions

Is Declaring Instance Variables as None in Python Good Practice?

I call the latter bad practice under the "this does not do what you think it does" rule.

Your coworker's position can be rewritten as: "I am going to create a bunch of class-static quasi-global variables which are never accessed, but which do take up space in the various class's namespace tables (__dict__), just to make my IDE do something."

Python – Assigning Instance Variables in `init` vs Called Function

There is no official place to register all instance variables in Python, nor can there be, since one can add new instance variables dynamically, at any point in the object's lifetime.

Still, it's nice to have a single place to look (and for IDEs to look, as @RemcoGerlich says) to identify what instance variables are typically in play. My solution is to initialize all common instance variables in __init__, even if just to a dummy None value that will be quickly overwritten. This avoids the uglier multiple value assignment, yet gives humans and IDEs alike a single place to look for an overview of values.

class foo(object):
    def __init__(self, bar, bar2):
        self.bar = bar
        self.baz = None
        self.square_baz = None
        self.func(bar2)

    def func(self, bar2):
        self.baz = self.bar + bar2
        self.square_baz = bar2 * bar2

This example uses a standard, "public" method name func as the initializer. If func would normally be called from outside the class, this makes sense. Often, however, the function called from __init__ will be private (usually called only by the class, and either just once or infrequently). In that case, you can give further clarity by giving it a more indicative name, and by using an underscore prefix to the method name, which conventionally means "this is a private method." So I might call it _initialize or _finish_initialization for example. (Technically speaking, Python doesn't have "private methods," at least not in the enforced-scope way languages like Java do. Private methods are managed by convention/idiom, even if unenforced.)

Best Answer

Related Solutions

Is Declaring Instance Variables as None in Python Good Practice?

Python – Assigning Instance Variables in `__init__` vs Called Function

Related Topic

Python – Assigning Instance Variables in `init` vs Called Function