Python – Correctly Declaring Instance Attributes in __init__

python

According to Pycharm (and thus assume according by PEP) i should state all instance attributes directly inside __init__. In my case it does not seem suitable to me. I have something like:

class Handler(object):
    def __init__(self):
        self.start_index = # parse
        # Do stuff

    def parse(self):
        parse_header()
        # Do stuff
        parse_body()

    def parse_header():
        raise NotImplementedError

    def parse_body():
        raise NotImplementedError


class StringHandler(Handler):
    def parse_header():
        self.string_index = # parse
        self.string_count = # parse
        self.style_index = # parse
        self.style_count = # parse
        # More here

    def parse_body():
        self.strings = # parse
        self.styles = # parse
        # More here


class IntHandler(Handler):
    def parse_header():
        self.int_index = # parse
        self.int_count = # parse
        # More here

    def parse_body():
        self.ints = # parse
        # More here

Given that there are objects (quite a lot of them) that all derive from Handler class, the structure of each object it quite obvious for anyone who reads the code. To me it seems like unnecessary cluttering to declare __init__ just for sake of calling parent __init__ and declare couple of parameters as None. Usually parse-ing functions just parse whatever is needed and store that to attributes, thus adding such __init__ would sometimes almost double the size of the class (code-wise).

edit: I searched for this problem, but I did not found suitable answer, for example while searching stackexcahnge, the questions were either too broad to get useful answer ( How would you know if you've written readable and easily maintainable code? ) or asking different question ( Is it a good practice to declare instance variables as None in a class in Python? )

Best Answer

Yes, you should assign all attributes of your object in the __init__ method.

The most important reason to do this is tool support. Many Python IDEs can see the assignments in the __init__ and use those attributes for type-based autocompletion. Sphinx-doc can list those attributes in your reference documentation.

There's also a more fundamental reason: Simple is good, so don't create complicated objects that change their shape during execution. Given a StringHandler instance I may or may not be able to read the handler.strings attribute. If I try that before handler.parse() has been executed, I will get an AttributeError. Afterwards I would get the value. To avoid exceptions, consumers of your class would have to getattr(handler, 'strings', None) with a default value which is quite tedious.

In general, constructors should create fully-initialized objects that are immediately usable. In some cases it is necessary to create mutable objects that have an empty, default state, and later transition to a state where they contain a value. However, this should be the exception and is rarely a good design. See also Are init() methods a code smell? and Start Method vs. Setting up everything in constructor which discuss this issue for Java.

In your case the parse() method is such a two-phase initialization method that must be called before the object can really be used. This raises the questions:

  • What is the point of these objects, if they're just a container for whatever values the parse() function produces? As you are using them, you might as well use a dict instead of a class.

  • Is an empty Handler meaningful in any way?

  • Why is the initialization in parse() separate from the other initialization?

  • Or are you using those attributes only for communication between the parse_x functions, not for external consumers?

Possible alternative designs:

  • The handlers specify how a field should be parsed, but do not actually contain the value. The parse method returns the value. Because the handlers do not contain any data, it may not be necessary to use classes for this – simple functions could are sufficient.

  • The handlers represent a parsed value. There is no “empty” state. The parsing either happens in the constructor, or in separate functions. Because these objects are just dumb structs, you could define them via namedtuple() when parsing is separate.

Related Topic