Python Refactoring – How to Refactor a ‘God Class’

designpythonrefactoring

Problem

I’m working on a Python project whose main class is a bit “God Object”. There are so friggin’ many attributes and methods!

I want to refactor the class.

So Far…

For the first step, I want to do something relatively simple; but when I tried the most straightforward approach, it broke some tests and existing examples.

Basically, the class has a loooong list of attributes—but I can clearly look over them and think, “These 5 attributes are related…These 8 are also related…and then there’s the rest.”

getattr

I basically just wanted to group the related attributes into a dict-like helper class. I had a feeling __getattr__ would be ideal for the job. So I moved the attributes to a separate class, and, sure enough, __getattr__ worked its magic perfectly well…

At first.

But then I tried running one of the examples. The example subclass tries to set one of these attributes directly (at the class level). But since the attribute was no longer “physically located” in the parent class, I got an error saying that the attribute did not exist.

@property

I then read up about the @property decorator. But then I also read that it creates problems for subclasses that want to do self.x = blah when x is a property of the parent class.

Desired

  • Have all client code continue to work using self.whatever, even if the parent’s whatever property is not “physically located” in the class (or instance) itself.
  • Group related attributes into dict-like containers.
  • Reduce the extreme noisiness of the code in the main class.

For example, I don’t simply want to change this:

larry = 2
curly = 'abcd'
moe   = self.doh()

Into this:

larry = something_else('larry')
curly = something_else('curly')
moe   = yet_another_thing.moe()

…because that’s still noisy. Although that successfully makes a simply attribute into something that can manage the data, the original had 3 variables and the tweaked version still has 3 variables.

However, I would be fine with something like this:

stooges = Stooges()

And if a lookup for self.larry fails, something would check stooges and see if larry is there. (But it must also work if a subclass tries to do larry = 'blah' at the class level.)

Summary

  • Want to replace related groups of attributes in a parent class with a single attribute that stores all the data elsewhere
  • Want to work with existing client code that uses (e.g.) larry = 'blah' at the class level
  • Want to continue to allow subclasses to extend, override, and modify these refactored attributes without knowing anything has changed

Is this possible? Or am I barking up the wrong tree?

Best Answer

Having written and then refactored a python "God object", I sympathize. What I did is break the original object down into sub sections based upon methods. For instance, the original looked like this pseudo code:

method A():
    self.bla += 1

method B():
    self.bla += 1

do stuff():
    self.bla = 1
    method A()
    method B()
    print self.bla

The stuff method is a self contained "unit" of work. I migrated it out to a new class which the original instantiates. This pulled out the necessary properties as well. Some were used only by the sub class and could move straight across. Others were shared, and got moved into a shared class.

The "God object" creates a new copy of the shared class at start up, and each of the new sub classes accepts a pointer as part of their init method. For example, here's a stripped version of the mailer:

#!/usr/bin/env python
# -*- coding: ascii -*-
'''Functions for emailing with dirMon.'''

from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders
import os
import smtplib
import datetime
import logging

class mailer:
    def __init__(self,SERVER="mail.server.com",FROM="support@server.com"):
        self.server = SERVER
        self.send_from = FROM
        self.logger = logging.getLogger('dirMon.mailer')

    def send_mail(self, send_to, subject, text, files=[]):
        assert type(send_to)==list
        assert type(files)==list
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug(' '.join(("Sending email to:",' '.join(send_to))))
            self.logger.debug(' '.join(("Subject:",subject)))
            self.logger.debug(' '.join(("Text:",text)))
            self.logger.debug(' '.join(("Files:",' '.join(files))))
        msg = MIMEMultipart()
        msg['From'] = self.send_from
        msg['To'] = COMMASPACE.join(send_to)
        msg['Date'] = formatdate(localtime=True)
        msg['Subject'] = subject
        msg.attach( MIMEText(text) )
        for f in files:
            part = MIMEBase('application', "octet-stream")
            part.set_payload( open(f,"rb").read() )
            Encoders.encode_base64(part)
            part.add_header('Content-Disposition', 'attachment; filename="%s"' % os.path.basename(f))
            msg.attach(part)
        smtp = smtplib.SMTP(self.server)
        mydict = smtp.sendmail(self.send_from, send_to, msg.as_string())
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug("Email Successfully Sent!")
        smtp.close()
        return mydict

It is created once and shared between the different classes that need mailing capabilities.

So for you, create a class larry with the properties and methods you need. Everywhere the client says larry = blah replace it with larryObj.larry = blah. This migrates things to sub projects without breaking the current interface.

The only other thing to do is look for "units of work". If you were going to turn part of the "God Object" into it's own method, do so. But, put the method outside it. This forces you to create an interface between the components.

Laying that groundwork allows everything else to follow it. For example, a piece of the helper object demonstrating how it interfaces with the mailer:

#!/usr/bin/env python
'''This module holds a class to spawn various subprocesses'''
import logging, os, subprocess, time, dateAdditionLib, datetime, re

class spawner:
    def __init__(self, mailer):
        self.logger = logging.getLogger('dirMon.spawner')
        self.myMailer = mailer

Concentrate on the smallest individual unit of work possible, and move it out. This is easier to do, and lets you play with the setup quickly. Don't look at properties for moving stuff, they are ancillary to the tasks being done with them in most cases. Whatever is left over after you have dealt with the methods should probably stay in the original object, since it is part of the shared state.

But, the new objects should now accept the properties they need as init variables, not touching the caller objects property. They then return any necessary values, which the caller can use to update the shared properties as necessary. This helps to decouple the objects and makes for a more robust system.

Related Topic