Having written and then refactored a python "God object", I sympathize. What I did is break the original object down into sub sections based upon methods. For instance, the original looked like this pseudo code:
method A():
self.bla += 1
method B():
self.bla += 1
do stuff():
self.bla = 1
method A()
method B()
print self.bla
The stuff method is a self contained "unit" of work. I migrated it out to a new class which the original instantiates. This pulled out the necessary properties as well. Some were used only by the sub class and could move straight across. Others were shared, and got moved into a shared class.
The "God object" creates a new copy of the shared class at start up, and each of the new sub classes accepts a pointer as part of their init method. For example, here's a stripped version of the mailer:
#!/usr/bin/env python
# -*- coding: ascii -*-
'''Functions for emailing with dirMon.'''
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders
import os
import smtplib
import datetime
import logging
class mailer:
def __init__(self,SERVER="mail.server.com",FROM="support@server.com"):
self.server = SERVER
self.send_from = FROM
self.logger = logging.getLogger('dirMon.mailer')
def send_mail(self, send_to, subject, text, files=[]):
assert type(send_to)==list
assert type(files)==list
if self.logger.isEnabledFor(logging.DEBUG):
self.logger.debug(' '.join(("Sending email to:",' '.join(send_to))))
self.logger.debug(' '.join(("Subject:",subject)))
self.logger.debug(' '.join(("Text:",text)))
self.logger.debug(' '.join(("Files:",' '.join(files))))
msg = MIMEMultipart()
msg['From'] = self.send_from
msg['To'] = COMMASPACE.join(send_to)
msg['Date'] = formatdate(localtime=True)
msg['Subject'] = subject
msg.attach( MIMEText(text) )
for f in files:
part = MIMEBase('application', "octet-stream")
part.set_payload( open(f,"rb").read() )
Encoders.encode_base64(part)
part.add_header('Content-Disposition', 'attachment; filename="%s"' % os.path.basename(f))
msg.attach(part)
smtp = smtplib.SMTP(self.server)
mydict = smtp.sendmail(self.send_from, send_to, msg.as_string())
if self.logger.isEnabledFor(logging.DEBUG):
self.logger.debug("Email Successfully Sent!")
smtp.close()
return mydict
It is created once and shared between the different classes that need mailing capabilities.
So for you, create a class larry
with the properties and methods you need. Everywhere the client says larry = blah
replace it with larryObj.larry = blah
. This migrates things to sub projects without breaking the current interface.
The only other thing to do is look for "units of work". If you were going to turn part of the "God Object" into it's own method, do so. But, put the method outside it. This forces you to create an interface between the components.
Laying that groundwork allows everything else to follow it. For example, a piece of the helper object demonstrating how it interfaces with the mailer:
#!/usr/bin/env python
'''This module holds a class to spawn various subprocesses'''
import logging, os, subprocess, time, dateAdditionLib, datetime, re
class spawner:
def __init__(self, mailer):
self.logger = logging.getLogger('dirMon.spawner')
self.myMailer = mailer
Concentrate on the smallest individual unit of work possible, and move it out. This is easier to do, and lets you play with the setup quickly. Don't look at properties for moving stuff, they are ancillary to the tasks being done with them in most cases. Whatever is left over after you have dealt with the methods should probably stay in the original object, since it is part of the shared state.
But, the new objects should now accept the properties they need as init variables, not touching the caller objects property. They then return any necessary values, which the caller can use to update the shared properties as necessary. This helps to decouple the objects and makes for a more robust system.
Units
I think I can pinpoint exactly where the problem started:
I figured, I'll need a method that finds all the non-numerical fields in a line.
This should be immediately followed with asking yourself "Will that be a separate testable unit to gatherNonNumericColumns
or part of the same one?"
If the answer is "yes, separate", then your course of action is simple: that method needs to be public on an appropriate class, so it can be tested as a unit. Your mentality is something like "I need to test drive out one method and I also need to test drive out another method"
From what you say though, you figured that the answer is "no, part of the same". At this point, your plan should no longer be to fully write and test findNonNumericFields
then write gatherNonNumericColumns
. Instead, it should be simply to write gatherNonNumericColumns
. For now, findNonNumericFields
should just be a likely part of the destination you have in mind when you're choosing your next red test case and doing your refactoring. This time your mentality is "I need to test drive out one method, and while I do so I should keep in mind that my finished implementation will probably include this other method".
Keeping a short cycle
Doing the above should not lead to the problems you describe in your penultimate paragraph:
Because if I start out by writing a test for a public method, there will be several minutes (or hours, or even days in very complex cases) before I get all the details in the private methods to work so that the test testing the public method passes.
At no point does this technique require you to write a red test which will only turn green when you implement the entirety of findNonNumericFields
from scratch. Much more likely, findNonNumericFields
will start as some code in-line in the public method you're testing, which will be built up over the course of several cycles and eventually extracted during a refactoring.
Roadmap
To give an approximate roadmap for this particular example, I don't know the exact test cases you used, but say you were writing gatherNonNumericColumns
as your public method. Then most likely the test cases would be the same as the ones you wrote for findNonNumericFields
, each one using a table with only one row. When that one-row scenario was fully implemented, and you wanted to write a test to force you to extract out the method, you'd write a two-row case which would require you to add your iteration.
Best Answer
No. There are several reasons why:
And so on.