Having written and then refactored a Python "God object", I sympathize. What I did was break the original object down into sections based on its methods. For instance, the original looked like this pseudocode:
def method_a(self):
    self.bla += 1

def method_b(self):
    self.bla += 1

def do_stuff(self):
    self.bla = 1
    self.method_a()
    self.method_b()
    print self.bla
The do_stuff method is a self-contained "unit" of work, so I migrated it out to a new class which the original instantiates. This pulled out the necessary properties as well. Some were used only by the new class and could move straight across. Others were shared, and got moved into a shared class.
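A rough sketch of where that lands (SharedState and StuffDoer are hypothetical names, just to show the shape):

class SharedState(object):
    '''Properties used by more than one of the new classes.'''
    def __init__(self):
        self.bla = 0

class StuffDoer(object):
    '''The do_stuff unit of work, pulled out of the original object.'''
    def __init__(self, shared):
        self.shared = shared    # reference to the shared state

    def method_a(self):
        self.shared.bla += 1

    def method_b(self):
        self.shared.bla += 1

    def do_stuff(self):
        self.shared.bla = 1
        self.method_a()
        self.method_b()
        print self.shared.bla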
The "God object" creates a new copy of the shared class at start up, and each of the new sub classes accepts a pointer as part of their init method. For example, here's a stripped version of the mailer:
#!/usr/bin/env python
# -*- coding: ascii -*-
'''Functions for emailing with dirMon.'''
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders
import os
import smtplib
import datetime
import logging
class mailer:
    def __init__(self, SERVER="mail.server.com", FROM="support@server.com"):
        self.server = SERVER
        self.send_from = FROM
        self.logger = logging.getLogger('dirMon.mailer')

    def send_mail(self, send_to, subject, text, files=None):
        if files is None:   # avoid a mutable default argument
            files = []
        assert isinstance(send_to, list)
        assert isinstance(files, list)
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug(' '.join(("Sending email to:", ' '.join(send_to))))
            self.logger.debug(' '.join(("Subject:", subject)))
            self.logger.debug(' '.join(("Text:", text)))
            self.logger.debug(' '.join(("Files:", ' '.join(files))))
        msg = MIMEMultipart()
        msg['From'] = self.send_from
        msg['To'] = COMMASPACE.join(send_to)
        msg['Date'] = formatdate(localtime=True)
        msg['Subject'] = subject
        msg.attach(MIMEText(text))
        for f in files:
            part = MIMEBase('application', "octet-stream")
            with open(f, "rb") as attachment:   # close the handle promptly
                part.set_payload(attachment.read())
            Encoders.encode_base64(part)
            part.add_header('Content-Disposition',
                            'attachment; filename="%s"' % os.path.basename(f))
            msg.attach(part)
        smtp = smtplib.SMTP(self.server)
        # sendmail() returns a dict of recipients that were refused
        refused = smtp.sendmail(self.send_from, send_to, msg.as_string())
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug("Email Successfully Sent!")
        smtp.close()
        return refused
It is created once and shared between the different classes that need mailing capabilities.
So for you: create a class larry with the properties and methods you need. Everywhere the client says larry = blah, replace it with larryObj.larry = blah. This migrates things to sub-projects without breaking the current interface.
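A minimal sketch of that substitution (LarryHolder and larryObj are hypothetical names):

class LarryHolder(object):
    '''Hypothetical new home for the larry property and its methods.'''
    def __init__(self):
        self.larry = None

larryObj = LarryHolder()
# before: larry = blah
larryObj.larry = blah    # after: same assignment, new home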
The only other thing to do is look for "units of work". If you were going to turn part of the "God object" into its own method, do so, but put that method outside it. This forces you to create an interface between the components.
Laying that groundwork allows everything else to follow it. For example, a piece of the helper object demonstrating how it interfaces with the mailer:
#!/usr/bin/env python
'''This module holds a class to spawn various subprocesses'''
import logging, os, subprocess, time, dateAdditionLib, datetime, re

class spawner:
    def __init__(self, mailer):
        self.logger = logging.getLogger('dirMon.spawner')
        self.myMailer = mailer
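Wiring them together then looks something like this (a sketch; the single mailer instance is handed to every class that needs it):

myMailer = mailer()             # created once at start-up
mySpawner = spawner(myMailer)   # each consumer gets a reference
# ...and any consumer can now send mail through the shared instance:
# mySpawner.myMailer.send_mail(['ops@server.com'], 'subject', 'body')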
Concentrate on the smallest individual unit of work possible, and move it out. This is easier to do, and it lets you experiment with the setup quickly. Don't use the properties to decide what to move; in most cases they are ancillary to the tasks being done with them. Whatever is left over after you have dealt with the methods should probably stay in the original object, since it is part of the shared state.
But the new objects should now accept the properties they need as __init__ arguments rather than touching the caller's properties directly. They then return any necessary values, which the caller can use to update the shared properties as needed. This decouples the objects and makes for a more robust system.
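A small sketch of that calling convention (UnitOfWork and god are hypothetical names):

class UnitOfWork(object):
    def __init__(self, bla):
        self.bla = bla      # a copy of the value, not the caller's attribute

    def run(self):
        self.bla += 1
        return self.bla     # hand the result back instead of mutating the caller

# The caller updates its own shared state from the return value:
worker = UnitOfWork(god.bla)
god.bla = worker.run()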
The first question you need to answer is whether you intend to create a truly generic data access framework, or if you want it tailored to the particular project you're working on.
A generic framework has the benefit of being portable and fairly loosely coupled to the underlying DB + schema, but at the cost of pushing schema level information further up in the framework. The more generic you make the framework, the more of this knowledge you push up.
A tailored framework alleviates the higher levels of the application from worrying about the schema or DB details, but at the cost of being less re-usable in other environments. As a corollary, the more specific you make the framework, the harder it is to extend elsewhere.
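To make the trade-off concrete, a hedged sketch (all names here are hypothetical):

# Generic: callers must know table and column names, so schema
# knowledge is pushed up into the application layer.
rows = dao.fetch('users', where={'id': 42})

# Tailored: the schema knowledge stays inside the framework, but the
# method is only meaningful to this particular project.
user = dao.get_user(42)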
If you're not sure which of those to pick, the generic framework makes sense if you suspect there will be changes at the DB layer or the schema. Conversely, if those are pretty well locked down, the tailored framework is the better approach. Ultimately, you're optimizing the system to more easily handle where you think the change will occur.
As to whether or not you should refactor - the answer is No unless you're encountering a specific problem.
Worrying about the degree of coupling is nice, I suppose, but it doesn't necessarily address a specific problem that you're actually seeing. Refactoring based on presumed coupling issues is more likely to create a problem than leaving the code alone.
Likewise with expressing the state: if the current mechanism of an enum is sufficient and doesn't present an apparent problem, then don't change what you've got.
Overall, it sounds like you've designed a reasonably robust data access method for the domain you're working with. Your question doesn't list any specific issues that are holding back your development, so I'd call it good and move on to the next stage of development.
Having one function load from a file, and another load from JSON, makes absolute sense. If your application finds itself in situations where the two are ambiguous -- and in multiple places -- then a third function can be added to remove the duplicated logic that determines which format it is: file or JSON.
Basically, you'll have all three, where load_project(file_or_json) detects whether or not it's a file path and calls the appropriate method. Benefits:

- Centralized location for the test of whether something is a file path or JSON
- It's handy when the caller doesn't know the difference
- You still have the explicit functions for when you do know the format
- You can optimize the test. Instead of calling out to the file system, maybe you decide to do a quick regex test to see if it is JSON (after all, how many file names start with { or [ characters?)

You still adhere to the Single Responsibility Principle, because the load_project function is responsible only for determining the format. The other two functions are responsible for loading from the specific formats.
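A minimal sketch of the dispatching function, assuming the two existing loaders are named load_project_from_file and load_project_from_json:

import re

def load_project(file_or_json):
    # Raw JSON starts with '{' or '[' once leading whitespace is
    # stripped; file paths essentially never do.
    if re.match(r'\s*[\[{]', file_or_json):
        return load_project_from_json(file_or_json)
    return load_project_from_file(file_or_json)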