Having written and then refactored a Python "God object", I sympathize. What I did was break the original object down into sections based on its methods. For instance, the original looked like this pseudocode:
def method_a(self):
    self.bla += 1

def method_b(self):
    self.bla += 1

def do_stuff(self):
    self.bla = 1
    self.method_a()
    self.method_b()
    print self.bla
The do_stuff method is a self-contained "unit" of work, so I migrated it out to a new class which the original instantiates. This pulled out the necessary properties as well. Some were used only by the new class and could move straight across. Others were shared, and got moved into a shared class.
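A rough sketch of where that lands (SharedState and StuffDoer are hypothetical names, just to show the shape):

class SharedState(object):
    '''Properties used by more than one of the new classes.'''
    def __init__(self):
        self.bla = 0

class StuffDoer(object):
    '''The do_stuff unit of work, pulled out of the original object.'''
    def __init__(self, shared):
        self.shared = shared    # reference to the shared state

    def method_a(self):
        self.shared.bla += 1

    def method_b(self):
        self.shared.bla += 1

    def do_stuff(self):
        self.shared.bla = 1
        self.method_a()
        self.method_b()
        print self.shared.bla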
The "God object" creates a new copy of the shared class at start up, and each of the new sub classes accepts a pointer as part of their init method. For example, here's a stripped version of the mailer:
#!/usr/bin/env python
# -*- coding: ascii -*-
'''Functions for emailing with dirMon.'''
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
from email import Encoders
import os
import smtplib
import datetime
import logging
class mailer:
    def __init__(self, SERVER="mail.server.com", FROM="support@server.com"):
        self.server = SERVER
        self.send_from = FROM
        self.logger = logging.getLogger('dirMon.mailer')

    def send_mail(self, send_to, subject, text, files=None):
        if files is None:   # avoid a mutable default argument
            files = []
        assert isinstance(send_to, list)
        assert isinstance(files, list)
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug(' '.join(("Sending email to:", ' '.join(send_to))))
            self.logger.debug(' '.join(("Subject:", subject)))
            self.logger.debug(' '.join(("Text:", text)))
            self.logger.debug(' '.join(("Files:", ' '.join(files))))
        msg = MIMEMultipart()
        msg['From'] = self.send_from
        msg['To'] = COMMASPACE.join(send_to)
        msg['Date'] = formatdate(localtime=True)
        msg['Subject'] = subject
        msg.attach(MIMEText(text))
        for f in files:
            part = MIMEBase('application', "octet-stream")
            with open(f, "rb") as attachment:   # close the handle promptly
                part.set_payload(attachment.read())
            Encoders.encode_base64(part)
            part.add_header('Content-Disposition',
                            'attachment; filename="%s"' % os.path.basename(f))
            msg.attach(part)
        smtp = smtplib.SMTP(self.server)
        # sendmail() returns a dict of recipients that were refused
        refused = smtp.sendmail(self.send_from, send_to, msg.as_string())
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug("Email Successfully Sent!")
        smtp.close()
        return refused
It is created once and shared between the different classes that need mailing capabilities.
So for you: create a class larry with the properties and methods you need. Everywhere the client says larry = blah, replace it with larryObj.larry = blah. This migrates things to sub-projects without breaking the current interface.
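A minimal sketch of that substitution (LarryHolder and larryObj are hypothetical names):

class LarryHolder(object):
    '''Hypothetical new home for the larry property and its methods.'''
    def __init__(self):
        self.larry = None

larryObj = LarryHolder()
# before: larry = blah
larryObj.larry = blah    # after: same assignment, new home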
The only other thing to do is look for "units of work". If you were going to turn part of the "God object" into its own method, do so, but put that method outside it. This forces you to create an interface between the components.
Laying that groundwork allows everything else to follow it. For example, a piece of the helper object demonstrating how it interfaces with the mailer:
#!/usr/bin/env python
'''This module holds a class to spawn various subprocesses'''
import logging, os, subprocess, time, dateAdditionLib, datetime, re

class spawner:
    def __init__(self, mailer):
        self.logger = logging.getLogger('dirMon.spawner')
        self.myMailer = mailer
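Wiring them together then looks something like this (a sketch; the single mailer instance is handed to every class that needs it):

myMailer = mailer()             # created once at start-up
mySpawner = spawner(myMailer)   # each consumer gets a reference
# ...and any consumer can now send mail through the shared instance:
# mySpawner.myMailer.send_mail(['ops@server.com'], 'subject', 'body')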
Concentrate on the smallest individual unit of work possible, and move it out. This is easier to do, and it lets you experiment with the setup quickly. Don't use the properties to decide what to move; in most cases they are ancillary to the tasks being done with them. Whatever is left over after you have dealt with the methods should probably stay in the original object, since it is part of the shared state.
But the new objects should now accept the properties they need as __init__ arguments rather than touching the caller's properties directly. They then return any necessary values, which the caller can use to update the shared properties as needed. This decouples the objects and makes for a more robust system.
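A small sketch of that calling convention (UnitOfWork and god are hypothetical names):

class UnitOfWork(object):
    def __init__(self, bla):
        self.bla = bla      # a copy of the value, not the caller's attribute

    def run(self):
        self.bla += 1
        return self.bla     # hand the result back instead of mutating the caller

# The caller updates its own shared state from the return value:
worker = UnitOfWork(god.bla)
god.bla = worker.run()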
The first question you need to answer is whether you intend to create a truly generic data access framework, or if you want it tailored to the particular project you're working on.
A generic framework has the benefit of being portable and fairly loosely coupled to the underlying DB + schema, but at the cost of pushing schema level information further up in the framework. The more generic you make the framework, the more of this knowledge you push up.
A tailored framework alleviates the higher levels of the application from worrying about the schema or DB details, but at the cost of being less re-usable in other environments. As a corollary, the more specific you make the framework, the harder it is to extend elsewhere.
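To make the trade-off concrete, a hedged sketch (all names here are hypothetical):

# Generic: callers must know table and column names, so schema
# knowledge is pushed up into the application layer.
rows = dao.fetch('users', where={'id': 42})

# Tailored: the schema knowledge stays inside the framework, but the
# method is only meaningful to this particular project.
user = dao.get_user(42)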
If you're not sure which of those to pick, the generic framework makes sense if you suspect there will be changes at the DB layer or the schema. Conversely, if those are pretty well locked down, the tailored framework is the better approach. Ultimately, you're optimizing the system to more easily handle where you think the change will occur.
As to whether or not you should refactor - the answer is No unless you're encountering a specific problem.
Worrying about the degree of coupling is nice, I suppose, but it doesn't necessarily address a specific problem that you're actually seeing. Refactoring based on presumed coupling issues is more likely to create a problem than leaving the code alone.
Likewise with expressing the state: if the current mechanism of an enum is sufficient and doesn't present an apparent problem, then don't change what you've got.
Overall, it sounds like you've designed a reasonably robust data access method for the domain you're working with. Your question doesn't list any specific issues that are holding back your development, so I'd call it good and move on to the next stage of development.
Having one function load from a file, and another load from JSON, makes absolute sense. If your application finds itself in situations where the two are ambiguous -- and in multiple places -- then a third function can be added to remove the duplicated logic that determines which format it is: file or JSON.
Basically, you'll have all three, where load_project(file_or_json) detects whether or not it's a file path and calls the appropriate method. Benefits:

- Centralized location for the test of whether something is a file path or JSON
- It's handy when the caller doesn't know the difference
- You still have the explicit functions for when you do know the format
- You can optimize the test. Instead of calling out to the file system, maybe you decide to do a quick regex test to see if it is JSON (after all, how many file names start with { or [ characters?)

You still adhere to the Single Responsibility Principle, because the load_project function is responsible only for determining the format. The other two functions are responsible for loading from the specific formats.
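A minimal sketch of the dispatching function, assuming the two existing loaders are named load_project_from_file and load_project_from_json:

import re

def load_project(file_or_json):
    # Raw JSON starts with '{' or '[' once leading whitespace is
    # stripped; file paths essentially never do.
    if re.match(r'\s*[\[{]', file_or_json):
        return load_project_from_json(file_or_json)
    return load_project_from_file(file_or_json)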