Python – Shell commands in bash or python? How much encapsulation is too much

Tags: bash, builds, dependency-injection, python

I'm trying to decide whether it's better to encapsulate my work behind well-named functions, or to leave it exposed – which will help future developers understand what's going on more quickly? Is there a name for the study of this sort of problem?

Specifically, if I'm running a bunch of bash commands ultimately, but I have significantly complex logic around those commands, at what point does it make sense to write this in a high-level language like Python, even though this obfuscates the actual bash commands being run?

Detailed problem

Currently I'm trying to write a Jenkins build script for my project with roughly the following steps:

  • Pull my code from github
  • Compile sass files into CSS
  • Pull down a sub-folder from a different github project
  • Zip up the project
  • Upload it to an object store with a unique ID

I'm thinking about how to write this to be as easy for future developers as possible (this code is never going to be seen by end users). These developers are likely, but not definitely, going to be fairly good at Python. They will definitely have a passing familiarity with the command-line, but are likely to be unfamiliar with more complex bash scripting.

The first iteration of this build script was just a list of sequential commands, something like:

git clone git@github.com:username/project.git
git clone git@github.com:username/sub-project.git project/sub-project
sass --update project/css
tar -czf project.tgz project
swift upload my-container project.tgz --object-name=project-$(sha1sum project.tgz | awk '{print $1}').tgz

However, this set of commands quickly became more complex as I started to do things like only clone the git project if it wasn't already there, otherwise update it – to speed up the build. Before I knew it I had 50 lines and a fair few conditionals.

So the first thing I did was encapsulate these into bash functions, e.g. update_git_dir, so my build script looks more like this:

#!/usr/bin/env bash

source helper_functions.sh

update_git_dir project git@github.com:username/project.git
build_sass project/css
create_archive project project.tgz
upload_to_swift project.tgz

This is one level of encapsulation. Now the developer, who would have understood the git clone etc. commands directly, can't actually see what's going on. They have to look in helper_functions.sh.

However, as time went on I realised that many of my helper functions now consisted of more conditional statements, variable assignments and function calls than actual commands. These conditional statements can be quite opaque to someone not familiar with bash scripting:

function create_archive {
    project_name=${1}
    archive_filename=${2}

    # Get revision ids
    dependencies_requirements_revision=$(cat ${project_name}/sub-project/requirements-revision.txt)

    requirements_context=${project_name}/${requirements_file}
    requirements_dir=$(dirname ${requirements_context})
    if [ "${requirements_dir}" != "${project_name}" ]; then
        requirements_context=${requirements_dir}
    fi
    latest_revision=$(git-revision-hash ${project_name})

    ...

So I started migrating my code into Python, and now my build script looks like this:

#!/usr/bin/env python

from builders import GitProjectBuilder

builder = GitProjectBuilder(
    project_name='my-project',
    swift_container='my-container',
    git_repository='git@github.com:username/project.git',
    sub_project='git@github.com:username/sub-project.git'
)

# Compress and upload
builder.build_sass(directory='css')
builder.get_sub_project(repo='git@github.com:username/sub-project.git')
builder.build_archive(archive='archive.tgz')
upload_location = builder.upload_archive_to_swift(archive='archive.tgz')
print(upload_location)

Now, when you look in builders.py, it's much easier to understand the logic – if statements and function calls are much more readable – but now we're even further away from the real shell commands. In my python code the closest I get to directly running shell commands looks like this:

def build_archive(self, archive):
    print(subprocess.check_output(
        (
            'tar --exclude-vcs --create --file '
            '{archive_filename}.tar {project_dir}'
        ).format(
            archive_filename=archive,
            project_dir=self.project_name
        ).split()
    ))

If the developer needs to work out exactly which commands are being run, it's now much more difficult.

Wrap up

So how do I decide which is the best architecture to maximise transparency while encapsulating complexity?

This problem seems similar to when I'm working with dependency injection where the more dependencies I inject rather than encapsulate, the more complex my initialisation code gets – and I have a similar problem drawing the line.
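To make the analogy concrete, here is a minimal sketch of the same trade-off in dependency-injection terms (all class names here are hypothetical, purely for illustration):

```python
class SwiftStore:
    """Stub collaborator, standing in for a real object-store client."""
    def __init__(self, container):
        self.container = container

# Encapsulated: trivial to construct, but the collaborator is hidden.
class EncapsulatedBuilder:
    def __init__(self, project_name):
        self.project_name = project_name
        self.store = SwiftStore('my-container')  # fixed choice, invisible to callers

# Injected: the collaborator is visible and swappable, but the
# initialisation code at the call site grows with every dependency.
class InjectedBuilder:
    def __init__(self, project_name, store):
        self.project_name = project_name
        self.store = store

builder = InjectedBuilder('my-project', SwiftStore('my-container'))
```

The encapsulated version hides complexity at the cost of transparency; the injected version exposes everything at the cost of a busier call site – the same tension as bash commands hidden behind helper functions.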

Is there a name for this field of study?

Best Answer

I'd give xonsh a go; it's a clever mix of shell and Python.

xonsh is a Python-ish, BASHwards-compatible shell language and command prompt. The language is a superset of Python 3.4 with additional shell primitives that you are used to from BASH and IPython. xonsh is meant for the daily use of experts and novices alike.

Take advantage of Python(3)'s abstraction and package system, coupled with nice conditionals, but write what needs to be in shell as just shell.

e.g.,

#!/usr/bin/env xonsh

def exists(filename):
    return filename in $(ls)

if exists(".git"):
    git checkout master
    git pull
else:
    git clone $GITURL

Note that only a little bit of ugliness – the $() – is required to inline shell inside of Python, and it Just Works (TM) if you split things clearly by line (e.g. the if-statement lines).

There's lots more detail (including embedding Python in shell lines with @()) in the tutorial: http://xonsh.org/tutorial.html

You can use it as your system shell. But just because you can doesn't mean you should :-)
