Python – Preferred way to expand a command line script to be used as a library in Python

Tags: argparse, python

I have a useful Python script that I've been invoking from the command line. It has a decent number of options, maybe 20, and it's not unusual to run the script with six or seven flags. The rest of the input comes via stdin.

Now I have some other Python code from which I'd like to call this useful little utility. Three options I can think of are:

1. I can use subprocess.call and invoke my little script.
2. (A little better) I can cobble together a command line and then pass it as a list of strings to argparse.
3. I can totally refactor the program so that the entry point of my utility is a Python function call and then have my command line utility just call this function. In principle this seems like the responsible thing to do, but it does leave me managing two separate interfaces to my function. For example, I have to decide whether I want to let argparse know the default values for my options or have the function know the defaults (or have two sets of defaults). Any validation I do using, say, ArgumentParser.add_mutually_exclusive_group will not apply when my tool is run as a library instead of a command line script.

Is there a standard paradigm for creating a single interface in Python that is well-suited to being invoked both from Python and from the command line?

Best Answer

Unless there is a good reason not to, I would definitely advocate a spin on option 3. As @jonsharp mentions, breaking up your utility into clean units of functionality is a good way to ensure testability. Even the smallest scripts can eventually morph into a much larger program, and making sure that you have an extensible API sooner rather than later will save a lot of headaches down the road.

The way I'd approach this is:

1. Break up your code into logical functions with clean and clear I/O.
2. Add unit tests. Having them is never a bad thing.
3. Rather than relying only on if __name__ == '__main__', create a main() (or similar) function containing your entry point.
4. Use setuptools' setup() function to define the script entry point in your setup.py file.

For example:

from setuptools import setup
setup(
    name='mypackage',
    version='0.1',
    entry_points={
        'console_scripts': [ 'myscript = mypackage.mymodule:main' ],
    }
)

Now, not only is all of your code (including main()) easily unit testable, but you also still have your console entry point once you've run python setup.py install (or python setup.py develop).
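As a rough illustration of what mypackage/mymodule.py could look like under this layout, here is a minimal sketch. The names shout, build_parser, DEFAULT_TIMES and the --times flag are all made up for the example; the only parts taken from the answer are the main() entry point and the idea that the command line wrapper just calls a plain function. Keeping the default in one module-level constant is one possible way to avoid two sets of defaults.

```python
import argparse
import io
import sys

# Hypothetical names throughout: shout, build_parser, DEFAULT_TIMES, --times.
DEFAULT_TIMES = 1  # single source of truth for the default value


def shout(text, times=DEFAULT_TIMES):
    """Library entry point: plain arguments in, plain value out."""
    if times < 1:
        raise ValueError("times must be >= 1")
    return text.upper() * times


def build_parser():
    """Describe the command line; the default is reused, not duplicated."""
    parser = argparse.ArgumentParser(description="Upper-case stdin.")
    parser.add_argument("--times", type=int, default=DEFAULT_TIMES)
    return parser


def main(argv=None, stdin=None, stdout=None):
    """Console entry point: parse flags, read stdin, delegate to shout()."""
    args = build_parser().parse_args(argv)  # argv=None means sys.argv[1:]
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stdout.write(shout(stdin.read(), times=args.times))
    return 0


# Library use and command line use go through the same function.
print(shout("ab", times=2))
buffer = io.StringIO()
main(["--times", "2"], stdin=io.StringIO("hi"), stdout=buffer)
print(buffer.getvalue())
```

Because main() accepts an explicit argument list and streams, a test can drive the command line path without touching the real sys.argv or stdin, while the console_scripts entry point still calls main() with no arguments.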

"Any validation I do using, say, ArgumentParser.add_mutually_exclusive_group will not apply when my tool is run as a library instead of a command line script."

Depending on how your API is designed, you may need to add some extra validation of input parameters to the function itself, but that validation should probably be there anyway to guard against unexpected input.
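One way this might look, as a sketch only: the function checks the constraint itself, and the argparse group stays in place purely to give command line users a friendlier error. The names render, as_json, as_csv and the --json/--csv flags are invented for the example.

```python
import argparse
import json


def render(items, as_json=False, as_csv=False):
    """Hypothetical library function with two mutually exclusive options."""
    # Library callers never pass through argparse, so check here as well.
    if as_json and as_csv:
        raise ValueError("as_json and as_csv are mutually exclusive")
    if as_json:
        return json.dumps(items)
    if as_csv:
        return ",".join(str(item) for item in items)
    return " ".join(str(item) for item in items)


def build_parser():
    """The same rule expressed for the command line."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--json", dest="as_json", action="store_true")
    group.add_argument("--csv", dest="as_csv", action="store_true")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    print(render([1, 2, 3], as_json=args.as_json, as_csv=args.as_csv))
    return 0
```

The check in render() is the authoritative one; the argparse group duplicates it only at the edge, so removing the command line wrapper would not weaken the library.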

Edit: The only time I would generally use subprocess is when I'm calling into a non-Python application or another Python script that I don't own or don't have the time to refactor, and the latter only as a last resort. Most well-written Python utilities expose both command line utilities and an internal API.
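For that last-resort case, a minimal sketch of driving a script through subprocess while feeding it stdin could look like the following. The helper name run_tool is hypothetical, and the command used here is just a Python one-liner standing in for the external tool.

```python
import subprocess
import sys


def run_tool(command, stdin_text):
    """Run an external command, feed it stdin_text, and return its stdout."""
    completed = subprocess.run(
        command,
        input=stdin_text,      # sent to the child's stdin
        capture_output=True,   # collect stdout and stderr
        text=True,             # work with str instead of bytes
        check=True,            # raise CalledProcessError on non-zero exit
    )
    return completed.stdout


# Stand-in for a script we do not own: upper-cases whatever arrives on stdin.
command = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
print(run_tool(command, "hello"))
```

Everything crosses the boundary as text here, which is part of why a direct function call is usually the more comfortable interface when the code is yours to refactor.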

Related Solutions

Python Database – How to Manage Database Connections in a Python Library Module

It really depends on the library you're using. Some of them may close the connection on their own (note: I checked the built-in sqlite3 library, and it does not). Python calls an object's destructor (__del__) when the object is garbage collected, which in CPython usually happens as soon as the last reference to it goes away, and these libraries might implement a destructor that closes the connection gracefully.

However, that might not be the case! I would recommend, as others have in the comments, wrapping the connection in an object.

class MyDB(object):

    def __init__(self):
        self._db_connection = db_module.connect('host', 'user', 'password', 'db')
        self._db_cur = self._db_connection.cursor()

    def query(self, query, params):
        return self._db_cur.execute(query, params)

    def __del__(self):
        self._db_connection.close()

This opens the database connection when the object is created and closes it when the object is destroyed, which typically happens once the scope that created it ends and no other references remain. Note: if you instantiate this object at the module level, it will persist for your entire application. Unless this is intended, I would suggest separating your database functions from the non-database functions.

Luckily, Python has standardized the Database API (DB-API 2.0, PEP 249), so this pattern will work with any compliant database driver; only the arguments to connect() differ from driver to driver :)
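As a variation on the wrapper above, and not what the answer itself proposes, the same class can expose an explicit close() and the context manager protocol, so cleanup does not depend on when __del__ happens to run. This sketch assumes the built-in sqlite3 driver with an in-memory database; the table and values are made up.

```python
import sqlite3


class MyDB:
    """Same wrapper idea, with deterministic cleanup via `with`."""

    def __init__(self, path=":memory:"):
        self._db_connection = sqlite3.connect(path)
        self._db_cur = self._db_connection.cursor()

    def query(self, query, params=()):
        return self._db_cur.execute(query, params)

    def close(self):
        self._db_connection.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the block


with MyDB() as db:
    db.query("CREATE TABLE t (x INTEGER)")
    db.query("INSERT INTO t VALUES (?)", (41,))
    rows = db.query("SELECT x + 1 FROM t").fetchall()

print(rows)  # [(42,)]
```

Once the with block exits, the connection is closed, and further queries on db raise sqlite3.ProgrammingError.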

Python – Writing Functional and Integration Tests for Python

If your whole script is written as a set of functions, including a main() function, and ends like this:

if __name__ == "__main__":
    main()

then all you need to do is call that main() from your test. If it parses command line parameters, you can provide them as well.
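A minimal sketch of such a test, assuming main() accepts an optional argument list and prints its result. The --name flag and the run_main helper are invented for the example.

```python
import argparse
import contextlib
import io


def main(argv=None):
    """Hypothetical script entry point that accepts its arguments explicitly."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", default="world")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    print(f"hello {args.name}")
    return 0


def run_main(argv):
    """Call main() the way a functional test would, capturing stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exit_code = main(argv)
    return exit_code, buffer.getvalue()


def test_main_uses_flag():
    assert run_main(["--name", "tests"]) == (0, "hello tests\n")


def test_main_uses_default():
    assert run_main([]) == (0, "hello world\n")
```

A test runner such as pytest or unittest would pick up functions like these; calling them directly works too.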

As for REST calls - you use mocks. You either mock the actual call (if it's enough for you to just check that it was correct) or you set up a mock REST server and point your app to it.

For example, my app uses OpenID authentication, so I created my own simple OpenID server and pointed my application at it as the provider. That is rather slow, so I use it only when I run scenarios involving authentication. In all other cases I just patch the authentication method (I'm using tornado, and I patch get_current_user to return a predefined user ID).
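The patching approach might be sketched with the standard library's unittest.mock as follows. UserClient, fetch_user, greeting and the URL are hypothetical stand-ins for whatever makes the real REST call; nothing here is tornado-specific.

```python
import json
import urllib.request
from unittest import mock


class UserClient:
    """Hypothetical REST client; the real call is never made in the test."""

    def fetch_user(self, user_id):
        url = f"https://api.example.invalid/users/{user_id}"
        with urllib.request.urlopen(url) as response:
            return json.load(response)


def greeting(client, user_id):
    """Code under test: depends on the REST call only through the client."""
    return f"hello {client.fetch_user(user_id)['name']}"


def test_greeting_without_network():
    # Replace the method for the duration of the block with a canned answer.
    with mock.patch.object(
        UserClient, "fetch_user", return_value={"name": "alice"}
    ) as fake:
        assert greeting(UserClient(), 7) == "hello alice"
    # Also check that the call was made with the expected argument.
    fake.assert_called_once_with(7)
```

This covers the "mock the actual call" variant; the mock server variant exercises more of the stack at the cost of speed, as described above.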
