Python – How to distribute a project with all its dependencies

dependencies, distribution, python

We are developing a system for a customer that does not want to allow installation of packages from outside repositories. The project is in Python and declares its dependencies via setuptools; most of these dependencies are found on PyPI, and others are found on our company's repository. Some of them require system libraries to be present (e.g. libevent for gevent). None of them can be installed (as a direct download from the repository) on the customer's servers.

Right now, we are packaging the project, its dependencies, and recursively all dependencies of its dependencies, into RPMs, which we bundle into a single distribution tarball. This is time-consuming and error-prone. Furthermore, we do not really need versioning, since the project is a service and client code does not get to choose which version of the service it talks to. We would just need to ship the latest version once we know it is stable.

The main alternative I have been considering is buildout: build the project on a staging machine with the same OS and interpreter as the production machine, then tar the whole directory and copy it to the production machine. But I am not sure whether this would really be an improvement over the current distribution method.
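A related option worth weighing (a sketch, not a recommendation; it assumes a reasonably recent pip on both machines and a staging box that matches production in OS, architecture and interpreter, since some dependencies contain C extensions): build every dependency into a local directory of wheels on the staging machine, ship that directory, and install offline with pip's `--no-index`/`--find-links` flags:

```shell
# On a staging machine matching the production OS, architecture and
# Python interpreter (needed for dependencies with C extensions):
pip wheel -r requirements.txt -w wheelhouse/   # build all deps into wheelhouse/
tar czf wheelhouse.tar.gz wheelhouse/

# On the customer's server -- no outside repositories are contacted:
tar xzf wheelhouse.tar.gz
pip install --no-index --find-links=wheelhouse/ -r requirements.txt
```

Note that system libraries such as libevent must still be present on the target: `pip wheel` compiles and links against them but does not bundle them.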

What other options are there? Which one has been used successfully? Is there some kind of community best practice here?

Best Answer

There are two chief approaches to application distribution:

  • Isolating from the system/Sandboxing

    As much as possible is bundled; system APIs may be hooked and redirected into the sandbox. The app is installed into its own, separate directory hierarchy; on UNIX, /opt is the conventional location for this.

    Pros:

    • Minimal dependency on the host environment and vice versa; only the external interface needs to be tested in multiple environments (and not even that, if it is provided by a 3rd party who is supposed to do that for you).

    Cons:

    • Can't take advantage of system components and updates to them; must duplicate and manage all 3rd-party components yourself
      • this includes things like user accounts, various types of data and security, and automatic compatibility with other apps
    • Must update the app as a whole, regardless of how much has changed, or else write your own private package manager
    • Tools for testing and debugging sandboxed apps are nigh-nonexistent, especially for virtualized instrumentation, so it is hard to test the end result and to diagnose customer problems
  • Integrating into the system

    The app uses the system's package manager and resides in the standard directory hierarchy.

    Pros:

    • Need to only manage the app's custom code and rely on the system-provided dependencies for the rest
    • Can reuse system's facilities and get free side-features from that including security and stability updates and the installation/updating itself
    • Can develop and update components independently

    Cons:

    • Potential dependency hell; interfaces must be clearly defined and integration-tested, potentially for every declared compatible combination (per interface, so there is no combinatorial explosion, but it may still be a lot).
    • The app looks like several packages to the user; it can be unclear what is yours and what is system-provided, and/or confusing to install and manage. Doubly so if the system package manager doesn't do dependency checking.
    • There are many possible environments, and/or the supported environments are limited to those that provide the necessary facilities.
      • 3rd-party modules and updates to them can cause unexpected problems, so you must be able to diagnose them (≈ sources/debug symbols). This includes things like deprecation of interfaces you're using.
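Applied to the asker's Python service, the first (sandboxing) approach could be sketched like this. Paths and names are hypothetical, and it assumes the staging machine matches production closely enough that a virtualenv built on one runs on the other:

```shell
# On the staging machine: build the app, interpreter environment included,
# under the exact path it will occupy in production (virtualenvs hard-code
# absolute paths, so the tree must be created at its final location).
python3 -m venv /opt/myservice              # /opt/myservice: hypothetical install prefix
/opt/myservice/bin/pip install .            # the project plus all declared dependencies
tar czf myservice.tar.gz -C / opt/myservice

# On the production machine: unpack into place and run.
tar xzf myservice.tar.gz -C /
/opt/myservice/bin/python -m myservice      # hypothetical entry point
```

This is essentially the buildout-and-tar idea from the question, and it carries exactly the trade-offs listed above: every dependency is duplicated inside /opt/myservice, and the staging and production environments must match.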

As such, apps of the 1st type tend to be used when the environment is perceived as "hostile" (unpredictable/unreliable/uncooperative), and the customer is fine with paying the extra costs (both for you and for them) of working in such an environment and/or with your equally uncooperative app.