How to Manage Python in a Restricted Environment

The Need:

  • Support several hundred Python developers and/or prod servers running Python code in a highly restrictive environment.
  • Be able to provide any compatible module found on PyPI.org that a developer needs.

Environment:

  • No external access.
  • Internal network available.
  • Support multiple platforms (Windows, Linux, macOS).
  • A good portion of the developers and/or prod servers do not have access to compilation tools.
  • Must support, at minimum, the latest Python 2.7 and Python 3.x releases.

The Ask:

  • How does one support the distribution and installation of Python modules?
  • How does one deal with Python modules that require compilation? Remember, many boxes will not have compilation tools available.

Solutions based on similar real-world experiences are especially appreciated.

Assumptions:

  • Assume a magical process exists which authorizes modules to be pulled into the internal network for distribution.
  • It's not that Anaconda can't be part of the answer; just be sure to address how you would handle PyPI.org packages not found there.

Clarifications:

  • Docker containers are allowed.

Best Answer

Preface

Nowadays, there are lots of viable options if you want to host your own PyPI repository. Many packages implement a PyPI repo server; devpi, discussed below, is among the most notable.

There are also some other, more or less exotic packages like PyPICloud, which uploads package files directly to an Amazon S3 bucket. JFrog's Artifactory also supports serving Python packages, although, as far as I know, not in the free edition, so it only makes sense if you're already paying for a license. You can even create a local PyPI repo using nothing but Python's stdlib; see my answer on SO.
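The stdlib approach can be sketched like this (a minimal illustration with a hypothetical example-pkg wheel; pip accepts http.server's auto-generated directory listing as a PEP 503 "simple" index):

```shell
# Lay out a PEP 503 "simple" index: one directory per project,
# containing that project's distribution files.
mkdir -p simple/example-pkg
cp example_pkg-0.1-py3-none-any.whl simple/example-pkg/

# Serve it with nothing but the stdlib; the directory listing
# provides the <a href> links pip needs. Clients then install with:
#   pip install example-pkg --index-url http://<host>:8000/simple/
python -m http.server 8000
```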

This topic has also been discussed several times on SO, with the most popular questions being How to roll my own pypi? and how to create local own pypi repository index without mirror? Beware that the first question is rather old and contains mostly outdated answers; the second one is more up to date.

devpi

At my work, we evaluated the available solutions two years ago and have been sticking with devpi ever since. Developed by the same people behind the popular testing framework pytest and the CI task automation tool tox, devpi is a versatile tool that:

  • can host multiple repositories (called indexes), allowing you to group package access;
  • acts as a PyPI mirror by default (this can be turned off on demand);
  • provides role-based access control for uploading packages;
  • offers an optional web UI that can be customized via page templating;
  • offers master server replication - all replicas will automatically synchronize the package base from master on changes;
  • can host package documentation (Sphinx);
  • can trigger a test run on package upload and display the test results if connected to a CI server like Jenkins;
  • has a plugin API for extension on both the server and CLI client side (based on the pluggy library, the same one used for extending tox or pytest, if you're familiar with them); you can customize a lot of stuff by writing your own plugins, from authentication to storage backends. Several in-house plugins are also available on the GitHub page.
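To give a feel for the setup, a minimal devpi bootstrap looks roughly like this (a sketch; command names and flags have changed between devpi releases, so check the docs for the version you install):

```shell
# Install the server and the CLI client (devpi-web adds the optional web UI).
pip install devpi-server devpi-client devpi-web

# Initialize the server state and start it (devpi-server >= 6;
# older releases used "devpi-server --start --init" instead).
devpi-init
devpi-server --host 0.0.0.0 --port 3141 &

# Point the client at the server and log in as the built-in root user.
devpi use http://localhost:3141
devpi login root --password ''
```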

The most powerful feature, IMO, is the indexes. An index defines a set of packages that can be installed from the index URL. For example, imagine a single devpi instance with two indexes configured: index foo offers package A and index bar offers package B. Now you have two repository URLs:

$ pip install A --index-url=https://my.pypi.org/foo

will succeed, but

$ pip install A --index-url=https://my.pypi.org/bar

will fail. Indexes can inherit from each other, extending their own package base, so if bar inherits from foo, you will be able to install both A and B from the bar index.

This enables us to easily configure a package restriction policy. Say we have two main groups of users (devs and QA), each group requiring its own set of packages; we also develop packages offered to customers and tools for internal use. No problem grouping them with indexes:

root/pypi
├── company/base    <- contains common packages like pip or setuptools
│   └── company/internal    <- in-house tools
│       ├── company/dev    <- packages necessary for development
│       │   ├── developer/sandbox    <- private index for single developer
│       │   └── developer2/sandbox
│       └── company/qa    <- packages for QA (test automation etc)
└── customer/release    <- customer packages

Now, for example, a dev sets up the index URL https://my.pypi.org/developer/sandbox once and has access to all new packages uploaded to e.g. company/base, while a customer sets up the index URL https://my.pypi.org/customer/release and cannot access any packages from company/internal.

The root/pypi index is a special meta index: it is always present, and if an index inherits from it, all requests for packages not contained in the index are proxied to pypi.org. To turn off pypi.org mirroring, simply don't inherit from root/pypi.

The upload restriction policy is also easy to set up on a per-index basis: all devs can upload to their own private sandboxes and to company/dev; all QAs can upload to company/qa; only the admin can upload to company/base; and uploads to company/internal and the customer indexes are made from the CI server on successful nightly builds.
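With the devpi client, a hierarchy and upload policy like the above can be sketched as follows (index names taken from the example tree; it assumes the devpi users company and customer were created first via devpi user -c, and the exact acl_upload values depend on your user and auth setup):

```shell
# Create the indexes, declaring inheritance via "bases".
devpi index -c company/base bases=root/pypi
devpi index -c company/internal bases=company/base
devpi index -c company/dev bases=company/internal
devpi index -c company/qa bases=company/internal
devpi index -c customer/release    # make sure this one does NOT inherit root/pypi

# Restrict who may upload to each index (hypothetical usernames).
devpi index company/base acl_upload=root
devpi index company/dev acl_upload=dev1,dev2
devpi index company/qa acl_upload=qa1,qa2
```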

Refer to the devpi docs for the whole setup and configuration process; they are pretty extensive and cover most of the questions that will arise.
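As for the modules that require compilation: the usual approach is to pre-build binary wheels on a dedicated build host and push only the wheels to the index (a sketch; example-pkg is a hypothetical package name, and the index URL is the one from the example above):

```shell
# On a build host that HAS the compiler toolchains installed:
# build wheels for a package and all of its dependencies.
pip wheel example-pkg --wheel-dir ./wheelhouse

# Push the built wheels to a devpi index.
devpi use https://my.pypi.org/company/base
devpi upload --from-dir ./wheelhouse

# Machines WITHOUT compilers then install pre-built wheels only,
# refusing source distributions that would require compilation.
pip install example-pkg --only-binary :all: --index-url=https://my.pypi.org/company/base/+simple/
```

Since wheels are platform- and ABI-specific, the build hosts need to cover each supported OS and Python version combination (e.g. manylinux wheels for the Linux boxes).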