Best Practices for Execution of Untrusted Code

python, security, web services

I have a project where I need to allow users to run arbitrary, untrusted Python code (a bit like this) against my server. I'm fairly new to Python and I'd like to avoid making any mistakes that introduce security holes or other vulnerabilities into the system. Are there any best practices, recommended reading, or other pointers you can give me to make my service usable but not abusable?

Here's what I've considered so far:

  • Remove __builtins__ from the exec context to prohibit use of potentially dangerous packages like os. Users will only be able to use packages I provide to them.
  • Use threads to enforce a reasonable timeout.
  • I'd like to limit the total amount of memory that can be allocated within the exec context, but I'm not sure if it's even possible.

There are some alternatives to a straight exec, but I'm not sure which of these would be helpful here:

  • Using an ast.NodeVisitor to catch any attempt to access unsafe objects. But what objects should I prohibit?
  • Searching for any double underscores in the input (less graceful than the above option).
  • Using PyPy or something similar to sandbox the code.

NOTE: I'm aware that there is at least one JavaScript-based interpreter. That will not work in my scenario.

Best Answer

Python sandboxing is hard. Python is inherently introspectable, at multiple levels.

This also means that you can find the factory methods for specific types from those types themselves, and construct new low-level objects, which will be run directly by the interpreter without limitation.
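As a minimal sketch of that introspection (my own illustration for Python 3, not taken from the write-ups referenced here), the snippet below runs with `__builtins__` emptied, as the question proposes, and still reaches every loaded class and the function type. The names `restricted_globals` and `snippet` are purely illustrative.

```python
# Run a snippet with builtins removed, the restriction the question proposes.
restricted_globals = {"__builtins__": {}}

snippet = """
# No names are available, but a literal still leads back to `object`.
base = ().__class__.__base__
# Every class currently loaded in the interpreter is reachable from there.
reachable = [cls.__name__ for cls in base.__subclasses__()]
# The function type itself, which can act as a factory for new functions.
function_type = (lambda: 0).__class__
"""

exec(snippet, restricted_globals)

reachable = restricted_globals["reachable"]
function_type = restricted_globals["function_type"]
print(len(reachable), "classes reachable without any builtins")
print("function factory:", function_type)
```

Nothing here is harmful by itself, but it shows why emptying `__builtins__` does not bound what the code can get hold of.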

Here are some examples of finding creative ways to break out of Python sandboxes:

The basic idea is always the same: find a way to create base Python types (functions and classes), then break out of the shell by getting the Python interpreter to execute arbitrary (unchecked!) bytecode.

The same, and more, applies to the exec statement (the exec() function in Python 3).

So, you want to:

  • Strictly control the byte compilation of the Python code, or at least post-process the bytecode to remove any access to names starting with underscores.

    This requires intimate knowledge of how the Python interpreter works and how Python bytecode is structured. Code objects are nested: a module's bytecode only covers the top-level statements, while each function and class has its own bytecode sequence plus metadata, which in turn contains further code objects for nested functions and classes.

  • Whitelist the modules that can be used. Carefully.

    A Python module contains references to other modules. If you import os, there is a local name os in your module namespace that refers to the os module. This can lead a determined attacker to modules that help them break out of the sandbox. The pickle module, for example, lets you load arbitrary code objects, so if any path through the whitelisted modules leads to pickle, you still have a problem.

  • Strictly limit the time quotas. Even the most neutered code can still attempt to run forever, tying up your resources.
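For the time quota (and the memory cap raised in the question), one common approach is to run the code in a child process rather than a thread, since a process can be killed and given OS-level limits. The sketch below is an assumption-laden illustration, not something the answer prescribes: it assumes Linux, where the standard `resource` module is available and `RLIMIT_AS` is enforced. The helper name `run_with_limits` and its default values are hypothetical. It is not a sandbox by itself; it only bounds resource use.

```python
import resource
import subprocess
import sys


def run_with_limits(source, timeout_s=2, cpu_s=1, memory_bytes=512 * 1024 * 1024):
    """Run `source` in a child interpreter under wall-clock, CPU and memory caps."""

    def apply_limits():
        # Runs in the child just before the interpreter starts:
        # cap CPU seconds and total address space.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,  # wall-clock limit; the child is killed on expiry
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "returncode": None}

    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "stdout": proc.stdout, "returncode": proc.returncode}


if __name__ == "__main__":
    print(run_with_limits("print(1 + 1)"))
    print(run_with_limits("import time; time.sleep(10)", timeout_s=1))
```

A sleeping script hits the wall-clock timeout, a busy loop hits the CPU limit first, and an oversized allocation fails inside the child with a MemoryError instead of exhausting the server.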

Take a look at RestrictedPython, which attempts to give you strict bytecode control. RestrictedPython transforms Python code into something that lets you control what names, modules and objects are permissible in Python 2.3 through to 2.7.

Whether RestrictedPython is secure enough for your purposes depends on the policies you implement. Not allowing access to names starting with an underscore and strictly whitelisting the modules would be a start.
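To give a feel for what such a policy looks like, here is a rough sketch using the standard `ast` module. It is not RestrictedPython's API and not a complete defence: it only performs a static check for underscore-prefixed names and attributes and for imports outside a whitelist. `ALLOWED_MODULES`, `PolicyChecker` and `check_source` are hypothetical names chosen for this example.

```python
import ast

ALLOWED_MODULES = {"math", "json"}  # hypothetical whitelist, for illustration only


class PolicyChecker(ast.NodeVisitor):
    """Collect policy violations instead of stopping at the first one."""

    def __init__(self):
        self.violations = []

    def visit_Name(self, node):
        if node.id.startswith("_"):
            self.violations.append(f"line {node.lineno}: name {node.id!r}")
        self.generic_visit(node)

    def visit_Attribute(self, node):
        if node.attr.startswith("_"):
            self.violations.append(f"line {node.lineno}: attribute {node.attr!r}")
        self.generic_visit(node)

    def visit_Import(self, node):
        for alias in node.names:
            # Only the top-level package is compared against the whitelist.
            if alias.name.split(".")[0] not in ALLOWED_MODULES:
                self.violations.append(f"line {node.lineno}: import {alias.name!r}")

    def visit_ImportFrom(self, node):
        # Relative imports have module=None; treat them as disallowed.
        root = (node.module or "").split(".")[0]
        if root not in ALLOWED_MODULES:
            self.violations.append(f"line {node.lineno}: import from {node.module!r}")


def check_source(source):
    """Return a list of human-readable violations for the given source text."""
    checker = PolicyChecker()
    checker.visit(ast.parse(source))
    return checker.violations


if __name__ == "__main__":
    print(check_source("import math\nprint(math.sqrt(4))"))
    print(check_source("import os\nx = ().__class__"))
```

A static check like this is easy to sidestep (for instance through string-based attribute lookup if `getattr` is exposed), so treat it as one layer of a policy rather than the policy itself.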

In my opinion, the only truly robust option is to use a separate virtual machine, one with no network access to the outside world, which you destroy after each run. Each new script gets a fresh VM. That way, even if the code manages to break out of your Python sandbox (which is not unlikely), all the attacker gets access to is short-lived and without value.
