Online compilers and REPLs – not one big security hole?

compiler, security

There are plenty of compilers and REPL services on the web. For example, the Fay IDE.

I think implementing something similar would be very interesting. But it seems like a major security hole. Am I wrong in thinking that exposing a compiler would be risky? An interpreter seems even worse.

update: I can be a bit more specific. The kind of service I'm thinking about hosting myself (a thought experiment at this stage) would be fairly close to the Fay IDE case above. You take input from the user, process it and return it, potentially as runnable JavaScript, to the same or other users. Let's look at that example in particular.

The most obvious security concern that I see for this case, as I don't actually evaluate the code server side, would be for the users, as they would run code made by an 'unknown' author. The fact that they are allowed to read the code before running it should be a significant safeguard in this regard.

I also wonder if and when the translation of code constitutes a threat to the server. What kinds of exploit vectors are conceivable in this context?

Summary:

  1. Beyond what you can expect from any web server setup, and beyond the most obvious concerns (or perhaps some of those obvious concerns as well), how could one user compromise another?
     - How can the server application be hardened so that it is not compromised?
  2. What important questions have I neglected? 😉
  3. I would like to try some experiments, and I have nearlyfreespeech in mind as a host. I could imagine that such use might be prohibited by their terms of use. Or perhaps I can trust that they have sandboxed my account well enough. What should my "minimum" safeguards be for trying things out, and what safeguards are possible in such a limited hosting environment?

update2: I believe that compiling to JavaScript for client-side evaluation, or evaluating limited EDSLs, would cover most or all of the use cases I have in mind. I take it that this means I don't really put the server side at greater risk than usual. And I don't see how the user is put at more risk than when following a random link on jsFiddle.

Best Answer

If all you're doing is compiling code (taking the user's code and transforming it without actually running it), it's no different than securing any other online program. You have a program (a compiler) that takes potentially untrusted user input (the user's code) and does something with it, and you need to make sure that it doesn't do anything it shouldn't even when given malicious input. For a compiler, many of the standard web security risks (SQL injection, CSRF) don't really apply, but you would need to check for risks like these:

  • For language features like C's #include, make sure you don't permit including files outside of the standard library files.
  • Some languages can take arbitrarily large CPU or memory to compile (thanks to features like C++'s Turing-complete templates), so you need to guard against denial of service. However, operating systems and web servers have long had features to terminate processes that take too long or grow too large.
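
As a rough illustration of the second point, here is a minimal sketch of capping CPU time, memory and wall-clock time for a compiler subprocess. It assumes a POSIX system (Python's `resource` module is not available on Windows). The limit values, the helper names and the stand-in "compiler" (just the Python interpreter counting its input) are all hypothetical choices for the example, not something from a real service.

```python
import resource
import subprocess
import sys

# Hypothetical limits; tune them for the real compiler and host.
CPU_SECONDS = 1            # CPU time the child may consume
MEMORY_BYTES = 1024 ** 3   # address-space cap (1 GiB)
WALL_SECONDS = 10          # wall-clock cap enforced by the parent


def _lower_soft_limit(kind, value):
    """Set the soft limit to `value` without exceeding the existing limits."""
    soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    if soft != resource.RLIM_INFINITY:
        value = min(value, soft)
    resource.setrlimit(kind, (value, hard))


def _apply_limits():
    # Runs in the child process just before the compiler is exec'd.
    _lower_soft_limit(resource.RLIMIT_CPU, CPU_SECONDS)
    _lower_soft_limit(resource.RLIMIT_AS, MEMORY_BYTES)


def run_limited(argv, source_text):
    """Run argv with source_text on stdin; return None on wall-clock timeout."""
    try:
        return subprocess.run(
            argv,
            input=source_text,
            capture_output=True,
            text=True,
            timeout=WALL_SECONDS,
            preexec_fn=_apply_limits,
        )
    except subprocess.TimeoutExpired:
        return None


# Stand-in for a real compiler: reads "source" from stdin and reports its size.
FAKE_COMPILER = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
SOURCE = "int main() { return 0; }"

result = run_limited(FAKE_COMPILER, SOURCE)
print(result.returncode, result.stdout.strip())
```

A child that exceeds the CPU limit is killed by a signal, so the caller sees a nonzero return code rather than a hung request. Note that `preexec_fn` is documented as unsafe in multithreaded parents, so a production setup would more likely rely on the web server's or OS's own process limits.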

If you're interpreting and running the user's code, then security requires running the user code in a sandbox of some sort. For example, if you disable all library calls that involve file or network I/O and similar operating system calls, and if you have the aforementioned limits on CPU and memory, then in theory, the user code can't do anything bad. In practice, this can be very tricky to get right. For example, Python lets you easily execute code with restricted globals (so you can restrict the os module), but built-in language features and introspection still provide enough functionality to break out of the sandbox. (See here.)
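
To make the Python point concrete, here is a small, harmless sketch. The function `run_restricted` is a hypothetical helper written for this example: it runs code with every built-in name removed, which looks like a sandbox but is not one. The "untrusted" snippet uses no names at all, only attribute access on a literal, and still reaches the list of classes loaded in the interpreter, which is the usual starting point for real escapes.

```python
def run_restricted(source):
    """Naive 'sandbox': execute source with all built-in names removed."""
    namespace = {"__builtins__": {}}
    exec(source, namespace)
    return namespace


# A direct call to a built-in is blocked, as intended.
try:
    run_restricted("open('/etc/passwd')")
    blocked = False
except NameError:
    blocked = True

# Introspection needs no names: tuple -> object -> every subclass of object.
UNTRUSTED = "found = ().__class__.__base__.__subclasses__()"
classes = run_restricted(UNTRUSTED)["found"]

print("open() blocked:", blocked)
print("classes reachable from the 'sandbox':", len(classes) > 0)
```

The snippet stops at listing classes, but from that list an attacker can typically hunt for a class whose methods lead back to modules such as `os`. Restricting globals alone is therefore not a security boundary.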

Another approach to securing an online interpreter or REPL is to execute everything client-side (using JavaScript and, possibly, tools like Emscripten). If the user code executes on the user's machine, then malicious user code isn't really your problem.

Update: As @JimmyHoffa points out, another approach is to use the OS or VM software to provide a more complete sandbox than the programming language is designed to do itself. For examples of this, see Chromium's sandbox for Windows or various Linux options.
