Online compilers and REPLs – not one big security hole


There are plenty of compilers and REPL services on the web, for example the Fay IDE.

I find that implementing some similar technology would be very interesting. But it seems like a major security hole. Am I wrong in thinking that exposing a compiler would be risky? An interpreter seems even worse.

update: I can be a bit more specific. The kind of service I'm thinking about hosting myself (a thought experiment at this stage) would be fairly close to the Fay IDE mentioned above. You take input from the user, process it, and return the result, potentially as runnable JavaScript, to the same or other users. Let's look at that example in particular.

The most obvious security concern that I see for this case, as I don't actually evaluate the code server side, would be for the users, as they would run code made by an 'unknown' author. The fact that they are allowed to read the code before running it should be a significant safeguard in this regard.

I also wonder if and when translating code constitutes a threat to the server. What kinds of exploit vectors are conceivable in this context?

Summary:

Beyond what you can expect from any web server set up, and beyond the most obvious concerns (or perhaps some of those obvious concerns as well), how could one user compromise another?
How can the server application be hardened so that it is not compromised?
What important questions have I neglected? 😉
I would like to try some experiments, and I have nearlyfreespeech in mind as a host. Might such use be prohibited by their terms of use? Or perhaps I can trust them to have sandboxed my account well enough. What should my "minimum" safeguards be for trying things out, and what safeguards are possible in such a limited hosting environment?

update2: I believe that compiling to JavaScript for client-side evaluation, or evaluating limited EDSLs, would cover most or all of the use cases I have in mind. I take it that this means I don't really put the server at greater risk than usual. And I don't see how the user is put at more risk than by following a random link on jsFiddle.

Best Answer

If all you're doing is compiling code (taking the user's code and transforming it without actually running it), it's no different than securing any other online program. You have a program (a compiler) that takes potentially untrusted user input (the user's code) and does something with it, and you need to make sure that it doesn't do anything it shouldn't even when given malicious input. For a compiler, many of the standard web security risks (SQL injection, CSRF) don't really apply, but you would need to check for risks like these:

For language features like C's #include, make sure you don't permit including files outside of the standard library files.
Some languages can take arbitrarily large CPU or memory to compile (thanks to features like C++'s Turing-complete templates), so you need to guard against denial of service. However, operating systems and web servers have long had features to terminate processes that take too long or grow too large.
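
The include restriction above can be sketched as a path check. This is only an illustration, not how any particular compiler does it: the function name include_allowed? and the /usr/include root are assumptions, POSIX-style paths are assumed, and File.expand_path does not resolve symbolic links (File.realpath would be needed for that, on paths that exist).

```ruby
# Returns true only if `requested` (as written in an #include-style directive)
# resolves to a path strictly inside `root`. Hypothetical helper for illustration.
def include_allowed?(requested, root)
  root = File.expand_path(root)
  # expand_path collapses "." and ".." segments; an absolute `requested`
  # simply replaces `root`, so it fails the prefix check below.
  resolved = File.expand_path(requested, root)
  resolved.start_with?(root + File::SEPARATOR)
rescue ArgumentError
  # Raised for inputs such as embedded NUL bytes or "~unknownuser/..." paths.
  false
end
```

For example, include_allowed?('sys/types.h', '/usr/include') is true, while include_allowed?('../../etc/passwd', '/usr/include') is false.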

If you're interpreting and running the user's code, then security requires running the user code in a sandbox of some sort. For example, if you disable all library calls that involve file or network I/O and similar operating system calls, and if you have the aforementioned limits on CPU and memory, then in theory, the user code can't do anything bad. In practice, this can be very tricky to get right. For example, Python lets you easily execute code with restricted globals (so you can restrict the os module), but builtin language features and introspection still provide enough functionality to break out of the sandbox. (See here.)
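
The CPU and time limits mentioned above can be sketched with the resource-limit options of Ruby's Process.spawn. This is a rough sketch for a POSIX system, not a complete sandbox: run_limited and its default limits are made up for illustration, and it does nothing to restrict file or network access. A memory cap could probably be added in the same way with the rlimit_as option, with a value that would need tuning per system.

```ruby
require 'timeout'

# Runs an external command with a CPU-time limit (enforced by the OS through
# setrlimit) and a wall-clock limit (enforced here). Returns :ok, :failed,
# :killed (ended by a signal, e.g. CPU limit exceeded) or :timed_out.
def run_limited(cmd, cpu_seconds: 2, wall_seconds: 5)
  pid = Process.spawn(*cmd, rlimit_cpu: cpu_seconds,
                      out: File::NULL, err: File::NULL)
  begin
    _, status = Timeout.timeout(wall_seconds) { Process.wait2(pid) }
    return :killed if status.signaled?
    status.success? ? :ok : :failed
  rescue Timeout::Error
    Process.kill('KILL', pid) # the child outlived its wall-clock budget
    Process.wait(pid)         # reap it so no zombie process is left behind
    :timed_out
  end
end
```

A call such as run_limited(['some-compiler', 'input.src'], wall_seconds: 10) would then give up on a compilation that runs too long (the command name here is a placeholder).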

Another approach to securing an online interpreter or REPL is to execute everything client-side (using JavaScript and, possibly, tools like Emscripten). If the user code executes on the user's machine, then malicious user code isn't really your problem.

Update: As @JimmyHoffa points out, another approach is to use the OS or VM software to provide a more complete sandbox than the programming language is designed to do itself. For examples of this, see Chromium's sandbox for Windows or various Linux options.

Not without SSL

This is not secure if the password is sent over the network in plain text. Hashing the password on the server side is also not secure if the password is sent over the network in plain text.

Since the HTML <input type="password"/> tag sends its contents in plain text, this will be a problem no matter how you store the password on the server, unless your website uses SSL to transmit the password.

(HTTP authentication, which pops up a dialog box in the browser asking for a password, may or may not be clear text, depending on what authentication mechanisms the server and browser have in common. So that could be a way to avoid this without using SSL.)

Not if the site administrators are suspect

Now, supposing you're using HTTPS to serve the website, this could be secure if you trust your site administrators (who can read plain text passwords) and the other people who have access to the machine to behave properly. It may be obvious that they can do anything they want with your website (since they administer it), but if they can read the passwords, they may also be able to use the stolen login/password pairs on other people's sites.

A way that keeps passwords safe from the administrator

One secure way to store and check passwords is as follows:

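require 'digest'
require 'securerandom'

PASSWORD_HASHES = {} # in-memory stand-in for a real user database

def change_password(user, new_password)
  salt = SecureRandom.hex(2) # always exactly 4 hex characters
  # Prefix the hash with its salt so the salt can be recovered at login.
  # SHA-256 stands in for whichever strong hash function you choose.
  PASSWORD_HASHES[user] = salt + Digest::SHA256.hexdigest(salt + new_password)
end

def does_password_match?(user, entered_password)
  correct_password_hash = PASSWORD_HASHES.fetch(user)
  salt = correct_password_hash[0...4] # the first 4 characters are the salt
  entered_password_hash = salt + Digest::SHA256.hexdigest(salt + entered_password)
  correct_password_hash == entered_password_hash
end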

For the hash function, try to use something strong, and something that doesn't have good rainbow tables in the wild yet. You can change the length of the salt if necessary to work around rainbow tables.

Depending on the environment you're in, the variability in your network latency, and whether user names are meant to be publicly known, you may want to have another code path compute hash('0000'+entered_password) if the user doesn't exist, in order to prevent attackers from determining which usernames are valid based on the time it takes to determine that the password is incorrect.
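
That extra code path might look roughly like the following. It is a self-contained sketch under assumptions: the USERS hash, salted_hash, register, and login_ok? are names invented here, and SHA-256 merely stands in for the hash function. It makes the unknown-user path do the same hashing work but does not guarantee identical timing.

```ruby
require 'digest'
require 'securerandom'

USERS = {} # hypothetical in-memory store: user name => salt + hash

def salted_hash(salt, password)
  salt + Digest::SHA256.hexdigest(salt + password)
end

def register(user, password)
  USERS[user] = salted_hash(SecureRandom.hex(2), password) # 4-character salt
end

def login_ok?(user, entered_password)
  stored = USERS[user]
  if stored.nil?
    # Unknown user: do the same hashing work with a dummy salt, then fail,
    # so this path takes about as long as a wrong password for a real user.
    salted_hash('0000', entered_password)
    return false
  end
  stored == salted_hash(stored[0...4], entered_password)
end
```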

Web Security – Best Practices for Web Application Authentication and Security

Some high level tips:

Store only the data you need
Always encrypt sensitive data (SSN, password, credit card #, etc.) when you store it
Always encrypt traffic using SSL when transmitting/receiving sensitive data
If in doubt about the sensitivity of information, encrypt it
Don't trust user input (someone will try to enter something bad)
Don't trust your data (someone can change it in the database - injecting malicious script for example)
Don't roll your own encryption
Secure the servers hosting the applications / databases
Increase the burden on end users for the sake of security (password restrictions, never expose passwords, don't send URLs in email, reduce session time, etc.)
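
As one small illustration of the "don't trust user input" and "don't trust your data" tips, text can be escaped before it is written into an HTML page, whether it came from a form or from your own database. This sketch uses ERB::Util.html_escape from Ruby's standard library; the render_comment helper and its markup are made up for the example, and escaping like this covers only HTML text content, not every injection risk.

```ruby
require 'erb'

# Wraps untrusted text in a paragraph, escaping &, <, >, " and ' so that
# any markup inside it is displayed as text instead of being interpreted.
def render_comment(untrusted_text)
  "<p>#{ERB::Util.html_escape(untrusted_text)}</p>"
end

puts render_comment('<script>alert(1)</script>')
# prints "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
```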

My suggestion to you would be to get a book on securing Web applications. There is just too much information to convey in a single answer / blog / article. The topic of encryption alone is substantial.
