Authentication versus Authorization

authenticationauthorizationSecurity

What's the difference in context of web applications? I see the abbreviation "auth" a lot. Does it stand for auth-entication or auth-orization? Or is it both?

Best Answer

Authentication is the process of ascertaining that somebody really is who they claim to be.

Authorization refers to rules that determine who is allowed to do what. E.g. Adam may be authorized to create and delete databases, while Usama is only authorised to read.

The two concepts are completely orthogonal and independent, but both are central to security design, and the failure to get either one correct opens up the avenue to compromise.

In terms of web apps, very crudely speaking, authentication is when you check login credentials to see if you recognize a user as logged in, and authorization is when you look up in your access control whether you allow the user to view, edit, delete or create content.

The Attack

So, let's start off by showing the attack...

$pdo->query('SET NAMES gbk');
$var = "\xbf\x27 OR 1=1 /*";
$query = 'SELECT * FROM test WHERE name = ? LIMIT 1';
$stmt = $pdo->prepare($query);
$stmt->execute(array($var));

In certain circumstances, that will return more than 1 row. Let's dissect what's going on here:

Selecting a Character Set
```
$pdo->query('SET NAMES gbk');
```
For this attack to work, we need the encoding that the server's expecting on the connection both to encode ' as in ASCII i.e. 0x27 and to have some character whose final byte is an ASCII \ i.e. 0x5c. As it turns out, there are 5 such encodings supported in MySQL 5.6 by default: big5, cp932, gb2312, gbk and sjis. We'll select gbk here.

Now, it's very important to note the use of SET NAMES here. This sets the character set ON THE SERVER. There is another way of doing it, but we'll get there soon enough.
The Payload

The payload we're going to use for this injection starts with the byte sequence 0xbf27. In gbk, that's an invalid multibyte character; in latin1, it's the string ¿'. Note that in latin1 and gbk, 0x27 on its own is a literal ' character.

We have chosen this payload because, if we called addslashes() on it, we'd insert an ASCII \ i.e. 0x5c, before the ' character. So we'd wind up with 0xbf5c27, which in gbk is a two character sequence: 0xbf5c followed by 0x27. Or in other words, a valid character followed by an unescaped '. But we're not using addslashes(). So on to the next step...
$stmt->execute()

The important thing to realize here is that PDO by default does NOT do true prepared statements. It emulates them (for MySQL). Therefore, PDO internally builds the query string, calling mysql_real_escape_string() (the MySQL C API function) on each bound string value.

The C API call to mysql_real_escape_string() differs from addslashes() in that it knows the connection character set. So it can perform the escaping properly for the character set that the server is expecting. However, up to this point, the client thinks that we're still using latin1 for the connection, because we never told it otherwise. We did tell the server we're using gbk, but the client still thinks it's latin1.

Therefore the call to mysql_real_escape_string() inserts the backslash, and we have a free hanging ' character in our "escaped" content! In fact, if we were to look at $var in the gbk character set, we'd see:
```
縗' OR 1=1 /*
```
Which is exactly what the attack requires.
The Query

This part is just a formality, but here's the rendered query:
```
SELECT * FROM test WHERE name = '縗' OR 1=1 /*' LIMIT 1
```

Congratulations, you just successfully attacked a program using PDO Prepared Statements...

The Simple Fix

Now, it's worth noting that you can prevent this by disabling emulated prepared statements:

$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

This will usually result in a true prepared statement (i.e. the data being sent over in a separate packet from the query). However, be aware that PDO will silently fallback to emulating statements that MySQL can't prepare natively: those that it can are listed in the manual, but beware to select the appropriate server version).

The Correct Fix

The problem here is that we didn't call the C API's mysql_set_charset() instead of SET NAMES. If we did, we'd be fine provided we are using a MySQL release since 2006.

If you're using an earlier MySQL release, then a bug in mysql_real_escape_string() meant that invalid multibyte characters such as those in our payload were treated as single bytes for escaping purposes even if the client had been correctly informed of the connection encoding and so this attack would still succeed. The bug was fixed in MySQL 4.1.20, 5.0.22 and 5.1.11.

But the worst part is that PDO didn't expose the C API for mysql_set_charset() until 5.3.6, so in prior versions it cannot prevent this attack for every possible command! It's now exposed as a DSN parameter, which should be used instead of SET NAMES...

The Saving Grace

As we said at the outset, for this attack to work the database connection must be encoded using a vulnerable character set. utf8mb4 is not vulnerable and yet can support every Unicode character: so you could elect to use that instead—but it has only been available since MySQL 5.5.3. An alternative is utf8, which is also not vulnerable and can support the whole of the Unicode Basic Multilingual Plane.

Alternatively, you can enable the NO_BACKSLASH_ESCAPES SQL mode, which (amongst other things) alters the operation of mysql_real_escape_string(). With this mode enabled, 0x27 will be replaced with 0x2727 rather than 0x5c27 and thus the escaping process cannot create valid characters in any of the vulnerable encodings where they did not exist previously (i.e. 0xbf27 is still 0xbf27 etc.)—so the server will still reject the string as invalid. However, see @eggyal's answer for a different vulnerability that can arise from using this SQL mode (albeit not with PDO).

Safe Examples

The following examples are safe:

mysql_query('SET NAMES utf8');
$var = mysql_real_escape_string("\xbf\x27 OR 1=1 /*");
mysql_query("SELECT * FROM test WHERE name = '$var' LIMIT 1");

Because the server's expecting utf8...

mysql_set_charset('gbk');
$var = mysql_real_escape_string("\xbf\x27 OR 1=1 /*");
mysql_query("SELECT * FROM test WHERE name = '$var' LIMIT 1");

Because we've properly set the character set so the client and the server match.

$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$pdo->query('SET NAMES gbk');
$stmt = $pdo->prepare('SELECT * FROM test WHERE name = ? LIMIT 1');
$stmt->execute(array("\xbf\x27 OR 1=1 /*"));

Because we've turned off emulated prepared statements.

$pdo = new PDO('mysql:host=localhost;dbname=testdb;charset=gbk', $user, $password);
$stmt = $pdo->prepare('SELECT * FROM test WHERE name = ? LIMIT 1');
$stmt->execute(array("\xbf\x27 OR 1=1 /*"));

Because we've set the character set properly.

$mysqli->query('SET NAMES gbk');
$stmt = $mysqli->prepare('SELECT * FROM test WHERE name = ? LIMIT 1');
$param = "\xbf\x27 OR 1=1 /*";
$stmt->bind_param('s', $param);
$stmt->execute();

Because MySQLi does true prepared statements all the time.

Wrapping Up

If you:

Use Modern Versions of MySQL (late 5.1, all 5.5, 5.6, etc) AND PDO's DSN charset parameter (in PHP ≥ 5.3.6)

Don't use a vulnerable character set for connection encoding (you only use utf8 / latin1 / ascii / etc)

Enable NO_BACKSLASH_ESCAPES SQL mode

You're 100% safe.

Otherwise, you're vulnerable even though you're using PDO Prepared Statements...

Addendum

I've been slowly working on a patch to change the default to not emulate prepares for a future version of PHP. The problem that I'm running into is that a LOT of tests break when I do that. One problem is that emulated prepares will only throw syntax errors on execute, but true prepares will throw errors on prepare. So that can cause issues (and is part of the reason tests are borking).

RESTful Authentication

How to handle authentication in a RESTful Client-Server architecture is a matter of debate.

Commonly, it can be achieved, in the SOA over HTTP world via:

HTTP basic auth over HTTPS;
Cookies and session management;
Token in HTTP headers (e.g. OAuth 2.0 + JWT);
Query Authentication with additional signature parameters.

You'll have to adapt, or even better mix those techniques, to match your software architecture at best.

Each authentication scheme has its own PROs and CONs, depending on the purpose of your security policy and software architecture.

HTTP basic auth over HTTPS

This first solution, based on the standard HTTPS protocol, is used by most web services.

GET /spec.html HTTP/1.1
Host: www.example.org
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

It's easy to implement, available by default on all browsers, but has some known drawbacks, like the awful authentication window displayed on the Browser, which will persist (there is no LogOut-like feature here), some server-side additional CPU consumption, and the fact that the user-name and password are transmitted (over HTTPS) into the Server (it should be more secure to let the password stay only on the client side, during keyboard entry, and be stored as secure hash on the Server).

We may use Digest Authentication, but it requires also HTTPS, since it is vulnerable to MiM or Replay attacks, and is specific to HTTP.

Session via Cookies

To be honest, a session managed on the Server is not truly Stateless.

One possibility could be to maintain all data within the cookie content. And, by design, the cookie is handled on the Server side (Client, in fact, does even not try to interpret this cookie data: it just hands it back to the server on each successive request). But this cookie data is application state data, so the client should manage it, not the server, in a pure Stateless world.

GET /spec.html HTTP/1.1
Host: www.example.org
Cookie: theme=light; sessionToken=abc123

The cookie technique itself is HTTP-linked, so it's not truly RESTful, which should be protocol-independent, IMHO. It is vulnerable to MiM or Replay attacks.

Granted via Token (OAuth2)

An alternative is to put a token within the HTTP headers so that the request is authenticated. This is what OAuth 2.0 does, for instance. See the RFC 6749:

 GET /resource/1 HTTP/1.1
 Host: example.com
 Authorization: Bearer mF_9.B5f-4.1JqM

In short, this is very similar to a cookie and suffers to the same issues: not stateless, relying on HTTP transmission details, and subject to a lot of security weaknesses - including MiM and Replay - so is to be used only over HTTPS. Typically, a JWT is used as a token.

Query Authentication

Query Authentication consists in signing each RESTful request via some additional parameters on the URI. See this reference article.

It was defined as such in this article:

All REST queries must be authenticated by signing the query parameters sorted in lower-case, alphabetical order using the private credential as the signing token. Signing should occur before URL encoding the query string.

This technique is perhaps the more compatible with a Stateless architecture, and can also be implemented with a light session management (using in-memory sessions instead of DB persistence).

For instance, here is a generic URI sample from the link above:

GET /object?apiKey=Qwerty2010

should be transmitted as such:

GET /object?timestamp=1261496500&apiKey=Qwerty2010&signature=abcdef0123456789

The string being signed is /object?apikey=Qwerty2010&timestamp=1261496500 and the signature is the SHA256 hash of that string using the private component of the API key.

Server-side data caching can be always available. For instance, in our framework, we cache the responses at the SQL level, not at the URI level. So adding this extra parameter doesn't break the cache mechanism.

See this article for some details about RESTful authentication in our client-server ORM/SOA/MVC framework, based on JSON and REST. Since we allow communication not only over HTTP/1.1, but also named pipes or GDI messages (locally), we tried to implement a truly RESTful authentication pattern, and not rely on HTTP specificity (like header or cookies).

Later Note: adding a signature in the URI can be seen as bad practice (since for instance it will appear in the http server logs) so it has to be mitigated, e.g. by a proper TTL to avoid replays. But if your http logs are compromised, you will certainly have bigger security problems.

In practice, the upcoming MAC Tokens Authentication for OAuth 2.0 may be a huge improvement in respect to the "Granted by Token" current scheme. But this is still a work in progress and is tied to HTTP transmission.

Conclusion

It's worth concluding that REST is not only HTTP-based, even if, in practice, it's also mostly implemented over HTTP. REST can use other communication layers. So a RESTful authentication is not just a synonym of HTTP authentication, whatever Google answers. It should even not use the HTTP mechanism at all but shall be abstracted from the communication layer. And if you use HTTP communication, thanks to the Let's Encrypt initiative there is no reason not to use proper HTTPS, which is required in addition to any authentication scheme.