OK, let me put this bluntly: if you're putting user data, or anything derived from user data into a cookie for this purpose, you're doing something wrong.
There. I said it. Now we can move on to the actual answer.
What's wrong with hashing user data, you ask? Well, it comes down to exposure surface and security through obscurity.
Imagine for a second that you're an attacker. You see a cryptographic cookie set for the remember-me on your session. It's 32 characters wide. Gee. That may be an MD5...
Let's also imagine for a second that they know the algorithm that you used. For example:
md5(salt+username+ip+salt)
Now, all an attacker needs to do is brute force the "salt" (which isn't really a salt, but more on that later), and he can now generate all the fake tokens he wants with any username for his IP address! But brute-forcing a salt is hard, right? Absolutely. But modern day GPUs are exceedingly good at it. And unless you use sufficient randomness in it (make it large enough), it's going to fall quickly, and with it the keys to your castle.
In short, the only thing protecting you is the salt, which isn't really protecting you as much as you think.
But Wait!
All of that was predicated that the attacker knows the algorithm! If it's secret and confusing, then you're safe, right? WRONG. That line of thinking has a name: Security Through Obscurity, which should NEVER be relied upon.
The Better Way
The better way is to never let a user's information leave the server, except for the id.
When the user logs in, generate a large (128 to 256 bit) random token. Add that to a database table which maps the token to the userid, and then send it to the client in the cookie.
What if the attacker guesses the random token of another user?
Well, let's do some math here. We're generating a 128 bit random token. That means that there are:
possibilities = 2^128
possibilities = 3.4 * 10^38
Now, to show how absurdly large that number is, let's imagine every server on the internet (let's say 50,000,000 today) trying to brute-force that number at a rate of 1,000,000,000 per second each. In reality your servers would melt under such load, but let's play this out.
guesses_per_second = servers * guesses
guesses_per_second = 50,000,000 * 1,000,000,000
guesses_per_second = 50,000,000,000,000,000
So 50 quadrillion guesses per second. That's fast! Right?
time_to_guess = possibilities / guesses_per_second
time_to_guess = 3.4e38 / 50,000,000,000,000,000
time_to_guess = 6,800,000,000,000,000,000,000
So 6.8 sextillion seconds...
Let's try to bring that down to more friendly numbers.
215,626,585,489,599 years
Or even better:
47917 times the age of the universe
Yes, that's 47917 times the age of the universe...
Basically, it's not going to be cracked.
So to sum up:
The better approach that I recommend is to store the cookie with three parts.
function onLogin($user) {
$token = GenerateRandomToken(); // generate a token, should be 128 - 256 bit
storeTokenForUser($user, $token);
$cookie = $user . ':' . $token;
$mac = hash_hmac('sha256', $cookie, SECRET_KEY);
$cookie .= ':' . $mac;
setcookie('rememberme', $cookie);
}
Then, to validate:
function rememberMe() {
$cookie = isset($_COOKIE['rememberme']) ? $_COOKIE['rememberme'] : '';
if ($cookie) {
list ($user, $token, $mac) = explode(':', $cookie);
if (!hash_equals(hash_hmac('sha256', $user . ':' . $token, SECRET_KEY), $mac)) {
return false;
}
$usertoken = fetchTokenByUserName($user);
if (hash_equals($usertoken, $token)) {
logUserIn($user);
}
}
}
Note: Do not use the token or combination of user and token to lookup a record in your database. Always be sure to fetch a record based on the user and use a timing-safe comparison function to compare the fetched token afterwards. More about timing attacks.
Now, it's very important that the SECRET_KEY
be a cryptographic secret (generated by something like /dev/urandom
and/or derived from a high-entropy input). Also, GenerateRandomToken()
needs to be a strong random source (mt_rand()
is not nearly strong enough. Use a library, such as RandomLib or random_compat, or mcrypt_create_iv()
with DEV_URANDOM
)...
The hash_equals()
is to prevent timing attacks.
If you use a PHP version below PHP 5.6 the function hash_equals()
is not supported. In this case you can replace hash_equals()
with the timingSafeCompare function:
/**
* A timing safe equals comparison
*
* To prevent leaking length information, it is important
* that user input is always used as the second parameter.
*
* @param string $safe The internal (safe) value to be checked
* @param string $user The user submitted (unsafe) value
*
* @return boolean True if the two strings are identical.
*/
function timingSafeCompare($safe, $user) {
if (function_exists('hash_equals')) {
return hash_equals($safe, $user); // PHP 5.6
}
// Prevent issues if string length is 0
$safe .= chr(0);
$user .= chr(0);
// mbstring.func_overload can make strlen() return invalid numbers
// when operating on raw binary strings; force an 8bit charset here:
if (function_exists('mb_strlen')) {
$safeLen = mb_strlen($safe, '8bit');
$userLen = mb_strlen($user, '8bit');
} else {
$safeLen = strlen($safe);
$userLen = strlen($user);
}
// Set the result to the difference between the lengths
$result = $safeLen - $userLen;
// Note that we ALWAYS iterate over the user-supplied length
// This is to prevent leaking length information
for ($i = 0; $i < $userLen; $i++) {
// Using % here is a trick to prevent notices
// It's safe, since if the lengths are different
// $result is already non-0
$result |= (ord($safe[$i % $safeLen]) ^ ord($user[$i]));
}
// They are only identical strings if $result is exactly 0...
return $result === 0;
}
OpenID is not a 'successor' or 'substitute' for CAS, they're different, in intent and in implementation.
CAS centralizes authentication. Use it if you want all your (probably internal) applications to ask users to login to a single server (all applications are configured to point to a single CAS server).
OpenID decentralizes authentication. Use it if you want your application to accept users login to whatever authentication service they want (the user provides the OpenID server address - in fact, the 'username' is the server's URL).
None of the above handle authorization (without extensions and/or customization).
OAuth handles authorization, but it is not a substitute for the traditional 'USER_ROLES table' (user access). It handles authorization for third-parties.
For example, you want your application to integrate with Twitter: a user could allow it to tweet automatically when they update their data or post new content. You want to access some third-party service or resource on behalf of a user, without getting his password (which is obviously unsecure for the user). The application asks Twitter for access, the user authorizes it (through Twitter), and then the app may have access.
So, OAuth is not about Single Sign-On (nor a substitute for the CAS protocol). It is not about you controlling what the user can access. It is about letting the user to control how their resources may be accessed by third-parties. Two very different use-cases.
To the context you described, CAS is probably the right choice.
[updated]
That said, you can implement SSO with OAuth, if you consider the identity of the user as a secured resource. This is what 'Sign up with GitHub' and the likes do, basically. Probably not the original intent of the protocol, but it can be done. If you control the OAuth server, and restrict the apps to only authenticate with it, that's SSO.
No standard way to force logout, though (CAS has this feature).
Best Answer
While implementing a microservice architecture at my previous job we decided the best approach was in alignment with #1, Add identity service and authorize service access through it. In our case this was done with tokens. If a request came with an authorization token then we could verify that token with the identity service if it was the first call in the user's session with the service. Once the token had been validated then it was saved in the session so subsequent calls in the user's session did not have to make the additional call. You can also create a scheduled job if tokens need to be refreshed in that session.
In this situation we were authenticating with an OAuth 2.0 endpoint and the token was added to the HTTP header for calls to our domain. All of the services were routed from that domain so we could get the token from the HTTP header. Since we were all part of the same application ecosystem, the initial OAuth 2.0 authorization would list the application services that the user would be giving permission to for their account.
An addition to this approach was that the identity service would provide the proxy client library which would be added to the HTTP request filter chain and handle the authorization process to the service. The service would be configured to consume the proxy client library from the identity service. Since we were using Dropwizard this proxy would become a Dropwizard Module bootstrapping the filter into the running service process. This allowed for updates to the identity service that also had a complimentary client side update to be easily consumed by dependent services as long as the interface did not change significantly.
Our deployment architecture was spread across AWS Virtual Private Cloud (VPC) and our own company's data centers. The OAuth 2.0 authentication service was located in the company's data center while all of our application services were deployed to AWS VPC.
I hope the approach we took is helpful to your decision. Let me know if you have any other questions.