I'm working on a web application where the server side makes calls to a foreign server's web service API. The data passed to this API could contain sensitive data (like a user's password) as part of the header and/or in the body of the request. I want to be able to log what's in the request (URI, header, body, response) for troubleshooting purposes but I don't want to expose any sensitive information. I have a single method that makes this API request which other methods call and that's where I've been putting the log statements. This of course reveals everything, including sensitive data. Is there a common way to do this?
Security – How to Log Potentially Sensitive Data
logging, security
Related Solutions
The main point of a CSRF token is that it can't have been sent from another site. Therefore it (a) can't be predicted or detected by an attacker, and (b) is not automatically attached to a request the way a cookie is.
In theory, if a CSRF token is never disclosed to third parties, you don't have to expire it at all. But then you run the risk of your token getting "leaked" somehow, so your expiry period really should be short enough to combat the prospect of a token getting out and being used against your user.
There aren't really any guidelines, but a good solid technique is to auto-generate a new token on EVERY request which embeds a signed timecode, and then accept tokens up to a certain age.
A sample function might be:
concat(current_time,salt,sha256_sum(concat(salt,userid,current_time,secret_string)))
The token contains timing information and a salt, but also contains a signature which can't be forged and which is tied to the userid.
Then you can define your own expiry interval -- an hour, a day, 2 hours. Whatever. The interval in this case isn't tied to the token, so you're free to set expiry rules however you want to.
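A minimal sketch of this scheme in Python. The function names and token layout are my own, and I've used HMAC-SHA256 for the signature rather than the bare salted hash in the formula above, since HMAC is the standard construction for this kind of keyed signature:

```python
import hashlib
import hmac
import os
import time

# Assumption: in a real app this would be loaded from configuration,
# never hard-coded and never sent to the client.
SECRET = b"server-side secret"

def make_token(user_id, now=None):
    """Build a token of the form time.salt.signature, per the formula above."""
    now = int(time.time()) if now is None else now
    salt = os.urandom(8).hex()
    msg = f"{salt}:{user_id}:{now}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{now}.{salt}.{sig}"

def check_token(token, user_id, max_age=3600):
    """Accept tokens whose signature matches and whose embedded time is recent."""
    try:
        ts_str, salt, sig = token.split(".")
        ts = int(ts_str)
    except ValueError:
        return False
    if time.time() - ts > max_age:
        return False  # older than the expiry interval we chose
    msg = f"{salt}:{user_id}:{ts}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(sig, expected)
```

Because the expiry interval lives in `check_token`, not in the token itself, you can change the rules server-side at any time without invalidating the token format.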
At the very least, though, CSRF tokens should expire when the login session expires or when the user logs out. There's no expectation by the user that a form that you brought up BEFORE you logged out will continue to work AFTER you log back in again.
I hope this isn't too blunt, but the task you are undertaking here is extremely difficult, and the odds of you getting it right are slim. Security flaws are most of the time caused by mistakes in implementation, not in the underlying technologies. In order to make a system like the one you've described secure, you have to use the correct tools and the correct methodology and account for all of the edge cases or the security of the entire system will be compromised.
That's not really a helpful answer though, is it? When you are building a system like this, the question you should be asking isn't "How do I do this?" but rather "How can I do this in a way that relies the least on myself?" The answer to that question is to use tried and tested systems wherever possible, and to roll your own solutions only as a last resort.
To answer your first point about encryption, it doesn't make sense to worry too much about securing a key in the server's memory. If an attacker has enough access to a machine to read your keys out of memory, you are totally and completely hosed and any solutions that you have coded up aren't going to help much anyway. In other words, favor securing data at rest and data that is moving over the internet, since that is where most attacks are going to occur.
As far as storing the data goes, I don't see any reason why asymmetric crypto needs to be involved here. I would use something like PBKDF2 to derive a key directly from the user's password, then encrypt the data and store the encrypted blob in a database. I would recommend a database over a flat file because managing a folder full of flat files is tedious at the best of times. Databases may not show any solid benefits in speed or security over flat files, but they come with many other features, such as pooled connections, and they also make backing up data much easier. Use the simplest system you can to minimize your attack surface, and use thoroughly tested open source tools whenever possible. If you can find a way to use GPG for the encryption and key derivation part of things, I would recommend it.
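The key-derivation step can be sketched with Python's standard library. The iteration count and parameter names here are my own assumptions; the encryption of the blob itself is not shown, since it should be done with a vetted AEAD cipher (e.g. AES-GCM from a maintained crypto library), not hand-rolled:

```python
import hashlib
import os

def derive_key(password, salt, iterations=600_000):
    """Derive a 32-byte key from the user's password with PBKDF2-HMAC-SHA256.

    The salt is random per user and stored alongside the encrypted blob;
    the iteration count is an assumption -- tune it to your hardware.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)

# Per-user setup: generate a salt once and persist it with the record.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
# `key` would now be handed to a vetted AEAD cipher to encrypt the
# blob before it is written to the database.
```

The same password and salt always reproduce the same key, which is what lets you decrypt the blob again at the next login without ever storing the key itself.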
As far as transfer goes, I believe that you are thinking about things the wrong way. Don't do any encryption client side. Browser javascript is not suitable for cryptography, as explained in this article. So long as you make sure that you use TLS/SSL for all connections to your site, you shouldn't need to worry about transmitting data unencrypted. For an example of why it is hard to do client side encryption, do some googling about the security of MegaUpload's successor, MEGA.
Finally, I wouldn't trust any one dude you get an answer from on the internet, including myself. I would do a lot of research about this sort of thing before committing to a solution. Also, I might recommend asking this question over at the IT Security Stack Exchange.
-- EDIT --
Somehow, I totally missed the fact that there are three parts to your system: the client (browser), the server (database), and the connector that imports data from the Visual FoxPro database. This actually makes the whole system a lot more complex, because there are essentially three parties that need to share a secret, instead of two. What I would recommend is not to encrypt the data based on the user's password, but to instead encrypt it based on some server password. I'm having a little bit of trouble thinking of a good way to describe this process, so I'll give you an example workflow instead.
Server Side
- Admin starts server.
- During start up, server code asks for a password.
- Server uses PBKDF2 to derive a key which is stored only in memory.
- Server spawns a thread that will poll the Visual FoxPro server every X (days/hours/minutes) for updated data.
- Server enters loop awaiting requests from browser clients.
Updating database
- Main server's child thread requests an update of data from the Visual FoxPro server.
- Visual FoxPro server dumps a report containing data for clients with modified entries.
- Visual FoxPro server opens a secure connection to the main server (ssh, sftp, etc.) and transmits the zipped data.
- One by one, the main server uses the PBKDF2-derived key stored in memory to decrypt blobs stored in the database, update them with new data, re-encrypt them, and store them back into the database. This process should all happen in memory.
Browser client connects
- Main server receives https request from client.
- Main server uses some third party authentication framework to check the client's credentials. This framework should use bcrypt to hash passwords and only store the hashes on the file system.
- If the authentication framework positively identifies a user, the main server will decrypt the user's blob using the PBKDF2 derived key in memory and send the data to the user.
- When the user's authentication cookie expires, the main server will stop using the PBKDF2 derived key to decrypt data, and will instead prompt the user to re-authenticate.
This model is more in line with how traditional websites work (which means that you can rely on third party, bug tested frameworks), but data is encrypted/decrypted in memory before touching the database. Ideally, you could use GPG or some other keystore for managing the encryption keys on the main server as well.
Best Answer
I'm pretty dismayed at these other answers. Best practice is to log only what you need to log, not the full request and response.
You must then consider data protection law. Obviously this varies by jurisdiction but generally it follows the same principles:
You will find it difficult to justify a blanket log-everything policy. 'Because it's easy' doesn't count as a good reason.
Although a blanket 'tick here to agree to all TnCs' with '... use your data for maintaining the system ...' might seem to cover you, this has been challenged in court a few times now. The reality is you won't know until you are sued.
Then you probably have some classes of data which have to be encrypted.
Then you have to have a policy around how long you keep the data and how you dispose of it.
Data protection law can be weirdly strict on some things and lax on others. For example, I remember one case where in-flight meal options were all encrypted because there were options for halal, kosher etc., so they would potentially reveal the traveller's religion, which counts as sensitive personal data.
Things that won't cut it:
Sanitising the logs. If even one thing slips through, you are screwed. Not because of the data, but because of all that documentation you wrote which said you didn't store that data. And stuff will slip through.
Logging everything but to an encrypted store. It doesn't matter how secure it is because the law covers more than just keeping data safe. Any scenario where you are keeping 'everything' is going to be tripped up because you have to state the purpose of keeping it.
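That said, the "log only what you need" rule can be enforced structurally rather than by scrubbing after the fact: build the log record from an explicit allowlist of fields, so anything not named never reaches the log in the first place. A minimal Python sketch; the field names and logger setup are illustrative, not from the question:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("api")

# Explicit allowlist: anything not named here never reaches the log,
# so a newly added sensitive field can't leak by default.
LOGGED_FIELDS = {"method", "path", "status", "duration_ms", "error_code"}

def log_api_call(record):
    """Log only the allowlisted fields of an API call record."""
    safe = {k: v for k, v in record.items() if k in LOGGED_FIELDS}
    log.info(json.dumps(safe, sort_keys=True))
    return safe  # returned so the caller (or a test) can inspect it

log_api_call({
    "method": "POST",
    "path": "/v1/orders",
    "status": 403,
    "error_code": "MEAL_NOT_ALLOWED",
    "password": "hunter2",           # never logged
    "authorization": "Bearer ...",   # never logged
})
```

The design point is that the safe set is opt-in: a sanitiser fails open when it misses a field, while an allowlist fails closed.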
******** edit : suggested ways of solving bugs without logging ************
So for API calls with malformed requests, I find it's best to return a nice error message. It shouldn't contain any data, but it should indicate clearly and uniquely what the error was.
e.g.
BAD: error - user bob can't eat pork
GOOD: error - meal code not allowed for this user
Ensure you push the error all the way back to the user if the code can't handle it. Bob knows he loves pork and will phone up to report the error. Plus, because there is no data, you can log it without worry and check for patterns.
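The pattern above can be sketched as a small error catalogue: each failure mode gets a unique, data-free code, so the message identifies the problem without naming the user or the data that triggered it. The codes and messages here are hypothetical:

```python
# Error catalogue: each failure mode gets a unique, data-free code.
ERRORS = {
    "MEAL_NOT_ALLOWED": "meal code not allowed for this user",
    "MEAL_UNKNOWN": "meal code not recognised",
}

def error_response(code):
    """Build the payload returned to the client; safe to log verbatim."""
    if code not in ERRORS:
        raise KeyError(f"unknown error code: {code}")
    return {"error": code, "message": ERRORS[code]}

error_response("MEAL_NOT_ALLOWED")
# {'error': 'MEAL_NOT_ALLOWED', 'message': 'meal code not allowed for this user'}
```

Because the response carries no user data, both the client-visible message and the server log can contain the same string.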
rule 2: testing, testing, testing. Write those unit tests and run them all the time. When Bob phones and says he can't order pork because of a bug, write a failing test before you fix it so you know if it ever happens again.
rule 3: health checks. If you have an API or service, pop a health check method on it. Have your monitoring system hit that health check every few minutes to check everything is OK with the service: it can still talk to the DB, is still answering requests promptly, etc.
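A minimal sketch of such a health check as a plain function; the sub-checks and field names are invented, and a real service would expose this via whatever web framework it already uses (e.g. as a `GET /health` endpoint):

```python
import time

def check_db():
    """Stand-in for a real connectivity probe (e.g. running `SELECT 1`)."""
    return True

def health_check():
    """Run each sub-check and report overall status plus per-check detail."""
    checks = {"db": check_db}
    results = {}
    start = time.monotonic()
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "fail"
        except Exception as exc:
            results[name] = f"error: {exc}"
    return {
        "status": "ok" if all(v == "ok" for v in results.values()) else "degraded",
        "checks": results,
        "elapsed_ms": round((time.monotonic() - start) * 1000, 2),
    }
```

Note there is no user data anywhere in the response, so the monitoring system can log every probe result freely.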