Implementation ideas to store multiple files within a single file for faster access

encryption, files

My requirement is to store a large number of files within a single file. The stored files could be anything: images, videos, or simple text files. I would like some ideas on how to implement this. I am thinking of implementing a file system within a file, but I am not sure if it's a good idea.

Adding more details as requested: the target platform is Android. The initial idea was to store all the data using sqlite and encrypt it, but I think that will eventually lead to a slowdown as the file grows. The file is going to increase in size over time.

The main area of concern here is the access time. Also I want to provide encryption for this single file.
Any suggestions are welcome.

Best Answer

Note: if you stated the purpose of your file container more clearly, describing access patterns, desired platforms, and the problem you're solving in general, the answers might be better.

Your description looks awfully similar to a game resource file, a well-known file type. These files are not intended to be updated frequently (if ever), but are optimized for fast seeking and reading.

There are several known implementations: for instance, id Software used WAD files and PAK files, but eventually moved to ZIP files.

Many applications use various derivatives of the IFF format, which is built from self-describing chunks and, with some care, can be updated efficiently.
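As a rough illustration of the chunked approach, here is a minimal Python sketch loosely modeled on IFF: each chunk is a 4-byte id, a 4-byte big-endian length, and the payload padded to an even length. The function names (write_chunk, build_index, read_chunk) are made up for this example, and it assumes chunk ids are unique within the container.

```python
import io
import struct

# Chunk header: 4-byte id followed by a 4-byte big-endian payload length.
HEADER = struct.Struct(">4sI")

def write_chunk(f, chunk_id: bytes, payload: bytes) -> None:
    """Append one chunk; payloads are padded to an even length, as in IFF."""
    f.write(HEADER.pack(chunk_id, len(payload)))
    f.write(payload)
    if len(payload) % 2:
        f.write(b"\x00")

def build_index(f) -> dict:
    """Scan the headers once and map chunk id -> (payload offset, length)."""
    index = {}
    f.seek(0)
    while True:
        header = f.read(HEADER.size)
        if len(header) < HEADER.size:
            break
        chunk_id, length = HEADER.unpack(header)
        index[chunk_id] = (f.tell(), length)
        # Skip the payload and its padding byte without reading them.
        f.seek(length + (length % 2), io.SEEK_CUR)
    return index

def read_chunk(f, index: dict, chunk_id: bytes) -> bytes:
    """Seek straight to one payload using the index."""
    offset, length = index[chunk_id]
    f.seek(offset)
    return f.read(length)

if __name__ == "__main__":
    container = io.BytesIO()
    write_chunk(container, b"TEXT", b"hello")
    write_chunk(container, b"IMG0", b"\x89PNG")
    idx = build_index(container)
    print(read_chunk(container, idx, b"TEXT"))
```

Building the index only touches the headers, so lookups stay cheap even when the payloads are large. A real container would probably persist the index rather than rescan on every open.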

In a chunked file, or a zip file (consider zero compression), you can encrypt each entry independently before writing it into the file. Provided that you use a block cipher like AES-256, your data does not change size, except for padding it to a block boundary. You definitely want your decryption key stored somewhere else :)
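The zero-compression zip variant can be sketched with Python's standard zipfile module. The helper names pack and read_entry are hypothetical, and the sketch assumes each payload has already been encrypted by the caller; the encryption step itself is deliberately left out.

```python
import io
import zipfile

def pack(entries: dict) -> bytes:
    """Store each (already encrypted) entry uncompressed in one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()

def read_entry(archive: bytes, name: str) -> bytes:
    """Read a single entry; zipfile locates it through the central directory."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)

if __name__ == "__main__":
    blob = pack({"a.txt": b"ciphertext-1", "img/b.png": b"ciphertext-2"})
    print(read_entry(blob, "img/b.png"))
```

With ZIP_STORED the bytes are written as-is, which suits encrypted payloads since ciphertext does not compress well anyway.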

Writing a file system to provide encryption is not only possible, but has actually been done many times. For instance, Linux has encfs, and Windows has encrypted folders. TrueCrypt is an advanced virtual encrypted FS available on many platforms.

Please note that while zip files provide for a sort of native encryption, this encryption is relatively weak. Same applies to rar files, for all I know.

Related Solutions

Security – Advice on Developing a Sensitive Data Transfer/Storage/Encryption System

I hope this isn't too blunt, but the task you are undertaking here is extremely difficult, and the odds of you getting it right are slim. Security flaws are most of the time caused by mistakes in implementation, not in the underlying technologies. In order to make a system like the one you've described secure, you have to use the correct tools and the correct methodology and account for all of the edge cases or the security of the entire system will be compromised.

That's not really a helpful answer though, is it? When you are building a system like this one, the question you should be asking isn't "How do I do this?" but "What is the way I can do this that relies the least on myself?" The answer to that question is to use tried and tested systems wherever possible, and to roll your own solutions only as a last resort.

To answer your first point about encryption, it doesn't make sense to worry too much about securing a key in the server's memory. If an attacker has enough access to a machine to read your keys out of memory, you are totally and completely hosed, and any solutions that you have coded up aren't going to help much anyway. In other words, favor securing data at rest and data that is moving over the internet, since that is where most attacks are going to occur.

As far as storing the data goes, I don't see any reason why asymmetric crypto needs to be involved here. I would use something like PBKDF2 to derive a key directly from the user's password, then encrypt the data and store the encrypted blob in a database. I would recommend a database over a flat file because managing a folder full of flat files is tedious at the best of times. Databases may not show any solid benefits in speed or security over flat files, but they come with many other features such as pooled connections, and they also make backing up data much easier than flat files. Use the simplest system you can to minimize your attack surface, and use thoroughly tested open source tools whenever possible. If you can find a way to use GPG for the encryption and key derivation part of things, I would recommend it.
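For the key derivation step, a minimal sketch using PBKDF2 from Python's standard library might look like this. The function name derive_key, the 16-byte salt, and the default iteration count are assumptions made for the example, not values taken from the answer; pick parameters according to current guidance for your platform.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

if __name__ == "__main__":
    # The salt is not secret; store it next to the encrypted blob so the
    # same key can be derived again when the user supplies the password.
    salt = os.urandom(16)
    key = derive_key("correct horse battery staple", salt)
    print(len(key))
```

The derived key would then be handed to a vetted encryption library; the encryption itself is out of scope for this sketch.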

As far as transfer goes, I believe that you are thinking about things the wrong way. Don't do any encryption client side. Browser javascript is not suitable for cryptography, as explained in this article. So long as you make sure that you use TLS/SSL for all connections to your site, you shouldn't need to worry about transmitting data unencrypted. For an example of why it is hard to do client side encryption, do some googling about the security of MegaUpload's successor, MEGA.

Finally, I wouldn't trust any one dude you get an answer from on the internet, including myself. I would do a lot of research about this sort of thing before committing to a solution. Also, I might recommend asking this question over at the IT Security Stack Exchange.

-- EDIT --

Somehow, I totally missed the fact that there are three parts to your system: the client (browser), the server (database), and the connector that imports data from the Visual FoxPro database. This actually makes the whole system a lot more complex, because there are essentially three parties that need to share a secret instead of two. What I would recommend is not to encrypt the data based on the user's password, but to instead encrypt it based on some server password. I'm having a little bit of trouble thinking of a good way to describe this process, so I'll give you an example workflow instead.

Server Side

Admin starts server.
During start up, server code asks for a password.
Server uses PBKDF2 to derive a key which is stored only in memory.
Server spawns a thread that will poll the Visual FoxPro server every X (days/hours/minutes) for updated data.
Server enters loop awaiting requests from browser clients.

Updating database

Main server's child thread requests an update of data from the Visual FoxPro server.
Visual FoxPro server dumps a report containing data for clients with modified entries.
Visual FoxPro server opens a secure connection to the main server (ssh, sftp, etc.) and transmits zipped data.
One by one, the main server uses the PBKDF2 derived key that is stored in memory to decrypt blobs stored in a database, update them with new data, reencrypt them, and store them back into the database. This process should all happen in-memory.

Browser client connects

Main server receives https request from client.
Main server uses some third party authentication framework to check the client's credentials. This framework should use bcrypt to hash passwords and only store the hashes on the file system.
If the authentication framework positively identifies a user, the main server will decrypt the user's blob using the PBKDF2 derived key in memory and send the data to the user.
When the user's authentication cookie expires, the main server will stop using the PBKDF2 derived key to decrypt data, and will instead prompt the user to re-authenticate.

This model is more in line with how traditional websites work (which means that you can rely on third party, bug tested frameworks), but data is encrypted/decrypted in memory before touching the database. Ideally, you could use GPG or some other keystore for managing the encryption keys on the main server as well.

Encrypting stored data for multiple unique users to access information

Are there techniques that can encrypt the data in such a way that a data breach on the server won't show the data in the database in clear text, but that multiple users of certain permissions can decrypt the data they need?

Yes. You need two things:

modify your existing table structure so that all fields that you want encrypted are appropriately altered in size and type. For example a VARCHAR(50) would become VARBINARY(64) in order to employ the AES algorithm, which uses a block size of 128 bits (16 bytes) and outputs ciphertext in multiples of 16 bytes.
then you add a mapping table for each entity which you need to encrypt. For example you want to encrypt fields in tables User and Report, and want them independent; you add two tables User_Map and Report_Map. If Reports belong to Users as a one-to-many relationship, and access to User 12345 will therefore grant access to its Reports, you only need User_Map (any Report will use the secret key of its User). And so on.
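The column-size arithmetic in the first point can be checked with a short Python sketch. It assumes PKCS#7-style padding (which always adds at least one byte), a field holding at most 50 bytes, and no IV or authentication tag stored in the same column; the function name is made up for the example.

```python
AES_BLOCK_BYTES = 16  # AES always works on 128-bit blocks

def aes_padded_size(plaintext_len: int, block: int = AES_BLOCK_BYTES) -> int:
    """Ciphertext length for a block cipher with PKCS#7-style padding.

    Padding always adds between 1 and `block` bytes, so a plaintext that
    already fills whole blocks grows by one extra block.
    """
    return (plaintext_len // block + 1) * block

if __name__ == "__main__":
    print(aes_padded_size(50))  # 64, matching VARBINARY(64) for VARCHAR(50)
```

If the cipher mode also needs a per-field IV or tag stored alongside the ciphertext, the column has to be that much wider.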

The mapping table holds three columns: Accessor, Accessed, and Key. Accessor identifies whoever is accessing the data (it could be a user_id for example). Accessed identifies the entity which is being accessed (user_id or report_id). Finally key is a cryptographically strong random value associated with the Accessed resource instance at the moment of its creation, encrypted with Accessor's password.

All data in the mapped resource is encrypted using that random value, which will never be disclosed and will never be changed; changing it, while possible, would be really awkward.

So if you have five Books in your table, each of those will have a random value of its own, that exists nowhere in the clear.

The operations you need to support are:

access to a resource by Accessor: Accessor has been identified and his password is in memory. When accessing resource XYZ, a search for user_id = Accessor_id AND resource_id = XYZ will yield XYZ's encrypted key. Decrypt it with the password. The key thus obtained will allow decryption/reencryption of XYZ's data.

Multiple Accessors can access the same Resource:

Accessor_id     Accessed_id     Key
1(joe)          101             ENCRYPT('random101', 'joespassword');
2(alice)        101             ENCRYPT('random101', 'alicespassword');
1(joe)          102             ENCRYPT('random102', 'joespassword');

update Accessor's password: you need to decrypt all Key values that are encrypted with the old password and reencrypt them with the new one:

UPDATE User_Map SET CKey = ENCRYPT(DECRYPT(CKey, :OldPassword), :NewPassword) WHERE Accessor_id = :MyId;
delete Accessor's credentials: just kill all tuples with that Accessor_id (note: if all Accessors with access to a given Accessed are eliminated, the Accessed's data will become unavailable. One user (e.g. root) should have access and not be deletable).
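The three operations above can be sketched in Python with an in-memory dictionary standing in for the mapping table. Everything here is hypothetical: the names key_map, grant, unwrap, change_password and revoke are invented for the example, and toy_encrypt/toy_decrypt are a deliberately simplistic, NOT secure stand-in (a SHA-256 keystream XOR) used only so the sketch runs with the standard library. A real system would replace them with authenticated encryption from a vetted library.

```python
import hashlib
import os

def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream for illustration only. NOT secure, do not reuse.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def toy_encrypt(data: bytes, password: str) -> bytes:
    """Placeholder for ENCRYPT(data, password); prepends a random salt."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt + bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def toy_decrypt(blob: bytes, password: str) -> bytes:
    """Placeholder for DECRYPT(blob, password)."""
    salt, body = blob[:16], blob[16:]
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))

# Mapping table: (accessor_id, accessed_id) -> resource key wrapped with
# the accessor's password.
key_map = {}

def grant(accessor_id, password, accessed_id, resource_key: bytes) -> None:
    key_map[(accessor_id, accessed_id)] = toy_encrypt(resource_key, password)

def unwrap(accessor_id, password, accessed_id) -> bytes:
    """Access a resource: recover its random key with the accessor's password."""
    return toy_decrypt(key_map[(accessor_id, accessed_id)], password)

def change_password(accessor_id, old: str, new: str) -> None:
    """Rewrap every key held by this accessor, like the UPDATE statement."""
    for (acc, res), wrapped in list(key_map.items()):
        if acc == accessor_id:
            key_map[(acc, res)] = toy_encrypt(toy_decrypt(wrapped, old), new)

def revoke(accessor_id) -> None:
    """Delete an accessor's credentials by dropping all of their rows."""
    for pair in [p for p in key_map if p[0] == accessor_id]:
        del key_map[pair]
```

Note that the resource key itself never changes in any of these operations; only its wrapped copies do, which is why a password change does not require re-encrypting the resource data.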

Some operations you might want to support:

access all data accessible to Accessor in a hierarchical way (i.e. you are Accessor's supervisor): that's tricky. You need to encrypt and maintain Accessor's password, which you can only do if Supervisor is logged in. To provide for this, you can supply an asymmetric encryption scheme whereby if user A is supervised by user B, then the tuple (B_id, A_id, (A's private key encrypted with B's public key)) is available in the hierarchical database. This way, it is always possible for a supervisor to access his people's private keys, and they can update them without needing the supervisor's intervention.
granting rights to a Resource to a user also requires using asymmetric ciphers, because while user A is in possession of the resource instance's unique key, he or she can neither send it in the clear to user B (it would permanently disclose that resource instance's plaintext for everyone), nor encrypt it with B's password, since that is known only to B. The solution is to encrypt the instance key with user B's public key (which can be known), while B's private key remains encrypted.

Since sending the private key might be difficult, being a couple of kilobytes, you could generate a public/private key pair and encrypt the private key symmetrically using a BCrypt hash of the user's password. When the user logs in, his private key becomes accessible to him or her, and can be used to unlock all asymmetrically encrypted information, if there is any (if you don't need grants and hierarchies, there might be no need).

Performance and limitations

The performance impact is not too large, if we consider that encrypted data are used mainly upon receipt or sending data to a user (which are slow operations in themselves). But this introduces the main limitation - encrypted data cannot be easily used in the system.

For example, straight search is either impossible or very slow: techniques for encrypted searching exist, but with this scheme, looking for records containing "Hello world" would require changing something like

SELECT ...
FROM Resource
WHERE searchField LIKE :searchText

into

SELECT ...
FROM Resource
JOIN Record_Map ON (Resource.id = Record_Map.Resource_id)
WHERE DECRYPT(searchField, DECRYPT(Record_Map.Key, :myPassword)) LIKE :searchText
AND Record_Map.Accessor_id = :myId

Security

The database contains no plaintext keys or passwords, so capturing a database would not compromise data security. Standard caveats apply: user passwords must not be guessable and must be properly stored and checked (bcrypt is good for that). Compromising a user's password only allows access to the data that user has been granted read rights to. An attacker who holds a currently valid password and has physical access to the database can recover the accessible data not only in its present state but also in the future, provided physical access to the database is obtained again.

Example: Eve steals a copy of the database. She is unable to read anything since she has no passwords. She manages to acquire Alice's password and is thus able to decrypt Bob's record, that Alice has access to. Alice changes her password, and updates Bob's record; after Dane is added to her people, Eve steals a fresh copy of the database. She can still read Bob's updated record (she has Bob's key from old database), but not Dane's (Dane's key is encrypted with Alice's new password, that Eve lacks).

To "re-secure" a record with N managers, it is necessary to change its encryption key. That requires updating all N existing mapping rows for that record, which in turn requires access to all N passwords at the same time.