Don't write your own encryption program. You will do something wrong.
Media persistence like you talk about is a real problem. There are tons of old records stored on reel-to-reel tapes and not much equipment left to read them. You as the data owner will have to make sure you're moving to newer technologies as appropriate.
That said, 7zip is open-source. You can grab the source, build it yourself, and save that compiled binary. If 7zip shuts down in 5 years, you still have your copy of the binary -- the same one you used to do the encryption. Use it to do the decryption.
If you're going to be storing data for long periods, I'd also suggest including some kind of PAR2 recovery data alongside the encrypted container, to repair the container against literal bit rot.
While I don't know of any books/papers that discuss this exact problem,
it seems to me that
any solution to "the synchronization problem",
paired with any solution to "the avoid-re-encrypting-file-with-new-key problem",
should solve your original problem.
Each of those sub-problems has several solutions.
The synchronization problem
You have one "common file" (in this case, a symmetric key)
that, ideally, you want to be the same across all devices.
However, for one reason or another,
the data is somehow different from one device to the next --
split-brain syndrome --
and you want all the devices connected to the network to
somehow reach a consensus as to whether to use version A from now on,
or use version B from now on, or perhaps some entirely new version C from now on.
There are three popular approaches:
- Restructure the application to give people the functionality they want, without ever having such a "master key". In this case, the standard approach is to use public-key systems that let every device generate its own unique private key, then generate the public key from the private key, then (somehow) distribute the public keys.
- Use some sort of quorum protocol to come to consensus.
- Somehow time-tag each version, and when any node discovers that there are two versions, it picks the newest version. (I suppose it could pick the oldest version, but that makes it difficult to upgrade).
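As a rough illustration of the time-tag approach in the last bullet, here is a minimal sketch. The `KeyVersion` record, its `key_id` tie-breaker, and the `pick_newest` helper are hypothetical names invented for this example, and the sketch assumes the devices' clocks agree well enough for timestamps to be comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyVersion:
    created_at: float  # Unix timestamp recorded when this version was generated
    key_id: str        # hypothetical identifier, used only to break timestamp ties
    key: bytes         # the shared symmetric key itself


def pick_newest(versions):
    """Return the version with the latest timestamp; ties go to the larger key_id."""
    return max(versions, key=lambda v: (v.created_at, v.key_id))


# Two devices disagree about the common file; every node applies the same rule.
a = KeyVersion(1_700_000_000.0, "a1", b"key-A")
b = KeyVersion(1_700_000_500.0, "b7", b"key-B")
chosen = pick_newest([a, b])
```

Because every node applies the same deterministic rule, any two nodes that see the same set of versions pick the same one, regardless of the order in which they discovered them.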
One of many possible solutions goes like this:
- If the common file does not already exist locally, and I can't seem to connect to any other devices and download the one they are using, go ahead and create a new version of the common file.
- Later, when connected to the network and we see other devices, somehow (?) download the version of the common file of each of those other connected devices. If there is at least a quorum of other devices (say, 2 other devices), then start using the version of the common file used by most connected devices (the plurality). If there is a tie among the top N versions of this file, then pick one of those top N versions at random and start using it.
In particular, if every device has a different version of this file,
then the "birthday problem" practically guarantees that, after enough iterations of this algorithm, eventually 2 devices will pick the same version of the file,
and eventually all the online devices will converge on the same version of the file.
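The plurality rule described above could be sketched roughly as follows. This is only one reading of the steps: versions are represented by hashable identifiers (for example a fingerprint of the file), only the other devices' versions are counted, and `QUORUM` and `choose_version` are names made up for this example.

```python
import random
from collections import Counter

QUORUM = 2  # minimum number of other devices, as in the example above


def choose_version(my_version, peer_versions, rng=random):
    """Return the version this device should use after seeing its peers.

    peer_versions holds one version identifier per connected peer.
    """
    if len(peer_versions) < QUORUM:
        # Not enough peers to form a quorum: keep what we already have.
        return my_version
    counts = Counter(peer_versions)
    top = max(counts.values())
    # Every version that shares the top count is a candidate.
    tied = sorted(v for v, n in counts.items() if n == top)
    # With a single candidate this is the plurality winner; otherwise pick at random.
    return rng.choice(tied)


alone = choose_version("mine", ["A"])             # below quorum, keeps "mine"
winner = choose_version("mine", ["A", "A", "B"])  # "A" has the plurality
```

Each device would rerun this whenever it sees its peers again, which is the iteration the convergence argument relies on.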
The avoid-re-encrypting-file-with-new-key problem
All problems in computer science can be solved by another level of
indirection. But that usually will create another problem. --
attributed to David Wheeler in the book Beautiful Code (2007)
As I understand it,
- You have (encrypted) data that is synchronized between multiple devices
- You want to allow a person to change the passphrase on a device to some other passphrase of his own choosing, but you want this to be more-or-less "instant" rather than taking many minutes to decrypt and re-encrypt all the data.
- Each device has its own passphrase that can be used to access the data on the device
- You don't want unauthorized people to be able to decrypt the data (and "has the passphrase for this device" is an adequate proxy for "is authorized to decrypt the data on this device").
- You want to allow a person to create new encrypted data files before the device is ever connected to a network -- and therefore before the device knows any shared encryption keys -- and later when the device is connected to the network, those files are synchronized to other devices where other people can decrypt and read those files.
The standard way of doing that is to store the data in OpenPGP format (as standardized in RFC 4880).
You already have one layer of indirection -- a person types a passphrase, which is used to decrypt the device-specific key.
The OpenPGP process uses a second layer of indirection:
Every file is encrypted with its own unique symmetric key.
It works something like this:
Every time new data is created or edited, a completely new symmetric key is generated.
The new key itself is encrypted with the user's public key, and that encrypted key is stored in the header of the encrypted file. The data is encrypted with that new symmetric key and stored after the header in that encrypted file.
(This can all be done before the device ever connects to the network).
Later that encrypted file is synchronized unmodified over the network.
(Except the sender somehow obtains the receiver's device-specific key,
encrypts the file-specific symmetric key with the receiver's key,
and then adds that encrypted key to the file header).
To decrypt that file and read the data,
- A person types in the device-specific passphrase
- The device uses that passphrase to decode a file containing the device-specific key. (This is exactly what you are doing already).
- The device pulls the encrypted file-specific key out of the header of the file, and uses the device-specific key to decrypt the file-specific key. (This is a second layer of indirection).
- Then the device uses the file-specific key to decrypt the data in the file.
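The two layers of indirection can be sketched in a few lines. This is a structural illustration only, not OpenPGP and not something to deploy: `toy_cipher` is a deliberately simple, insecure stand-in for a real cipher, the device keys are symmetric here (OpenPGP would wrap the file key with public keys), and every function name and the dict-based container layout are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream. Illustration only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = nonce + counter.to_bytes(8, "big")
        stream += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def wrap(key: bytes, secret: bytes) -> bytes:
    """Encrypt secret under key, prefixing a fresh random nonce."""
    nonce = secrets.token_bytes(16)
    return nonce + toy_cipher(key, nonce, secret)


def unwrap(key: bytes, blob: bytes) -> bytes:
    return toy_cipher(key, blob[:16], blob[16:])


def encrypt_file(plaintext: bytes, device_keys: dict) -> dict:
    """Encrypt the data once; store the file key once per device in the header."""
    file_key = secrets.token_bytes(32)  # fresh key for this file only
    header = {name: wrap(k, file_key) for name, k in device_keys.items()}
    return {"header": header, "body": wrap(file_key, plaintext)}


def decrypt_file(container: dict, device_name: str, device_key: bytes) -> bytes:
    file_key = unwrap(device_key, container["header"][device_name])  # layer 2
    return unwrap(file_key, container["body"])                       # the data


laptop_key = secrets.token_bytes(32)
phone_key = secrets.token_bytes(32)
container = encrypt_file(b"meeting notes", {"laptop": laptop_key, "phone": phone_key})
```

Note that the body is stored once no matter how many devices can read it; adding a reader only adds one small header entry. The toy cipher has no integrity check, so a wrong key yields garbage rather than an error, which a real format would detect.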
To make the system easier to change/migrate:
- Use an encrypted file format (such as OpenPGP) that specifies exactly which encryption algorithm was used for this particular file. That allows future software to detect which encryption algorithm was used to create a particular file. Then the device can decrypt today's shiny new files using today's shiny new preferred algorithm. The device can also decrypt dusty old files with yesterday's dusty old algorithms -- and optionally re-encrypt using today's shiny new preferred algorithm.
- Use an encrypted file format (such as OpenPGP) that allows you to store the particular file-specific symmetric key in the header several times, each time encrypted with a different public key or device-specific key.
When a user changes the passphrase, only the device-specific key gets re-encrypted, just like what you are doing already.
If for any reason the device-specific key needs to change,
then the device must re-encrypt the file-specific key in the header of each and every encrypted file it holds. That's probably faster than decrypting and re-encrypting the entire file.
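To show why a passphrase change is nearly instant, here is a rough sketch in which only the small device-specific key is re-encrypted. The key-encryption key is derived with PBKDF2 from the standard library; `toy_cipher` is again an insecure placeholder for a real cipher, and the function names, the iteration count, and the dict layout are all assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream. Illustration only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = nonce + counter.to_bytes(8, "big")
        stream += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def kek_from_passphrase(passphrase: str, salt: bytes) -> bytes:
    """Derive a key-encryption key from the passphrase (iteration count assumed)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 200_000)


def lock_device_key(passphrase: str, device_key: bytes) -> dict:
    salt = secrets.token_bytes(16)
    nonce = secrets.token_bytes(16)
    kek = kek_from_passphrase(passphrase, salt)
    return {"salt": salt, "nonce": nonce, "wrapped": toy_cipher(kek, nonce, device_key)}


def unlock_device_key(passphrase: str, locked: dict) -> bytes:
    kek = kek_from_passphrase(passphrase, locked["salt"])
    return toy_cipher(kek, locked["nonce"], locked["wrapped"])


def change_passphrase(old: str, new: str, locked: dict) -> dict:
    """Re-wrap the device key only; no data file or file header is touched."""
    return lock_device_key(new, unlock_device_key(old, locked))


device_key = secrets.token_bytes(32)
locked = lock_device_key("old passphrase", device_key)
relocked = change_passphrase("old passphrase", "new passphrase", locked)
```

The work done by `change_passphrase` is independent of how much data the device holds, because the device-specific key that the file headers depend on never changes. A real implementation would also verify the old passphrase before re-wrapping, which this sketch does not do.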
Have you considered using some off-the-shelf implementation of OpenPGP, such as "Pretty Good Privacy" or "GNU Privacy Guard"?
Encryption can always be reversed. The point of encryption is to take a message and encode it with a secret key so that only another person who has the key can reverse the encryption and read the message.
What you're looking at here is hashing, which is not the same as encryption, though cryptographic techniques are often used in implementing hashes. The idea of a hash is that it uses complicated mathematical techniques to derive a new value from the original value in a repeatable way: the same input always produces the same output. There's no key, and it's not meant to be reversed. A cryptographically strong hash is created with the mathematical property that, if you have value A whose hash is value B, it's very, very difficult to intentionally create another value C that also hashes to B.
Hashes don't need to be reversible, because they're used for authentication. If you give me a username and a password, you really don't want me storing that password in my database, because if someone hacks in and gains access to my database, they could get ahold of your password! So instead, I'd store the hash of your password in the database. Then when you log in, I look up your username and check whether the hash of the password you sent matches the stored hash. If it does, you're authenticated: it's very difficult to create a hash collision (two values that hash to the same value) with a good hash, so I'm almost perfectly certain that the password you used is the right one.
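The store-the-hash, compare-on-login flow might look roughly like this. The sketch goes slightly beyond the description above by adding a per-user random salt and a deliberately slow hash (PBKDF2), which are common practice for password storage; the `users` dict, the function names, and the iteration count are assumptions for the example.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str):
    """Return (salt, digest) to store instead of the password itself."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the submitted password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


# The "database" holds only salts and digests, never the passwords.
users = {}
users["dominic"] = hash_password("correct horse")

ok = verify_password("correct horse", *users["dominic"])
bad = verify_password("wrong guess", *users["dominic"])
```

Nothing in `users` lets the server recover the password; it can only check whether a submitted password hashes to the same stored value.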
The other property of a strong cryptographic hash is that it's very difficult to reverse. You know that the value 0WrtCkg6IdaV/l4hDaYq3seMIWMbW+X/g36fvt8uYkE= is the hash for "dominic" because you just worked it out, but if you didn't know that, and didn't know where to start looking, and all you had was 0WrtCkg6IdaV/l4hDaYq3seMIWMbW+X/g36fvt8uYkE=, it could literally take you billions of years to figure out that the original was "dominic", if the hash is a good one. Again, this is useful to prevent collateral damage in case a password list gets stolen.
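For reference, computing a base64-encoded hash of a string takes only a few lines. The quoted value decodes to 32 bytes, which is the size of a SHA-256 digest, but the text does not say which hash function (or any salt) produced it, so SHA-256 here is an assumption and this sketch does not claim to reproduce that exact value. The helper name `b64_sha256` is made up for the example.

```python
import base64
import hashlib


def b64_sha256(text: str) -> str:
    """Return the SHA-256 digest of text, base64-encoded."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


quoted = "0WrtCkg6IdaV/l4hDaYq3seMIWMbW+X/g36fvt8uYkE="
quoted_size = len(base64.b64decode(quoted))  # size in bytes of the quoted hash

# Hashing is repeatable: the same input always gives the same output,
# while even a one-character change gives an unrelated-looking result.
first = b64_sha256("dominic")
second = b64_sha256("dominic")
other = b64_sha256("Dominic")
```

Going forward is cheap; the difficulty described above is entirely in going backward from the digest to the input.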