Java 256-bit AES Password-Based Encryption

aescryptographyencryptionjavapasswords

I need to implement 256 bit AES encryption, but all the examples I have found online use a "KeyGenerator" to generate a 256 bit key, but I would like to use my own passkey. How can I create my own key? I have tried padding it out to 256 bits, but then I get an error saying that the key is too long. I do have the unlimited jurisdiction patch installed, so thats not the problem 🙂

Ie. The KeyGenerator looks like this …

// Get the KeyGenerator
KeyGenerator kgen = KeyGenerator.getInstance("AES");
kgen.init(128); // 192 and 256 bits may not be available

// Generate the secret key specs.
SecretKey skey = kgen.generateKey();
byte[] raw = skey.getEncoded();

Code taken from here

EDIT

I was actually padding the password out to 256 bytes, not bits, which is too long. The following is some code I am using now that I have some more experience with this.

byte[] key = null; // TODO
byte[] input = null; // TODO
byte[] output = null;
SecretKeySpec keySpec = null;
keySpec = new SecretKeySpec(key, "AES");
Cipher cipher = Cipher.getInstance("AES/CBC/PKCS7Padding");
cipher.init(Cipher.ENCRYPT_MODE, keySpec);
output = cipher.doFinal(input)

The "TODO" bits you need to do yourself 🙂

Best Answer

Share the password (a char[]) and salt (a byte[]—8 bytes selected by a SecureRandom makes a good salt—which doesn't need to be kept secret) with the recipient out-of-band. Then to derive a good key from this information:

/* Derive the key, given password and salt. */
SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
KeySpec spec = new PBEKeySpec(password, salt, 65536, 256);
SecretKey tmp = factory.generateSecret(spec);
SecretKey secret = new SecretKeySpec(tmp.getEncoded(), "AES");

The magic numbers (which could be defined as constants somewhere) 65536 and 256 are the key derivation iteration count and the key size, respectively.

The key derivation function is iterated to require significant computational effort, and that prevents attackers from quickly trying many different passwords. The iteration count can be changed depending on the computing resources available.

The key size can be reduced to 128 bits, which is still considered "strong" encryption, but it doesn't give much of a safety margin if attacks are discovered that weaken AES.

Used with a proper block-chaining mode, the same derived key can be used to encrypt many messages. In Cipher Block Chaining (CBC), a random initialization vector (IV) is generated for each message, yielding different cipher text even if the plain text is identical. CBC may not be the most secure mode available to you (see AEAD below); there are many other modes with different security properties, but they all use a similar random input. In any case, the outputs of each encryption operation are the cipher text and the initialization vector:

/* Encrypt the message. */
Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
cipher.init(Cipher.ENCRYPT_MODE, secret);
AlgorithmParameters params = cipher.getParameters();
byte[] iv = params.getParameterSpec(IvParameterSpec.class).getIV();
byte[] ciphertext = cipher.doFinal("Hello, World!".getBytes(StandardCharsets.UTF_8));

Store the ciphertext and the iv. On decryption, the SecretKey is regenerated in exactly the same way, using using the password with the same salt and iteration parameters. Initialize the cipher with this key and the initialization vector stored with the message:

/* Decrypt the message, given derived key and initialization vector. */
Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
cipher.init(Cipher.DECRYPT_MODE, secret, new IvParameterSpec(iv));
String plaintext = new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
System.out.println(plaintext);

Java 7 included API support for AEAD cipher modes, and the "SunJCE" provider included with OpenJDK and Oracle distributions implements these beginning with Java 8. One of these modes is strongly recommended in place of CBC; it will protect the integrity of the data as well as their privacy.

A java.security.InvalidKeyException with the message "Illegal key size or default parameters" means that the cryptography strength is limited; the unlimited strength jurisdiction policy files are not in the correct location. In a JDK, they should be placed under ${jdk}/jre/lib/security

Based on the problem description, it sounds like the policy files are not correctly installed. Systems can easily have multiple Java runtimes; double-check to make sure that the correct location is being used.

Please consider long and hard if you can't get around implementing your own cryptography

The ugly truth of the matter is that if you are asking this question you will probably not be able to design and implement a secure system.

Let me illustrate my point: Imagine you are building a web application and you need to store some session data. You could assign each user a session ID and store the session data on the server in a hash map mapping session ID to session data. But then you have to deal with this pesky state on the server and if at some point you need more than one server things will get messy. So instead you have the idea to store the session data in a cookie on the client side. You will encrypt it of course so the user cannot read and manipulate the data. So what mode should you use? Coming here you read the top answer (sorry for singling you out myforwik). The first one covered - ECB - is not for you, you want to encrypt more than one block, the next one - CBC - sounds good and you don't need the parallelism of CTR, you don't need random access, so no XTS and patents are a PITA, so no OCB. Using your crypto library you realize that you need some padding because you can only encrypt multiples of the block size. You choose PKCS7 because it was defined in some serious cryptography standards. After reading somewhere that CBC is provably secure if used with a random IV and a secure block cipher, you rest at ease even though you are storing your sensitive data on the client side.

Years later after your service has indeed grown to significant size, an IT security specialist contacts you in a responsible disclosure. She's telling you that she can decrypt all your cookies using a padding oracle attack, because your code produces an error page if the padding is somehow broken.

This is not a hypothetical scenario: Microsoft had this exact flaw in ASP.NET until a few years ago.

The problem is there are a lot of pitfalls regarding cryptography and it is extremely easy to build a system that looks secure for the layman but is trivial to break for a knowledgeable attacker.

What to do if you need to encrypt data

For live connections use TLS (be sure to check the hostname of the certificate and the issuer chain). If you can't use TLS, look for the highest level API your system has to offer for your task and be sure you understand the guarantees it offers and more important what it does not guarantee. For the example above a framework like Play offers client side storage facilities, it does not invalidate the stored data after some time, though, and if you changed the client side state, an attacker can restore a previous state without you noticing.

If there is no high level abstraction available use a high level crypto library. A prominent example is NaCl and a portable implementation with many language bindings is Sodium. Using such a library you do not have to care about encryption modes etc. but you have to be even more careful about the usage details than with a higher level abstraction, like never using a nonce twice. For custom protocol building (say you want something like TLS, but not over TCP or UDP) there are frameworks like Noise and associated implementations that do most of the heavy lifting for you, but their flexibility also means there is a lot of room for error, if you don't understand in depth what all the components do.

If for some reason you cannot use a high level crypto library, for example because you need to interact with existing system in a specific way, there is no way around educating yourself thoroughly. I recommend reading Cryptography Engineering by Ferguson, Kohno and Schneier. Please don't fool yourself into believing you can build a secure system without the necessary background. Cryptography is extremely subtle and it's nigh impossible to test the security of a system.

Comparison of the modes

Encryption only:

Modes that require padding: Like in the example, padding can generally be dangerous because it opens up the possibility of padding oracle attacks. The easiest defense is to authenticate every message before decryption. See below.
- ECB encrypts each block of data independently and the same plaintext block will result in the same ciphertext block. Take a look at the ECB encrypted Tux image on the ECB Wikipedia page to see why this is a serious problem. I don't know of any use case where ECB would be acceptable.
- CBC has an IV and thus needs randomness every time a message is encrypted, changing a part of the message requires re-encrypting everything after the change, transmission errors in one ciphertext block completely destroy the plaintext and change the decryption of the next block, decryption can be parallelized / encryption can't, the plaintext is malleable to a certain degree - this can be a problem.
Stream cipher modes: These modes generate a pseudo random stream of data that may or may not depend the plaintext. Similarly to stream ciphers generally, the generated pseudo random stream is XORed with the plaintext to generate the ciphertext. As you can use as many bits of the random stream as you like you don't need padding at all. Disadvantage of this simplicity is that the encryption is completely malleable, meaning that the decryption can be changed by an attacker in any way he likes as for a plaintext p1, a ciphertext c1 and a pseudo random stream r and attacker can choose a difference d such that the decryption of a ciphertext c2=c1⊕d is p2 = p1⊕d, as p2 = c2⊕r = (c1 ⊕ d) ⊕ r = d ⊕ (c1 ⊕ r). Also the same pseudo random stream must never be used twice as for two ciphertexts c1=p1⊕r and c2=p2⊕r, an attacker can compute the xor of the two plaintexts as c1⊕c2=p1⊕r⊕p2⊕r=p1⊕p2. That also means that changing the message requires complete reencryption, if the original message could have been obtained by an attacker. All of the following steam cipher modes only need the encryption operation of the block cipher, so depending on the cipher this might save some (silicon or machine code) space in extremely constricted environments.
- CTR is simple, it creates a pseudo random stream that is independent of the plaintext, different pseudo random streams are obtained by counting up from different nonces/IVs which are multiplied by a maximum message length so that overlap is prevented, using nonces message encryption is possible without per message randomness, decryption and encryption are completed parallelizable, transmission errors only effect the wrong bits and nothing more
- OFB also creates a pseudo random stream independent of the plaintext, different pseudo random streams are obtained by starting with a different nonce or random IV for every message, neither encryption nor decryption is parallelizable, as with CTR using nonces message encryption is possible without per message randomness, as with CTR transmission errors only effect the wrong bits and nothing more
- CFB's pseudo random stream depends on the plaintext, a different nonce or random IV is needed for every message, like with CTR and OFB using nonces message encryption is possible without per message randomness, decryption is parallelizable / encryption is not, transmission errors completely destroy the following block, but only effect the wrong bits in the current block
Disk encryption modes: These modes are specialized to encrypt data below the file system abstraction. For efficiency reasons changing some data on the disc must only require the rewrite of at most one disc block (512 bytes or 4kib). They are out of scope of this answer as they have vastly different usage scenarios than the other. Don't use them for anything except block level disc encryption. Some members: XEX, XTS, LRW.

Authenticated encryption:

To prevent padding oracle attacks and changes to the ciphertext, one can compute a message authentication code (MAC) on the ciphertext and only decrypt it if it has not been tampered with. This is called encrypt-then-mac and should be preferred to any other order. Except for very few use cases authenticity is as important as confidentiality (the latter of which is the aim of encryption). Authenticated encryption schemes (with associated data (AEAD)) combine the two part process of encryption and authentication into one block cipher mode that also produces an authentication tag in the process. In most cases this results in speed improvement.

CCM is a simple combination of CTR mode and a CBC-MAC. Using two block cipher encryptions per block it is very slow.
OCB is faster but encumbered by patents. For free (as in freedom) or non-military software the patent holder has granted a free license, though.
GCM is a very fast but arguably complex combination of CTR mode and GHASH, a MAC over the Galois field with 2^128 elements. Its wide use in important network standards like TLS 1.2 is reflected by a special instruction Intel has introduced to speed up the calculation of GHASH.

Recommendation:

Considering the importance of authentication I would recommend the following two block cipher modes for most use cases (except for disk encryption purposes): If the data is authenticated by an asymmetric signature use CBC, otherwise use GCM.

Android – AES encryption makes different result in iOS and Android

I vaguely recall I had once similar issue of "synchronizing" the encryption between Android and iPhone, and the solution was in proper IV (initialization vector) usage. So probably switching on an explicit IV usage in Android could help:

final byte[] iv = new byte[16];
Arrays.fill(iv, (byte) 0x00);
IvParameterSpec ivParameterSpec = new IvParameterSpec(iv);
.. // the rest of preparations
ecipher.init(Cipher.ENCRYPT_MODE, skeySpec, ivParameterSpec);

Because when on iPhone you pass NULL as the IV, it may internally use a default one that corresponds to the just stated above.

But in production environment you should use a (cryptographically secure pseudo-)random initialization vector, stored together with the data. Then it is safe for all modes of operations. [1]