December 23, 2019
Usually, I'm at the top of the stack when I work with cryptography. I call a well-abstracted library to provide confidentiality, secure storage, and digital signatures to my app. However, I recently worked on an e-commerce project with Paya and had to understand more about AES encryption in order to encrypt my data in Java for their payment processor to work. The company provided examples of their interface with JavaScript and PHP but it took me a while to get an equivalent Java program to work.
It turns out that a special payload is needed to supplement the encrypted data. This preserves the simple two-parameter interface (key and cleartext to encrypt, key and ciphertext to decrypt). The flavor of AES used here, Cipher Block Chaining (CBC), relies on a construct called an Initialization Vector (IV) that needs to be passed along with the encrypted data. Bundling the IV with the data is a convention added on top of AES.
My plain-Java example in the post was adapted from ArmanSo. This StackOverflow article helped me understand what was going on. This page gave me some starter code. Additional background information was provided here.
The first thing I learned in working with AES is that the password submitted by the caller is usually processed to help fend off weak passwords and to conform to a key length required by AES. That is, a password like "mypassword1234567890" (20 bytes) won't work as an AES key because AES requires a key of 16, 24, or 32 bytes. These byte values correspond to the more familiar bit sizes which you'll see in the AES documentation: 128 bit, 192 bit, 256 bit. A hash is employed to turn a password of any length into a fixed-size value (and to help with weak passwords). This is followed by a truncation to one of the allowed sizes.
In Java terms, this is implemented using the MessageDigest class followed by an Arrays.copyOf(). I'm using the MD5 algorithm here because that's what is used by CryptoJS. The receiving Paya side needs to use the same algorithm to convert the raw password into the processed password which will feed the encryption algorithm.
This code creates the MessageDigest and truncates the resulting byte[]. It then builds a Key in accordance with the AES algorithm.
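// Uses java.nio.charset.StandardCharsets, java.security.MessageDigest, java.util.Arrays,
// javax.crypto.SecretKey, and javax.crypto.spec.SecretKeySpec
byte[] keyBytes = pwd.getBytes(StandardCharsets.UTF_8);
MessageDigest messageDigest = MessageDigest.getInstance("MD5"); // same hash the receiving side uses
keyBytes = messageDigest.digest(keyBytes); // MD5 produces 16 bytes
keyBytes = Arrays.copyOf(keyBytes, 16); // 128-bit
SecretKey secretKey = new SecretKeySpec(keyBytes, "AES");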
The truncated key must be 16, 24, or 32 bytes long. If you attempt an arbitrary length, say 17 bytes, you'll get this error when the key is used.
Exception in thread "main" java.security.InvalidKeyException: Invalid AES key length: 17 bytes
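To see where that error comes from, here is a minimal sketch that derives key bytes the same way and then asks an AES cipher whether it accepts them. The class name KeyLengthDemo and the helpers deriveKeyBytes and aesAcceptsKey are my own names for illustration, not part of any library. Note that the SecretKeySpec constructor does not validate the length; in this sketch the InvalidKeyException surfaces when the cipher is initialized.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class KeyLengthDemo {

    // Hypothetical helper: hash the password with MD5, then truncate (or zero-pad) to `length` bytes.
    static byte[] deriveKeyBytes(String pwd, int length) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(pwd.getBytes(StandardCharsets.UTF_8));
        return Arrays.copyOf(digest, length); // MD5 yields 16 bytes; longer lengths are zero-padded
    }

    // Hypothetical helper: true if an AES cipher can be initialized with these key bytes.
    static boolean aesAcceptsKey(byte[] keyBytes) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        try {
            // With no IV supplied in encrypt mode, the provider generates one itself.
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
            return true;
        } catch (InvalidKeyException e) {
            return false; // e.g. "Invalid AES key length: 17 bytes"
        }
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String pwd = "mypassword1234567890";
        System.out.println("16 bytes accepted: " + aesAcceptsKey(deriveKeyBytes(pwd, 16)));
        System.out.println("17 bytes accepted: " + aesAcceptsKey(deriveKeyBytes(pwd, 17)));
    }
}
```

Because MD5 already produces exactly 16 bytes, the copyOf call to 16 is effectively a no-op here; it matters only if you swap in a longer hash.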
An IV is an Initialization Vector. There are two popular block cipher modes used with AES: Electronic Code Book (ECB) and Cipher Block Chaining (CBC). ECB processes each block individually. It's simple, but identical plaintext blocks produce identical ciphertext blocks, so patterns in the data can still be interpreted. See this Wiki page and look for the pictures of Tux. You can make out quite a bit of the Tux image with ECB despite it being fully encrypted.
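The Tux effect is easy to reproduce on a small scale. The sketch below encrypts two identical 16-byte plaintext blocks with ECB and compares the resulting ciphertext blocks. The class name EcbPatternDemo, the helper encryptEcb, and the hard-coded key are assumptions made for this demonstration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class EcbPatternDemo {

    // Hypothetical helper: AES in ECB mode, no IV involved.
    static byte[] encryptEcb(byte[] keyBytes, byte[] plaintext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Demo-only 16-byte key; never hard-code a real key.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        // The same 16-byte block, twice in a row.
        byte[] plaintext = "YELLOW SUBMARINEYELLOW SUBMARINE".getBytes(StandardCharsets.US_ASCII);

        byte[] ciphertext = encryptEcb(key, plaintext);
        byte[] block1 = Arrays.copyOfRange(ciphertext, 0, 16);
        byte[] block2 = Arrays.copyOfRange(ciphertext, 16, 32);

        // Identical input blocks give identical output blocks under ECB.
        System.out.println("Blocks match: " + Arrays.equals(block1, block2)); // prints "Blocks match: true"
    }
}
```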
So, randomness is used to further obscure the data. CBC is a mechanism that segments the payload into blocks (as does ECB), but each plaintext block is XOR-ed with the previous block's ciphertext before it is encrypted. That is, the encryption of the second block depends on the encrypted first block. For the first block, however, there is no previous block, so you need to supplement the payload with a zeroth block. This is where the IV comes in.
The IV is a zeroth block that ought to be generated randomly. Technically, you can use all zeros for the zeroth block, but this compromises the encryption. Additionally, each time you encrypt, you should use a new IV. While any individual payload might not be hackable, if patterns show up because an IV is simple (all zeros) or reused, then multiple payloads could lead to insight into the encryption.
The Java code used to produce an IV is a byte[] filled up with SecureRandom bytes.
byte[] iv = new byte[16]; // one AES block
SecureRandom rnd = new SecureRandom();
rnd.nextBytes(iv);
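To illustrate why a fresh IV matters, here is a small sketch that encrypts the same message three times with CBC: twice with one IV and once with a new one. The class name CbcIvDemo, the helpers newIv and encryptCbc, and the hard-coded key are assumptions for the demo.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CbcIvDemo {

    // Hypothetical helper: a random 16-byte IV (one AES block).
    static byte[] newIv() {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    // Hypothetical helper: AES in CBC mode with an explicit IV.
    static byte[] encryptCbc(byte[] keyBytes, byte[] iv, byte[] plaintext)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Demo-only 16-byte key; never hard-code a real key.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        byte[] message = "the same message".getBytes(StandardCharsets.UTF_8);

        byte[] iv = newIv();
        byte[] first = encryptCbc(key, iv, message);
        byte[] reusedIv = encryptCbc(key, iv, message);
        byte[] freshIv = encryptCbc(key, newIv(), message);

        // Reusing the IV reproduces the ciphertext exactly, which leaks that the messages match.
        System.out.println("Reused IV, same ciphertext: " + Arrays.equals(first, reusedIv));
        // A fresh random IV changes every ciphertext block.
        System.out.println("Fresh IV, same ciphertext: " + Arrays.equals(first, freshIv));
    }
}
```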
What's interesting about the IV is that it can be passed along with the payload. From an end user point of view, this is ideal because it means that we're still responsible for a two-parameter call: password and payload. The password will still need the maximum protection, but the IV can just be tacked onto the data and disassembled for the decryption algorithm.
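As a rough sketch of that idea, the code below prepends the 16-byte IV to the ciphertext on encryption and peels it off again on decryption, so the caller only ever handles a key and a payload. This is the simplest possible layout and an assumption on my part; it is not the OpenSSL-style format that CryptoJS produces. The class name IvPrefixDemo and its encrypt and decrypt helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class IvPrefixDemo {

    static final int IV_LENGTH = 16; // one AES block

    // Encrypts with a fresh random IV and returns IV || ciphertext.
    static byte[] encrypt(byte[] keyBytes, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);

        byte[] payload = new byte[IV_LENGTH + ciphertext.length];
        System.arraycopy(iv, 0, payload, 0, IV_LENGTH);
        System.arraycopy(ciphertext, 0, payload, IV_LENGTH, ciphertext.length);
        return payload;
    }

    // Splits the payload back into IV and ciphertext, then decrypts.
    static byte[] decrypt(byte[] keyBytes, byte[] payload) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keyBytes, "AES"),
                new IvParameterSpec(payload, 0, IV_LENGTH));
        return cipher.doFinal(payload, IV_LENGTH, payload.length - IV_LENGTH);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Demo-only 16-byte key; never hard-code a real key.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = encrypt(key, "card data goes here".getBytes(StandardCharsets.UTF_8));
        String roundTrip = new String(decrypt(key, payload), StandardCharsets.UTF_8);
        System.out.println(roundTrip); // prints "card data goes here"
    }
}
```

The IV does not need to be secret, which is why it can travel in the clear at the front of the payload; only the key has to be protected.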
In the case of CryptoJS, it uses an OpenSSL convention that starts the payload with a token "Salted__" followed by
By Carl Walker
President and Principal Consultant of Bekwam, Inc