Lesson 6: Encryption

Encryption is a pain in the neck, but everyone has secrets and it's human nature to want privacy and confidentiality.

Oct 15, 2023

Today, 8 out of 10 sites on the web use HTTPS by default. Unlike HTTP, which transfers data in plaintext, HTTPS uses encryption for secure communication over the Internet.

Data transmitted and stored in plaintext is susceptible to prying eyes and theft. In our cyber world, sensitive information such as credit cards, passwords, and personal data must be kept private and transferred securely over a network. Encryption is a technique commonly used to protect data in transit and at rest to provide confidentiality and privacy. In this post, we look at symmetric and asymmetric encryption. If you cannot tell the difference between hashing and encryption, I recommend you read the previous post on cryptographic hashing first.

Define encryption

In cryptography, encryption is a process that scrambles human-readable data, known as plaintext, by converting it into unreadable code called ciphertext. Encryption is a bidirectional process, meaning it can be reversed to retrieve the plaintext from the ciphertext. The process of deciphering the encrypted data (ciphertext) into its original form is called decryption. Only authorised parties who possess the decryption key can decipher the code to retrieve the original data. An encryption algorithm, also called a cipher, is an algorithm for encrypting and decrypting data using a cryptographic key. Both encryption and decryption can be defined by the following equations:

ciphertext = encrypt(plaintext, key)
plaintext = decrypt(ciphertext, key)
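Here is a minimal sketch that makes the two equations concrete. It uses the simplest possible symmetric cipher, the one-time pad, which XORs each byte of the plaintext with a byte of the key. The one-time pad is our illustrative choice and is not used anywhere else in this post. It is only secure when the key is truly random, as long as the message, and never reused. The function names `encrypt` and `decrypt` are hypothetical and simply mirror the equations above.

```python
import secrets


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # One-time pad: XOR each plaintext byte with the matching key byte
    if len(key) != len(plaintext):
        raise ValueError("one-time pad key must be as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, key))


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so decryption applies the same operation
    return encrypt(ciphertext, key)


message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # random key, to be used only once

ciphertext = encrypt(message, key)
assert decrypt(ciphertext, key) == message
```

XOR is its own inverse, so applying the same key a second time recovers the plaintext. Using one key for both directions is exactly what makes a cipher symmetric.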

Symmetric encryption

Symmetric encryption (or secret key encryption) uses a single secret key, also known as a symmetric key, to encrypt and decrypt data. A simple analogy is using a password to lock (encrypt) and unlock (decrypt) a zip file. The secret key is known to both the sender and the recipient.

Data Encryption Standard (DES)

Data Encryption Standard (DES) is an outdated symmetric block cipher that encrypts data in blocks of 64 bits. Its effective key length of 56 bits is too short for modern applications, because the entire keyspace can be brute-forced. Triple DES (3DES) improves on it by applying DES three times with two or three different keys. Unfortunately, it is still vulnerable to attacks (its small 64-bit block makes it susceptible to attacks such as Sweet32) and has been deprecated in favour of modern ciphers.

Advanced Encryption Standard (AES)

Advanced Encryption Standard (AES) is a secure symmetric block cipher that encrypts data in blocks of 128 bits. Standardised in 2001, it is one of the most secure algorithms available and the industry standard today. AES has three variants, AES-128, AES-192 and AES-256, which use 128-, 192- and 256-bit keys (16, 24 and 32 bytes) respectively.
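The sketch below is a quick sanity check on these numbers. It generates a random key for each AES variant with `os.urandom` and compares the size of the AES-128 keyspace with that of DES. The dictionary name `AES_KEY_BITS` is purely illustrative. For real AES-GCM keys, the `AESGCM.generate_key()` helper used later in this post does the same job.

```python
import os

# AES variants and their key lengths in bits
AES_KEY_BITS = {"AES-128": 128, "AES-192": 192, "AES-256": 256}

# Generate a random key of bits // 8 bytes for each variant
keys = {name: os.urandom(bits // 8) for name, bits in AES_KEY_BITS.items()}
for name, key in keys.items():
    print(f"{name}: {len(key)}-byte key")

# A brute-force search must try up to 2**bits keys. DES has an effective
# 56-bit key, so AES-128 has 2**(128 - 56) = 2**72 times as many keys.
ratio = 2**128 // 2**56
print(f"AES-128 has 2**{ratio.bit_length() - 1} times as many keys as DES")
```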

A block cipher like AES operates on a fixed-length block of bits. For AES, the block size is 128 bits (16 bytes). To encrypt data larger than a single block, a mode of operation is required to apply the cipher repeatedly, one block at a time. If the plaintext length is not an exact multiple of 128 bits, it must be padded so that the last block is a full block. Note that not all modes of operation require padding. The next sections briefly discuss two classic block cipher modes to give a basic understanding of the subject.
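The most common padding scheme, PKCS#7, is the one described in the ECB section below. A minimal pure-Python sketch of it follows. `pkcs7_pad` and `pkcs7_unpad` are hypothetical helper names. In practice you would use a vetted implementation, such as the `PKCS7` class in pyca/cryptography's `cryptography.hazmat.primitives.padding` module.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Always add 1..block_size bytes, each equal to the padding length;
    # a full extra block is added when data is already block-aligned
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    if not padded or len(padded) % block_size != 0:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    # Reject lengths outside 1..block_size or inconsistent padding bytes
    if not 1 <= pad_len <= block_size or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]


padded = pkcs7_pad(b"YELLOW SUBMARINE!")  # 17 data bytes + 15 bytes of 0x0f
print(len(padded))                         # 32
print(len(pkcs7_pad(b"A" * 16)))           # 32: a full block of padding
print(pkcs7_unpad(padded))                 # b'YELLOW SUBMARINE!'
```

Padding is always added, even when the data is already block-aligned. This keeps unpadding unambiguous, because the last byte always states how many bytes to strip.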

AES-ECB

The Electronic Code Book (ECB) mode is the most basic mode but also the weakest. It encrypts each block independently. To fill the last block, it is typically combined with PKCS#7 padding. This scheme appends as many bytes as are missing, each with a value equal to the number of padding bytes added. If the last block is already full, a whole extra block of 16 padding bytes, each with the value 16 (the AES block size), is added. Decryption can then read the last byte, determine the padding length and strip it to restore the original plaintext. The weakness of ECB is that identical plaintext blocks always produce identical ciphertext blocks, so repeating patterns are not hidden. It is therefore not semantically secure. The famous ECB penguin below illustrates why ECB is not a recommended mode of encryption: the outline of the image remains visible after encryption.

Figure 2. Tux encrypted with ECB mode (Image taken from Wikipedia)
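The same leak can be reproduced in a few lines with pyca/cryptography, the library used elsewhere in this post. This sketch encrypts three 16-byte blocks in ECB mode, two of which are identical. The plaintext is chosen to be block-aligned, so no padding is needed; ECB in this library expects whole blocks.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
# Two identical 16-byte plaintext blocks followed by a different one
plaintext = b"SIXTEEN BYTE MSG" * 2 + b"ANOTHER 16 BYTES"

# ECB encrypts every block independently with the same key
encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ciphertext = encryptor.update(plaintext) + encryptor.finalize()

blocks = [ciphertext[i:i + 16] for i in range(0, len(ciphertext), 16)]
print(blocks[0] == blocks[1])  # True: the repetition shows in the ciphertext
print(blocks[0] == blocks[2])  # False
```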

AES-CBC

The Cipher Block Chaining (CBC) mode also uses padding but takes an additional value called an initialization vector (IV) to randomise the encryption. The IV must be random and unpredictable. Its length is equal to the block size (16 bytes for AES). In this mode, the IV is XOR-ed with the first block of plaintext and then encrypted, and each subsequent block gets XOR-ed with the previous ciphertext block prior to encryption. The IV is required for decryption, so it must be transmitted or stored alongside the ciphertext. It is not a secret and can be left in the clear.
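The sketch below shows the chaining in action. It builds CBC by hand from single-block AES encryptions, using ECB on one block at a time as a raw AES call, and checks that the result matches pyca/cryptography's built-in `modes.CBC`. It is meant as a learning aid, not as a way to implement CBC in production. The `xor` helper is a hypothetical name.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 16  # AES block size in bytes
key, iv = os.urandom(16), os.urandom(BLOCK)

# Pad the message to a whole number of blocks (PKCS7 takes the block size in bits)
padder = padding.PKCS7(128).padder()
plaintext = padder.update(b"CBC chains every block to the one before it.") + padder.finalize()


def xor(a: bytes, b: bytes) -> bytes:
    # Hypothetical helper: XOR two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))


# Manual CBC: XOR each plaintext block with the previous ciphertext block
# (the IV for the first block), then encrypt it with a single raw AES call
aes_block = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
previous, manual = iv, b""
for i in range(0, len(plaintext), BLOCK):
    previous = aes_block.update(xor(plaintext[i:i + BLOCK], previous))
    manual += previous
aes_block.finalize()

# The library's CBC mode should produce exactly the same ciphertext
cbc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
library = cbc.update(plaintext) + cbc.finalize()
print(manual == library)  # True
```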

AES-CBC-HMAC

Encryption guarantees the confidentiality of a message, but not the integrity of the ciphertext. Nothing prevents an attacker from modifying or tampering with our ciphertext and IV, and the recipient has no way to verify that the ciphertext really originates from the sender. This might seem harmless because a third party (for example, in a man-in-the-middle attack) cannot produce a legitimate ciphertext without knowing the secret key. However, clever attacks exist in which attackers deliberately flip certain bits of the ciphertext to change the meaning of our message. In CBC mode, for instance, flipping a bit of the IV flips the same bit of the first decrypted plaintext block. We want to ensure our message is delivered unchanged. To provide authentication and integrity over the ciphertext and IV, we usually use a hash-based message authentication code (HMAC) with the SHA-256 hash function, keyed with a second secret key. The AES-CBC-HMAC construction is one of the most widely used authenticated encryption constructions. The HMAC is computed over the IV and the ciphertext to create a MAC tag (also known as an authentication tag). The tag is sent to the recipient, who verifies it before decrypting. We often concatenate the IV, the ciphertext and the authentication tag and transmit them over the wire.
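The construction can be sketched as encrypt-then-MAC, with pyca/cryptography handling AES-CBC and Python's standard `hmac` module handling HMAC-SHA256. This is a minimal hand-rolled illustration, not a vetted implementation. It assumes two independent random keys (`enc_key` and `mac_key`, hypothetical names) and the IV || ciphertext || tag layout described above. For real use, pyca/cryptography's `Fernet` class packages a similar AES-128-CBC plus HMAC-SHA256 construction, and AEAD modes are simpler still.

```python
import hashlib
import hmac
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Assumption: two independent keys, one for AES and one for HMAC
enc_key, mac_key = os.urandom(16), os.urandom(32)


def encrypt(plaintext: bytes) -> bytes:
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
    ciphertext = encryptor.update(padded) + encryptor.finalize()
    # Encrypt-then-MAC: the tag covers both the IV and the ciphertext
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag  # IV || ciphertext || 32-byte tag


def decrypt(message: bytes) -> bytes:
    iv, ciphertext, tag = message[:16], message[16:-32], message[-32:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    # Verify before decrypting, using a constant-time comparison
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: message was modified")
    decryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()


message = encrypt(b"Pay Bob $100")
print(decrypt(message))  # b'Pay Bob $100'

# An attacker flips one bit of the IV, which would flip the same bit in the
# first plaintext block ('P' -> 'Q'); the HMAC check rejects the message
tampered = bytes([message[0] ^ 0x01]) + message[1:]
try:
    decrypt(tampered)
except ValueError as err:
    print(err)
```

The tag is checked with `hmac.compare_digest`, a constant-time comparison. Decryption only starts once the tag is valid, so a tampered IV or ciphertext is rejected before it can influence the plaintext.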

Enter authenticated encryption with associated data (AEAD)

The AES-CBC-HMAC construction is not very friendly to programmers. It is easy to get wrong, and the IV is often misused. For that reason, an all-in-one construction called authenticated encryption with associated data (AEAD) was invented to simplify the use of encryption.

AEAD combines encryption and authentication. In addition to the data to encrypt and a nonce, it accepts optional associated data. This is non-confidential data that is authenticated but not encrypted, so it can be left in the clear. The authentication tag is computed over both the encrypted data and the associated data, so tampering with either is detected.

AES in Galois/Counter Mode (AES-GCM) is a widely adopted AEAD, used for example in TLS 1.2 and TLS 1.3. GCM combines Counter (CTR) mode encryption with a Galois-field authenticator (GHASH, the basis of GMAC). In AES-GCM, a 12-byte nonce is concatenated with a 4-byte counter to form a 16-byte block. This block is encrypted with AES to produce a block of keystream, which is then XOR-ed with a plaintext block. The counter is incremented and the process repeats. Any keystream longer than the remaining plaintext is truncated, so no padding is required. Finally, GHASH, keyed with a hash key derived from the AES key, hashes the associated data and the ciphertext. The result is encrypted (XOR-ed with an AES-encrypted counter block) to produce the authentication tag.

As always, the devil is in the details. One caveat of AES-GCM is that the nonce must be unique for every message encrypted under a given key. Accidentally reusing a nonce with the same key compromises the security of every message encrypted with that key and nonce pair. Both messages are XOR-ed with the same keystream, so XOR-ing the two ciphertexts cancels the keystream out and leaves the XOR of the two plaintexts. An attacker who knows one of the plaintexts can then compute the other.
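Nonce reuse is easy to demonstrate. The sketch below deliberately makes the mistake described above: it encrypts two messages of equal length with the same key and nonce using `AESGCM` from pyca/cryptography, then recovers the second message from the first. The messages and the `xor` helper are made up for illustration. `AESGCM.encrypt()` appends a 16-byte tag, so the sketch strips it to keep only the keystream-XOR-ed bytes.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)
cipher = AESGCM(key)

p1 = b"Meet me at 10pm at the pier."
p2 = b"The vault code is 4-8-15-16."

# MISTAKE (on purpose): the same key and nonce encrypt two messages.
# Drop the 16-byte tag to keep only the CTR-mode ciphertext bytes.
c1 = cipher.encrypt(nonce, p1, None)[:-16]
c2 = cipher.encrypt(nonce, p2, None)[:-16]


def xor(a: bytes, b: bytes) -> bytes:
    # Hypothetical helper: XOR two byte strings of equal length
    return bytes(x ^ y for x, y in zip(a, b))


# The shared keystream cancels out: c1 XOR c2 == p1 XOR p2
assert xor(c1, c2) == xor(p1, p2)

# An attacker who knows p1 recovers p2 without ever seeing the key
recovered = xor(xor(c1, c2), p1)
print(recovered)  # b'The vault code is 4-8-15-16.'
```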

Let’s see how to use AES-GCM in code.

Python (using pyca/cryptography):

import os
from base64 import b64encode
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Associated data
ad = b'user-id=u1301'
# Plaintext to encrypt
data = b'This is a secret message.'

# AES-128 uses 128-bit keys
key = AESGCM.generate_key(bit_length=128)
# GCM uses 12-byte nonces
nonce = os.urandom(12)

# Encryption
cipher = AESGCM(key)
# This returns the ciphertext bytes with the 16-byte tag appended
ciphertext = cipher.encrypt(nonce, data, ad)
print('The ciphertext is: ' + b64encode(ciphertext).decode('utf-8'))

# Decryption
plaintext = cipher.decrypt(nonce, ciphertext, ad)
print('The plaintext was: ' + plaintext.decode('utf-8'))

This will print an output that looks like the following. The ciphertext will be different each time as we use a randomly generated key and nonce.

The ciphertext is: 77aOzN2PDoaHMnra285tTzRtyyMGViCVEWKtXHejSVryoLesl+DAC9I=
The plaintext was: This is a secret message.
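AEAD decryption also fails loudly when anything has been altered. Here is a minimal sketch building on the example above, using the same library and a hypothetical helper named `try_decrypt`. Changing the associated data or flipping a single ciphertext bit makes `AESGCM.decrypt()` raise `cryptography.exceptions.InvalidTag` instead of returning plaintext.

```python
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)
cipher = AESGCM(key)
ciphertext = cipher.encrypt(nonce, b"This is a secret message.", b"user-id=u1301")


def try_decrypt(data: bytes, ad: bytes) -> str:
    try:
        return cipher.decrypt(nonce, data, ad).decode("utf-8")
    except InvalidTag:
        # Raised when the tag does not match the ciphertext and associated data
        return "rejected"


# Untouched ciphertext and associated data decrypt normally
print(try_decrypt(ciphertext, b"user-id=u1301"))  # This is a secret message.

# Changing the associated data (e.g. swapping the user ID) is detected
print(try_decrypt(ciphertext, b"user-id=u1302"))  # rejected

# Flipping a single ciphertext bit is detected as well
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
print(try_decrypt(tampered, b"user-id=u1301"))  # rejected
```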

Asymmetric encryption

Asymmetric encryption, also known as public key encryption, uses a public key to encrypt data and a private key to decrypt it. To use asymmetric encryption, we (the recipient) generate a pair of public and private keys. In symmetric encryption, the sender must share the symmetric key with the recipient. In asymmetric encryption, by contrast, the recipient must never share the private key, not even with the sender. It must be kept secret and known only to its owner. Say we generate a key pair. We share only the public key with our senders, who use it to send us encrypted messages, and we then use our private key to decrypt those messages.

Because messages can only be decrypted with the private key, the public key can be published openly. It does not disclose any information about the private key. Thanks to the underlying maths, deriving the private key from the public key is computationally infeasible. The greatest difference between the two is that symmetric encryption works by shuffling and mixing bits, while asymmetric encryption relies on maths problems that are computationally hard to solve. This keeps the encryption secure, but the operations are expensive, making asymmetric encryption much slower than symmetric encryption. Another limitation is the amount of data it can encrypt at once.

The classic asymmetric cipher is RSA. RSA bases its security on the factorisation problem: the difficulty of factoring the product of two large prime numbers. To generate an RSA key, we compute a modulus N as the product of two large prime numbers, p and q, which must remain secret. We also choose a public exponent e, conventionally 65537 (2^16 + 1). It is prime, and its binary form has only two 1 bits, which keeps encryption fast. The private exponent d is then derived from p, q and e. Because it is computationally hard to recover p and q from N when they are large enough, N can be public knowledge. The public key thus consists of the public exponent e and the modulus N.

In general, the RSA algorithm works as follows:

ciphertext = message^e mod N
plaintext = ciphertext^d mod N
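The two equations can be checked with a toy RSA computation on tiny primes (p = 61, q = 53, a common textbook example). It is deliberately insecure. The sketch uses a small public exponent, e = 17, instead of 65537, and derives d as the modular inverse of e modulo (p − 1)(q − 1). Some standards use the Carmichael function λ(N) instead, which gives a different but equally valid d. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Toy RSA with tiny primes: insecure, for illustrating the equations only
p, q = 61, 53
N = p * q                  # modulus: 3233
phi = (p - 1) * (q - 1)    # Euler's totient of N: 3120
e = 17                     # small public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

message = 65
ciphertext = pow(message, e, N)    # ciphertext = message^e mod N
plaintext = pow(ciphertext, d, N)  # plaintext = ciphertext^d mod N

print(d, ciphertext, plaintext)  # 2753 2790 65
```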

If you cannot see how this works on text instead of numbers, think of the plaintext or message as a number. Computers store text as binary, so a piece of text is also a number in a computer’s memory. It is possible to compute the public key from the private key, but not vice versa. The full maths behind RSA is widely documented online.

To use RSA securely, we must use padding. Textbook RSA without padding is deterministic: the same message always encrypts to the same ciphertext. An attacker holding the public key can therefore encrypt likely messages and compare the results, which makes short, guessable messages easy to brute-force. Optimal Asymmetric Encryption Padding (OAEP) is the strong, standard padding scheme for RSA encryption today. RSA-OAEP works by mixing the message with a random seed generated for each encryption. Internally, OAEP uses a mask generation function (MGF) built from a hash function. To decrypt the ciphertext, the process is reversed. RSA-KEM is another scheme that provides stronger security without requiring padding.
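Both ideas can be illustrated in plain Python: converting text to a number with `int.from_bytes`, and why unpadded (textbook) RSA leaks information. The sketch assumes a toy key built from the Mersenne primes 2^89 − 1 and 2^107 − 1, which are far too small and too well known to be secure. `to_int` and `from_int` are hypothetical helper names. Because textbook RSA is deterministic, anyone with the public key can test guesses against a ciphertext.

```python
# Textbook RSA (no padding) on a short text message, with a toy key built
# from two known Mersenne primes. Not secure: illustration only.
p, q = 2**89 - 1, 2**107 - 1
N = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # Python 3.8+


def to_int(text: bytes) -> int:
    # Interpret the bytes as one big-endian integer
    return int.from_bytes(text, "big")


def from_int(number: int) -> bytes:
    # Convert the integer back to the shortest big-endian byte string
    return number.to_bytes((number.bit_length() + 7) // 8, "big")


secret = b"yes"
ciphertext = pow(to_int(secret), e, N)
assert from_int(pow(ciphertext, d, N)) == secret

# Without padding, encryption is deterministic, so anyone holding the
# public key (e, N) can encrypt candidate messages and compare
for guess in (b"no", b"yes"):
    if pow(to_int(guess), e, N) == ciphertext:
        print("The secret is", guess)  # The secret is b'yes'
```

OAEP defeats this check. Each encryption mixes in fresh randomness, so encrypting the same message twice gives two different ciphertexts.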

Today’s recommendation is to use RSA keys of at least 2048 bits. RSA encryption and decryption become slower as the key size increases, and 4096-bit keys are a common compromise between performance and security. Beyond that size, better options exist, such as elliptic-curve cryptography (ECC). ECC is a modern family of asymmetric algorithms (among the strongest and most efficient in its category) that offers comparable security with much smaller keys and ciphertexts, which keeps it efficient.

The maximum size of data that RSA can encrypt is bounded by the key size, and padding reduces it further. For example, a 2048-bit RSA key has a 2048-bit (256-byte) modulus. With OAEP and SHA-256, however, it can encrypt at most 256 − 2 × 32 − 2 = 190 bytes. As a result, RSA cannot be used to encrypt files directly. To encrypt larger data, we use hybrid encryption. First, we generate a symmetric key and use symmetric encryption to encrypt the data. We then use RSA to encrypt the symmetric key and transfer both to the recipient. The recipient uses RSA to decrypt the symmetric key and then uses it to decrypt the ciphertext.
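The hybrid scheme can be sketched with pyca/cryptography by combining the two APIs used in this post: AES-GCM for the data and RSA-OAEP for the AES key. This is a minimal illustration with two simplifying assumptions. The recipient's key pair is generated in place rather than loaded from a file. The 12-byte nonce is sent alongside the encrypted data and key, since it does not need to be secret.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Recipient: long-term RSA key pair
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Sender: encrypt the (arbitrarily long) data with a fresh AES key,
# then encrypt only that 16-byte key with the recipient's RSA public key
data = b"A large file's contents. " * 1000
aes_key = AESGCM.generate_key(bit_length=128)
nonce = os.urandom(12)
encrypted_data = AESGCM(aes_key).encrypt(nonce, data, None)
encrypted_key = public_key.encrypt(aes_key, OAEP)

# Recipient: recover the AES key with RSA, then decrypt the data with it
recovered_key = private_key.decrypt(encrypted_key, OAEP)
plaintext = AESGCM(recovered_key).decrypt(nonce, encrypted_data, None)
print(plaintext == data)  # True
```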

Here’s how to generate an RSA key pair using OpenSSL:

# Generate a 2048-bit RSA private key
$ openssl genrsa -out private-key.pem 2048
# Generate the public key
$ openssl rsa -in private-key.pem -pubout -out public-key.pem

This will output two PEM files in the working directory: private-key.pem, containing the private key, and public-key.pem, containing the public key. The public key can be shared freely, but remember: never share your private keys online.
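If you prefer to stay in Python, pyca/cryptography can produce an equivalent PEM pair. The OpenSSL commands above write the private key unencrypted. This sketch goes one step further and protects the private key with a passphrase, where `b"change-me"` is a placeholder, not a recommendation.

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Private key as PEM (PKCS#8), encrypted with a placeholder passphrase
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(b"change-me"),
)
# Public key as PEM; this one is safe to share
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
print(private_pem.splitlines()[0])  # b'-----BEGIN ENCRYPTED PRIVATE KEY-----'
print(public_pem.splitlines()[0])   # b'-----BEGIN PUBLIC KEY-----'

# Loading the keys back; the private key needs the same passphrase
loaded_private = serialization.load_pem_private_key(private_pem, password=b"change-me")
loaded_public = serialization.load_pem_public_key(public_pem)
```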

Now, let’s see how to use RSA in code.

Python (using pyca/cryptography):

from base64 import b64encode
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

# Generate an RSA private key
# Note: You can also load a key from a PEM file using load_pem_private_key() 
# and load_pem_public_key() from cryptography.hazmat.primitives.serialization
private_key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=2048
)

# Get the RSA public key from the private key
public_key = private_key.public_key()

# Plaintext to encrypt
message = b'This is a secret message.'

# RSA encryption using a secure padding and hash function
ciphertext = public_key.encrypt(
    message,
    padding.OAEP(
        mgf=padding.MGF1(hashes.SHA256()), 
        algorithm=hashes.SHA256(),
        label=None
    )
)
print('The ciphertext is: ' + b64encode(ciphertext).decode('utf-8'))

# RSA decryption
plaintext = private_key.decrypt(
    ciphertext,
    padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)
print('The plaintext was: ' + plaintext.decode('utf-8'))

This will print an output that looks like the following:

The ciphertext is: iFowEZV3S40zp91M6KyF3z51FJUCWMFlui1pKlyhW7VnVcMyW8JNCYspB89fzOfmUT4PJOB787v5IgpdlivrSZKbfO5ChAyg+rxZa9Tvgcf7tie0LqV95eyz2kxe5bgi1rBfxQ3b92DmIzOZObJgbdSuD3HDKzc+UyvrVoin6HlBZQnRwLYvfqEtkiM6vZO/PUdr0ariwuLKdkkwX2zEUiKySTvteeUpC85XMI7g0nrgF1boOrn5isbuYU0TX5Q0uAmkosCk8fGVl4bWBPV85iqojkle6cV1+hll98Fsk1gowXjAhoadbPrcPKLrH9jT9ehDMtRoxlxyDIkN8Rp63Q==
The plaintext was: This is a secret message.
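The RSA size limit is easy to check. With OAEP, the maximum message length is the key size in bytes, minus twice the hash output size, minus 2. A 2048-bit key with SHA-256 therefore accepts at most 256 − 2 × 32 − 2 = 190 bytes. The sketch below uses the same OAEP parameters as the example above. It encrypts a 190-byte message and expects a 191-byte one to be rejected, which pyca/cryptography reports as a `ValueError`.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

public_key = rsa.generate_private_key(public_exponent=65537, key_size=2048).public_key()
oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# OAEP limit: key bytes - 2 * hash bytes - 2 = 256 - 64 - 2 = 190 bytes
max_len = public_key.key_size // 8 - 2 * hashes.SHA256.digest_size - 2
print(max_len)  # 190

public_key.encrypt(b"x" * max_len, oaep)  # fits
too_long_rejected = False
try:
    public_key.encrypt(b"x" * (max_len + 1), oaep)
except ValueError:
    too_long_rejected = True
    print("191 bytes is too long for a 2048-bit key with OAEP-SHA256")
```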

In this post, we discussed symmetric and asymmetric encryption as well as which algorithms to use. In summary, symmetric encryption uses the same key to encrypt and decrypt data, making it fast and efficient. It can encrypt data of any length, but the secret key must be known to every party who needs to decrypt it. Sharing the key with multiple parties can be risky, and generating a key for each client would be a hassle if we need to exchange messages with many people. The keys must be generated beforehand and exchanged over a secure channel. Asymmetric encryption allows any number of senders to send us messages using our public key. However, it is slower, and its limitation is that it cannot encrypt long messages. As a result, it is often used to exchange a symmetric key between a sender and a recipient. The sender encrypts a symmetric key with the recipient’s public key, sends it over the wire and then uses that symmetric key to send encrypted messages. This is called hybrid encryption. The advantage is that the symmetric key does not have to be stored permanently and can be generated for each session.

In my experience, many people are muddled by asymmetric encryption because they confuse it with digital signatures and cannot tell which key to use to encrypt and which to decrypt. Also, inside a company, people tend to share or circulate private keys for testing environments. While this makes life easy for everyone, people also tend to misplace keys or store them carelessly in the clear. For this reason, never use a key that has been used in a testing environment in production. Care must be taken to ensure production keys are stored securely and accessed only by people who have the privilege to manage them. Losing a key or leaking it to the Internet (for example, by pasting one into an online tool) is a serious security breach and could cause great damage.

The Millennial Dev
