Lesson 6: Encryption
Encryption is a pain in the neck, but everyone has secrets and it's human nature to want privacy and confidentiality.
Today, 8 out of 10 sites on the web use HTTPS by default. Unlike HTTP, which transfers data in plaintext, HTTPS uses encryption for secure communication over the Internet.
Data transmitted and stored in plaintext is susceptible to prying eyes and theft. In our cyber world, sensitive information such as credit cards, passwords, and personal data must be kept private and transferred securely over a network. Encryption is a technique commonly used to protect data in transit and at rest to provide confidentiality and privacy. In this post, we look at symmetric and asymmetric encryption. If you cannot tell the difference between hashing and encryption, I recommend you read the previous post on cryptographic hashing first.
Define encryption
In cryptography, encryption is a process that scrambles human-readable data, known as plaintext, by converting it into unreadable code called ciphertext. Encryption is a bidirectional process, meaning it can be reversed to retrieve the plaintext from the ciphertext. The process of deciphering the encrypted data (ciphertext) into its original form is called decryption. Only authorised parties who possess the decryption key can decipher the code to retrieve the original data. An encryption algorithm, also called a cipher, is an algorithm for encrypting and decrypting data using a cryptographic key. Both encryption and decryption can be defined by the following equations:
ciphertext = encrypt(plaintext, key)
plaintext = decrypt(ciphertext, key)
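To make the two equations concrete, here is a toy sketch in which encrypt and decrypt are both a byte-wise XOR with a random key as long as the message. It only illustrates that encryption can be reversed with the key. The function bodies are an illustration rather than a real cipher, and actual applications should rely on a vetted algorithm such as AES.

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR every plaintext byte with the matching key byte."""
    if len(key) != len(plaintext):
        raise ValueError("this toy cipher needs a key as long as the message")
    return bytes(p ^ k for p, k in zip(plaintext, key))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return encrypt(ciphertext, key)

plaintext = b'This is a secret message.'
key = secrets.token_bytes(len(plaintext))  # random key, meant to be used once

ciphertext = encrypt(plaintext, key)
recovered = decrypt(ciphertext, key)

print('The ciphertext is: ' + ciphertext.hex())
print('The plaintext was: ' + recovered.decode('utf-8'))
```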
Symmetric encryption
Symmetric encryption (or secret key encryption) uses a single secret key, also known as a symmetric key, to encrypt and decrypt data. A simple analogy is using a password to lock (encrypt) and unlock (decrypt) a zip file. The secret key is known to both the sender and the recipient.
Data Encryption Standard (DES)
Data Encryption Standard (DES) is an outdated symmetric block cipher that encrypts data in blocks of 64 bits. Its 56-bit key length makes it too insecure for modern applications. Triple DES (or 3DES) provides more security by applying DES three times with up to three keys. Unfortunately, it has still been found vulnerable to attacks and is considered unsafe compared to modern ciphers.
Advanced Encryption Standard (AES)
Advanced Encryption Standard (AES) is a secure symmetric block cipher that encrypts data in blocks of 128 bits. Released in 2001, it is one of the most secure algorithms around and the industry standard today. AES comes in three variants, AES-128, AES-192, and AES-256, which use key lengths of 128, 192, and 256 bits (16, 24, and 32 bytes) respectively.
A block cipher like AES operates on a fixed-length block of bits. For AES, the block size is 128 bits (16 bytes). When encrypting data that is larger than a single block (128 bits), a mode of operation is required to iteratively apply the algorithm to encrypt each block. If the plaintext length is not exactly a multiple of 128 bits, it has to be padded, making the last block of the plaintext a full block size. Note that not all modes of operation require padding. In short, AES uses a block cipher mode to encrypt one block at a time, or block by block. The next section briefly discusses two classic block cipher modes to provide a basic understanding of the subject.
AES-ECB
The Electronic Code Book (ECB) mode is the most basic but also the weakest. It encrypts each block independently. Because the plaintext must fill whole blocks, it is padded first, commonly by filling the remaining bytes of the last block with a value equal to the number of padding bytes (a scheme known as PKCS#7 padding). If the plaintext already ends on a block boundary, a full block of padding is added, with every byte set to 16 (the block size of AES). This allows decryption to discern the length of the padding and remove it to restore the original plaintext. The weakness of ECB is that identical plaintext blocks always encrypt to identical ciphertext blocks, so it does not hide repeating patterns. It is thus not semantically secure. The famous ECB penguin, an image whose outline remains clearly visible after ECB encryption, demonstrates why ECB is not a recommended mode: it leaks information.
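The padding rule described above is commonly known as PKCS#7 padding. Here is a minimal sketch of it in plain Python; the helper names pad and unpad are illustrative, and in practice a library's padding helpers should be preferred.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)  # always 1..block_size
    return data + bytes([pad_len]) * pad_len

def unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Read the last byte to learn the padding length, check it, and strip it."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]

short = pad(b'YELLOW SUBMARINE!')  # 17 bytes: 15 padding bytes are added
exact = pad(b'YELLOW SUBMARINE')   # 16 bytes: a full extra block is added

print(len(short), short[-1])
print(len(exact), exact[-1])
print(unpad(short), unpad(exact))
```

Note that the padding is always added, even when the input is already block-aligned. Otherwise the decrypting side could not tell a padding byte from a data byte.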
AES-CBC
The Cipher Block Chaining (CBC) mode also uses padding but takes an additional value called an initialization vector (IV) to randomise the encryption. The IV must be random and unpredictable. Its length is equal to the block size (16 bytes for AES). In this mode, the IV is XOR-ed with the first block of plaintext and then encrypted, and each subsequent block gets XOR-ed with the previous ciphertext block prior to encryption. The IV is required for decryption, so it must be transmitted or stored alongside the ciphertext. It is not a secret and can be left in the clear.
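To see how the chaining differs from ECB, here is a sketch of both modes built around a toy stand-in for the AES block function. The stand-in (a plain XOR with the key) is an assumption made to keep the example dependency-free and is not secure. Only the mode logic around it mirrors the description above.

```python
import os

BLOCK = 16  # block size in bytes, the same as AES

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the AES block function. NOT secure: it only XORs with the key."""
    return xor(block, key)

# XOR is its own inverse, so the toy block decryption is the same function
toy_block_decrypt = toy_block_encrypt

def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Every block is encrypted independently
    return b''.join(toy_block_encrypt(key, b) for b in split_blocks(plaintext))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    previous, out = iv, []
    for block in split_blocks(plaintext):
        # XOR with the IV (first block) or the previous ciphertext block, then encrypt
        previous = toy_block_encrypt(key, xor(block, previous))
        out.append(previous)
    return b''.join(out)

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    previous, out = iv, []
    for block in split_blocks(ciphertext):
        # Decrypt the block, then undo the XOR with the previous ciphertext block
        out.append(xor(toy_block_decrypt(key, block), previous))
        previous = block
    return b''.join(out)

key = os.urandom(BLOCK)
iv = os.urandom(BLOCK)               # random and unpredictable, sent in the clear
plaintext = b'SIXTEEN BYTE MSG' * 2  # two identical blocks, so no padding is needed

ecb_blocks = split_blocks(ecb_encrypt(key, plaintext))
cbc_ciphertext = cbc_encrypt(key, iv, plaintext)
cbc_blocks = split_blocks(cbc_ciphertext)

print('ECB blocks identical:', ecb_blocks[0] == ecb_blocks[1])
print('CBC blocks identical:', cbc_blocks[0] == cbc_blocks[1])
print('CBC round trip ok:', cbc_decrypt(key, iv, cbc_ciphertext) == plaintext)
```

Only the chaining logic carries over to real code. The block function itself must be AES from a vetted library, since the XOR stand-in offers no protection at all.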
AES-CBC-HMAC
Encryption guarantees the confidentiality of a message, but not the integrity of the ciphertext. Nothing prevents an attacker from modifying or tampering with our ciphertext and IV. The recipient is also not able to verify the ciphertext really originates from the sender. This might seem harmless because a third party (in a man-in-the-middle attack) cannot produce a legitimate ciphertext without knowing the secret key. However, clever attacks exist where attackers deliberately change certain bits in our ciphertext, which would potentially change the meaning of our message. We want to ensure our message is delivered unchanged. To provide authentication and integrity over the ciphertext and IV, we usually use the hash-based message authentication code (HMAC) with the SHA-256 hash function. The AES-CBC-HMAC construction is one of the most widely used authenticated encryption modes. The HMAC is applied on the ciphertext and the IV to create a MAC tag (also known as an authentication tag). The tag has to be sent to the recipient for verification during decryption. We often concatenate the IV, the ciphertext, and the authentication tag and transmit it over the wire.
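The authentication half of this construction can be sketched with Python's standard hmac module. In the sketch below the ciphertext is a random placeholder standing in for real AES-CBC output, the helper names seal and open_sealed are hypothetical, and the use of a MAC key separate from the encryption key is an assumption the text does not spell out.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag
IV_LEN = 16   # AES-CBC IV length

def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return iv || ciphertext || tag, with the tag computed over iv || ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def open_sealed(mac_key: bytes, message: bytes):
    """Verify the tag and return (iv, ciphertext); raise if anything was modified."""
    if len(message) < IV_LEN + TAG_LEN:
        raise ValueError("message too short")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return body[:IV_LEN], body[IV_LEN:]

mac_key = os.urandom(32)     # assumption: a key separate from the AES key
iv = os.urandom(IV_LEN)
ciphertext = os.urandom(32)  # placeholder for real AES-CBC output

message = seal(mac_key, iv, ciphertext)
print('Verified:', open_sealed(mac_key, message) == (iv, ciphertext))

# Flip one bit in the ciphertext: verification must now fail
tampered = bytearray(message)
tampered[IV_LEN] ^= 0x01
try:
    open_sealed(mac_key, bytes(tampered))
except ValueError as err:
    print('Tampering detected: ' + str(err))
```

The recipient should verify the tag before attempting any decryption, so that a modified message is rejected without ever touching the cipher.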
Enter authenticated encryption with associated data (AEAD)
The AES-CBC-HMAC construction is not very friendly to programmers: it is poorly understood and the IV is often misused. For that reason, an all-in-one construction called authenticated encryption with associated data (AEAD) was invented to simplify the use of encryption.
AEAD combines encryption and authentication. In addition to the plaintext and a nonce, it accepts optional, non-confidential data (the associated data) that is authenticated but not encrypted, so it can be left in the clear. The authentication tag is calculated over both the data you encrypt and the associated data, so tampering with either is detected during decryption.
AES with the Galois/Counter mode (AES-GCM) is a widely adopted AEAD. It has been used in several TLS protocol versions. GCM combines the Counter mode and the Galois message authentication code (GMAC). In AES-GCM mode, a nonce (12 bytes) is concatenated with a counter (4 bytes) to form a 16-byte block. It is then encrypted with AES to create a block of keystream. The keystream is then XOR-ed with a plaintext block. The counter is incremented and the process repeats. The keystream is truncated if it is longer than the plaintext, so no padding is required. Finally, GMAC uses a key to hash the associated data and the ciphertext, and encrypts the result to produce an authentication tag. The devil is in the details, though. One caveat for using AES-GCM is that the nonce must be unique per message for each key used. Accidental reuse of a nonce with a key compromises the security of any messages encrypted with the same key and nonce pair. If the same nonce is used, XOR-ing two ciphertexts cancels out the keystream and leaves the XOR of the two plaintexts. An attacker then only needs to know one of the plaintexts to compute the other.
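The nonce-reuse problem can be shown without any real cipher. In this sketch a random byte string stands in for the keystream that AES would derive from one key and nonce pair (an assumption made to keep the example self-contained), and the two messages are made up for illustration.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one (the keystream is truncated)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in for the keystream AES would produce from one (key, nonce) pair.
# Reusing the pair means both messages are XOR-ed with the SAME keystream.
keystream = os.urandom(32)

p1 = b'Transfer $100 to Alice'
p2 = b'Meet at the usual spot'
c1 = xor(p1, keystream)
c2 = xor(p2, keystream)

# The attacker sees c1 and c2 and happens to know p1.
# c1 XOR c2 equals p1 XOR p2, so XOR-ing with p1 reveals p2 without the key.
recovered_p2 = xor(xor(c1, c2), p1)
print(recovered_p2.decode('utf-8'))
```

Generating a fresh random 12-byte nonce for every message is the usual way to avoid this situation.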
Let’s see how to use AES-GCM in code.
Python (using pyca/cryptography):
import os
from base64 import b64encode
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
# Associated data
ad = b'user-id=u1301'
# Plaintext to encrypt
data = b'This is a secret message.'
# AES-128 uses 128-bit keys
key = AESGCM.generate_key(bit_length=128)
# GCM uses 12-byte nonces
nonce = os.urandom(12)
# Encryption
cipher = AESGCM(key)
# This returns the ciphertext bytes with the 16-byte tag appended
ciphertext = cipher.encrypt(nonce, data, ad)
print('The ciphertext is: ' + b64encode(ciphertext).decode('utf-8'))
# Decryption
plaintext = cipher.decrypt(nonce, ciphertext, ad)
print('The plaintext was: ' + plaintext.decode('utf-8'))
This will print an output that looks like the following. The ciphertext will be different each time as we use a randomly generated key and nonce.
The ciphertext is: 77aOzN2PDoaHMnra285tTzRtyyMGViCVEWKtXHejSVryoLesl+DAC9I=
The plaintext was: This is a secret message.
Asymmetric encryption
Asymmetric encryption, also known as public key encryption, uses a public key to encrypt and a private key to decrypt data. To use asymmetric encryption, we (the recipient) generate a pair of public and private keys. In symmetric encryption, a sender must share the symmetric key with the recipient. In asymmetric encryption, the recipient must not share the private key with the sender, or with anyone else. It must be kept secret and known only to its owner. Say we generate a key pair: we share only the public key with our sender, and they use it to send us encrypted messages. We then use our private key to decrypt the messages.
Because the messages can only be decrypted using the private key, the public key can be published publicly. It does not disclose any information about the private key. It is (almost) impossible to deduce or derive the private key from a public key. This is possible because of maths. The greatest difference between symmetric and asymmetric is that symmetric encryption manipulates bits while asymmetric encryption relies on computationally difficult maths problems. This keeps the encryption secure but operations are also expensive, making it slower than symmetric encryption. Its limitation is the length of data it can encrypt.
The classic asymmetric cipher is RSA. RSA bases its security on the factorisation problem, which is the difficulty of factoring the product of two large prime numbers. To generate an RSA key, we find a modulus N which is a product of two large prime numbers, p and q, which must remain secret. We also choose a public exponent e (which defaults to 65537 for historical reasons). The private key d is then derived from p, q and e. Because it is computationally hard to find p and q from N given they are large enough, N can be public knowledge. The public key thus consists of the public exponent e and modulus N.
In general, the RSA algorithm works as follows:
ciphertext = message^e mod N
plaintext = ciphertext^d mod N
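The two equations can be tried out with deliberately tiny numbers. This is a textbook sketch only: the primes 61 and 53 and the exponent 17 are illustrative values chosen for readability, there is no padding, and nothing here is secure.

```python
# Toy parameters only: for a real 2048-bit key, each prime is about 1024 bits long
p, q = 61, 53
N = p * q                # modulus, 3233
phi = (p - 1) * (q - 1)  # 3120, used to derive the private exponent
e = 17                   # small public exponent that is coprime with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e (Python 3.8+)

# Treat the text as a number, which must be smaller than N
message = int.from_bytes(b'A', 'big')  # 65

ciphertext = pow(message, e, N)    # ciphertext = message^e mod N
plaintext = pow(ciphertext, d, N)  # plaintext = ciphertext^d mod N

print('d =', d)
print('The ciphertext is:', ciphertext)
print('The plaintext was:', plaintext.to_bytes(1, 'big').decode('utf-8'))
```

Python's three-argument pow performs modular exponentiation efficiently, which is what keeps these operations feasible even for moduli that are thousands of bits long.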
If you cannot see how this will work on text in place of numbers, think of the plaintext or message as a number (computers see text as binary, so a text is also a number in a computer’s RAM). It is possible to compute the public key from the private key, but not vice versa. If you want to know all the maths behind RSA, you can read about it online. To use RSA securely, we must pad the message: padding makes small messages big enough to resist brute-force attacks and adds randomness, so that encrypting the same message twice gives different ciphertexts. Optimal Asymmetric Encryption Padding (OAEP) is the secure padding scheme recommended for RSA encryption nowadays. RSA-OAEP works by mixing the message with a random number generated per encryption. Internally, OAEP uses a mask generation function (MGF) which is built using a hash function. To decrypt the ciphertext, the process is reversed. RSA-KEM is another scheme that provides stronger security without requiring padding.
Today’s recommendation is to use a minimum of 2048-bit RSA keys. RSA encryption and decryption become slower as the key size increases. 4096-bit RSA keys are also a common option as a practical compromise between performance and security. Beyond this size, there are better options such as elliptic-curve cryptography (ECC), a modern family of asymmetric algorithms (among the strongest and most efficient in its category) that achieves comparable security with much smaller keys and ciphertexts, and is therefore faster.
The maximum size of data that RSA can encrypt is bounded by the key size. A 2048-bit RSA key has a modulus of 2048 bits = 256 bytes, and the padding takes up part of that space: with OAEP and SHA-256, a message can be at most 190 bytes. As a result, it is often not possible to use RSA to encrypt files directly. If we want to encrypt larger data, we have to use hybrid encryption. First, we generate a symmetric key and use symmetric encryption to encrypt the data. We then use RSA to encrypt the symmetric key and transfer both to the recipient. The recipient uses RSA to decrypt the encrypted symmetric key and then uses it to decrypt the ciphertext.
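With OAEP padding, the usable message size is smaller than the modulus, because the padding reserves room for two hash outputs and two extra bytes. The limit defined in RFC 8017 is k - 2*hLen - 2 bytes, where k is the modulus length and hLen the hash output length. The helper below is a small sketch of that arithmetic, and its name is illustrative.

```python
import hashlib

def oaep_max_message_bytes(key_bits: int, hash_name: str = 'sha256') -> int:
    """Largest plaintext RSA-OAEP can encrypt: k - 2*hLen - 2 (RFC 8017)."""
    k = (key_bits + 7) // 8                     # modulus length in bytes
    h_len = hashlib.new(hash_name).digest_size  # hash output length in bytes
    return k - 2 * h_len - 2

for bits in (2048, 3072, 4096):
    print(bits, 'bit key ->', oaep_max_message_bytes(bits), 'bytes')
```

Even a 4096-bit key leaves room for only a few hundred bytes, which is plenty for a 32-byte AES key but far too little for a file. That is why hybrid encryption is the norm.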
Here’s how to generate an RSA key pair using OpenSSL:
# Generate a 2048-bit RSA private key
$ openssl genrsa -out private-key.pem 2048
# Generate the public key
$ openssl rsa -in private-key.pem -pubout -out public-key.pem
This will output two PEM files in the working directory: private-key.pem containing the private key and public-key.pem containing the public key. They should look something like the following:
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA6VNWGmvC5LMTXtO0QNr3snVTSOJrIah5Y5j15xyr8M9eH74U
VUza3o6qRJg3PM9TvslLEKY1Zf4gQPbEYixvZJJoGCOhweh9CrqAj2fzVpcaYrCF
ClDVwWpJeyXNKcS3ObKwcMs4KWWqob88d3vkILhfkoX+ac6atlnHL1LMkJf8ggkn
zMqcR3eogWmOO/NuKL90uNNLhpTzd1qrgK7xuRRnu9Nd5sawY9oFqb+d0wRcUN4G
HWxkhHeWyux0lboMomqLDDrGqo2uqjC9JMO6VGPo0TbYSmicfeXsfbqZRoeVvs/1
FITZMvoSxTX4wo2Rrq9mendNEc9Djhy53afU9wIDAQABAoIBAFthQxCX8b1mEQkL
esYHvAjNgG/EFVcaR2hlaLE5/nESlfdylz4NGo8darvwrXmIbXEEHv0HS1SFoZYv
zxvv8TR/TntdwSVTa11/S4hemuPny/Ko1YIDxKO6f8rWNDLOkz/qpsWWIYm9AoXU
gtb804ypCO02wzwnKVqPcL4s/GcID+6zj+AT5fYjWagJl0KfwdlBINnZHTeO4VKQ
QAKNlMuzxcPTLvEMNYOG2V394H9YItKh0TIn1+u9+QIiE715e7oLTCIQX91wLMVe
HVmTlXKTo1MRlC9Fe446pN/yZoixT1lqLGVDEHq34NXm4tKraWr/22ut8z0zuZDD
v5Rz4ekCgYEA9zVh+Vmn4TMKbtIiwQuKzAlXAUyt5PxPD3Ag85eNRquEi38FFh9u
vpR7XiYoKJI30J64iRBvR7uvyNRJa/DYoigMumMgPV4L2jwsSJIMBjiAvoVDLaYq
BEO7Wcg1omPiBFrBxbPCBOVFikijDPxuvoPZXFEyhGdJYDxUKwsgcQMCgYEA8Z+P
pbOE+SgujSmN17WcJMzxrj+sPznuq5JQ0nusDXFQsOKP/ug3S15INtIQVjZPZD3Q
gzmPya0LTOZwOl5QH8nHRG8XJVow4SvFnNc/t6NaGtwqVLpZmH/XihNkFexVPN1k
gtNBvWAu6tTYfd2yRKLhEQ7LVSCBLDGf3GhFt/0CgYEAkp6Qy2mHjqPNLkln34NP
ARERD25BPS0AXzGr+Y4LdrzH0ky14ZcnAdXjDcYnz1hZzlw1KuYaejsxWTW/jku8
0QBb/DhKqNscwIUr5qbohtAAW/+CBpMlHH5noiqC1RvUs6x4fR/OlUS+Z/QI8OzP
aiOdSYnHIox4EqH1ccoZpa0CgYArtB65bAUrQ/dXlSKQ18qMZX15dQ7kyMfAxNBV
ogT20X404GYHR11pBn6tW8WUsnIdwYiLk8fMRL58hFncVN7NQSQH3sgi+3NH5zDx
M4XU43kSzqvhc2ttSAJmeSdrR7oLLkhV2XxUkqcp1qHp8kWiYIuxGCnzFdQHeFpf
9YRWyQKBgQCmZCANBspBDwm31eEuCuMELicuCLrpv9lFhM8ZNa+ea1DGntzz3IWi
SNjm73/G5W+kq4hDYAXQPZgdo5yfMMZmI6GCRn0k4ejVpu5HKglN7EIS8VZzAQwQ
C9ic1cESSpnRbi8Tjta7TnqwhnAwTGSP93tGct8JWmzv2iOlRiu4YA==
-----END RSA PRIVATE KEY-----
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA6VNWGmvC5LMTXtO0QNr3
snVTSOJrIah5Y5j15xyr8M9eH74UVUza3o6qRJg3PM9TvslLEKY1Zf4gQPbEYixv
ZJJoGCOhweh9CrqAj2fzVpcaYrCFClDVwWpJeyXNKcS3ObKwcMs4KWWqob88d3vk
ILhfkoX+ac6atlnHL1LMkJf8ggknzMqcR3eogWmOO/NuKL90uNNLhpTzd1qrgK7x
uRRnu9Nd5sawY9oFqb+d0wRcUN4GHWxkhHeWyux0lboMomqLDDrGqo2uqjC9JMO6
VGPo0TbYSmicfeXsfbqZRoeVvs/1FITZMvoSxTX4wo2Rrq9mendNEc9Djhy53afU
9wIDAQAB
-----END PUBLIC KEY-----
This key pair was generated for demo purposes. Remember, do not share your private keys online.
Now, let’s see how to use RSA in code.
Python (using pyca/cryptography):
from base64 import b64encode
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
# Generate an RSA private key
# Note: You can also load a key from a PEM file using load_pem_private_key()
# and load_pem_public_key() from cryptography.hazmat.primitives.serialization
private_key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=2048
)
# Get the RSA public key from the private key
public_key = private_key.public_key()
# Plaintext to encrypt
message = b'This is a secret message.'
# RSA encryption using a secure padding and hash function
ciphertext = public_key.encrypt(
    message,
    padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)
print('The ciphertext is: ' + b64encode(ciphertext).decode('utf-8'))
# RSA decryption
plaintext = private_key.decrypt(
    ciphertext,
    padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)
print('The plaintext was: ' + plaintext.decode('utf-8'))
This will print an output that looks like the following:
The ciphertext is: iFowEZV3S40zp91M6KyF3z51FJUCWMFlui1pKlyhW7VnVcMyW8JNCYspB89fzOfmUT4PJOB787v5IgpdlivrSZKbfO5ChAyg+rxZa9Tvgcf7tie0LqV95eyz2kxe5bgi1rBfxQ3b92DmIzOZObJgbdSuD3HDKzc+UyvrVoin6HlBZQnRwLYvfqEtkiM6vZO/PUdr0ariwuLKdkkwX2zEUiKySTvteeUpC85XMI7g0nrgF1boOrn5isbuYU0TX5Q0uAmkosCk8fGVl4bWBPV85iqojkle6cV1+hll98Fsk1gowXjAhoadbPrcPKLrH9jT9ehDMtRoxlxyDIkN8Rp63Q==
The plaintext was: This is a secret message.
In this post, we discussed symmetric and asymmetric encryption as well as which algorithms to use. In summary, symmetric encryption uses the same key to encrypt and decrypt data, making it fast and efficient. It can be used to encrypt data of any length, but the secret key must be known to the parties who need to decrypt it. Sharing the key with multiple parties can be risky, and generating a key for each client would be a hassle if we need to exchange messages with many people. The keys must be generated beforehand and exchanged over a secure channel. Asymmetric encryption allows any number of senders to send us messages using our public key. However, it is slower and cannot encrypt long messages. As a result, it is often used to exchange a symmetric key between a sender and a recipient. The sender encrypts the symmetric key with the recipient’s public key, sends it over the wire, and then uses the symmetric key to send encrypted messages. This is called hybrid encryption. The advantage is that the symmetric key does not have to be stored permanently and can be generated for each session.
In my experience, many people are muddled by asymmetric encryption because they confuse it with digital signatures and cannot tell which key to use to encrypt and which to decrypt. Also, inside a company, people tend to share or circulate private keys for testing environments. While it makes life easy for everyone, people also have the habit of misplacing keys or storing them carelessly in the clear. For this reason, never use a key that has been used in a testing environment for production. Care must be exercised to ensure production keys are stored securely and only accessed by people who have the privilege to manage them. Losing a key or leaking it to the Internet (for example by pasting one into an online tool) is a serious breach of security and could cause great damage.