Introduction
This section gives a high-level introduction to the relevant cryptography background.
Overview
- 3.1 The Basics
- 3.2 Hash Functions
- 3.3 Asymmetric Encryption
- 3.4 Symmetric Encryption
- 3.5 Digital Signatures
- 3.6 Public Key Infrastructure
[3.1] The Basics
Goal
The goal of cryptography is to allow secure communication between people or computers, even when unauthorized third parties can observe or interfere with the channel.
What is Cryptography Used For?
Cryptography provides the following features:
- Unauthorized third parties can be prevented from reading data (confidentiality).
- Unauthorized third parties can be prevented from changing the data (integrity).
- Unauthorized third parties can be prevented from impersonating somebody else (authenticity).
To guarantee the confidentiality of communication, ciphers are used. A cipher is an algorithm used to encrypt information. Information in plaintext can be given to a cipher to encrypt it with a secret (or a key), creating a cryptogram. The one in possession of a key can take the cryptogram and decrypt it (obtaining plaintext).
Encryption is used to place data in a form (encrypted) that can be reconstituted into its original form (decrypted, plaintext).
Ciphers
Ciphers have diverse practical applications. In 400 BC, the Spartans used a cipher device called the scytale for military communication, one of the first known uses of cryptography. Many other relatively simple examples can be given: the Caesar cipher, the substitution cipher, the Vigenère cipher, and the Enigma cipher machine. Julius Caesar used the Caesar cipher to encrypt messages by shifting each letter X positions ahead in the alphabet (most commonly, 3 positions). The variable X corresponds to the key of the cipher.
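The Caesar cipher described above can be sketched in a few lines. This is a minimal illustration, assuming an uppercase Latin alphabet; the function name `caesar` is chosen here for illustration and is not part of the course material.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z"


def caesar(text: str, shift: int) -> str:
    """Shift every letter A-Z by `shift` positions, wrapping around.

    Characters outside A-Z (spaces, digits, punctuation) pass through unchanged.
    """
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


# The shift (X = 3) plays the role of the key.
ciphertext = caesar("ATTACK AT DAWN", 3)
plaintext = caesar(ciphertext, -3)  # shifting back decrypts
print(ciphertext)  # DWWDFN DW GDZQ
print(plaintext)   # ATTACK AT DAWN
```

With only 25 useful shifts, an attacker can simply try every key, which is why this cipher is of historical interest only.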
More recently, with the creation of the internet, the need for exchanging information securely appeared. One of the protocols that allowed secure communication of information is TLS. TLS is a cryptographic protocol that provides end-to-end security of data sent between applications over the Internet.
Cipher Types
There are three types of ciphers, depending on the type of key a cipher receives.
Ciphers with No Keys: Hash Functions
Hash functions have no key. They always produce an output of fixed length, and the plaintext cannot be recovered from that output.
Ciphers with One Key: Symmetric Encryption
These ciphers are “symmetrical”: if you run plaintext through them with a key, you get ciphertext. If you run the ciphertext through them again with the same key, you get the plaintext back.
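The "run it again with the same key" property can be seen with a toy XOR cipher. This is only a sketch for intuition, assuming a repeating key; it is not a secure cipher and is not one of the algorithms named in this course.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the (repeating) key.

    Applying the function twice with the same key returns the original data,
    because (b ^ k) ^ k == b.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"secret"
ciphertext = xor_cipher(b"hello world", key)
recovered = xor_cipher(ciphertext, key)
print(ciphertext != b"hello world")  # True
print(recovered)                     # b'hello world'
```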
Ciphers with Two Keys: Asymmetric
Asymmetric ciphers need two keys (a key pair): a public and a private key. The public key is known by all, while the private key should only be in possession of its creator. Anything enciphered with the first key will be deciphered by using the same algorithm with the second key.
Enhancing Key Management with Hardware Security Modules (HSM)
HSMs are used to create and protect private keys and to perform all the signing and encryption functions that make use of them. They range from USB devices to networked types and are tamper resistant (they destroy themselves if tampered with). There are three big manufacturers in the enterprise space (Thales, nCipher, Gemalto).
Applications
The three types of ciphers allow very useful applications in the context of DLT, such as:
- signing objects (for example transactions)
- verifying that signed objects were signed by a certain entity
- verifying someone’s identity
- assuring data will not change (without everyone knowing so)
In Section 3…
In this section, we are studying the different types of ciphers, and some of their applications, that can guarantee confidentiality and integrity, which are desired properties in any DLT technology.
[3.2] Hash Functions
"To hash" means to scramble. That’s what hash functions do: take data of any size and output a value of fixed length.
What are Hash Functions Used For?
Some examples are as follows:
- To generate session IDs for internet applications and data caching.
- To protect sensitive data like passwords, credit card numbers, and others.
- Creating proofs of the integrity of files, including digital signatures (discussed later in this course).
- Efficiently locating identical data sets.
Hash Function Properties
Not all hash functions can be secure hash functions. To be a secure hash function, the following key properties of the hash function are required:
Pre-Image Resistance (One-Way Function)
One way function means it is easy to calculate an output given an input, but infeasible to calculate the input given the output. In other words, the hash provides no “clues” on what is the input.
Can you guess what is the input corresponding to “aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d” in SHA1? (hint: it is “hello”)
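The asymmetry can be tried out with Python's standard `hashlib` module. In this sketch, computing the hash is a single call, while going backwards is only possible by guessing candidate inputs; the helper names `sha1_hex` and `guess_preimage` are illustrative.

```python
import hashlib

TARGET = "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"


def sha1_hex(text: str) -> str:
    """Forward direction: cheap and deterministic."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def guess_preimage(target: str, candidates):
    """Backward direction: nothing better than trying candidates one by one."""
    for candidate in candidates:
        if sha1_hex(candidate) == target:
            return candidate
    return None


print(sha1_hex("hello") == TARGET)                              # True
print(guess_preimage(TARGET, ["goodbye", "world", "hello"]))    # hello
```

The guess succeeds here only because "hello" is in the candidate list. This is also why short or common inputs, such as weak passwords, offer little protection even when hashed.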
Example: mixing paints. It is easy for someone to predict the output of joining two different paints, but hard to guess the precise originals from the blended color. Another example of a one-way function is breaking an egg. It is easy to break an egg, but infeasible (or at least very hard) to reconstitute it to its original form.
Second Pre-Image Resistance
Second pre-image resistance means that, given an original message m, it is difficult to find a different input that hashes to the same result. In other words, it is difficult to change the value of m in such a way that it still results in the same hash.
Collision Resistance
Collision resistance means it is difficult to find different inputs that hash to the same result (called a collision).
For a hash function to be collision-free, no two different inputs could map to the same output: every input string would have to generate a unique output. But since anything can be given as an input, there are more possible inputs than possible outputs. This means that collisions are unavoidable. Cryptographers therefore attempt to design algorithms for which finding such collisions is computationally infeasible.
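To see why output size matters, the sketch below deliberately weakens SHA-256 by keeping only its first 16 bits and then searches for two inputs that collide. The truncation and the `message-<i>` input format are assumptions made for this illustration.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> str:
    """First 4 hex characters (16 bits) of SHA-256: a deliberately weak hash."""
    return hashlib.sha256(data).hexdigest()[:4]


def find_collision():
    """Hash distinct messages until two of them share the same tiny hash.

    With only 65536 possible outputs, a collision must appear within
    65537 distinct inputs, and usually appears after a few hundred.
    """
    seen = {}
    for i in count():
        message = f"message-{i}".encode()
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message, digest
        seen[digest] = message


first, second, digest = find_collision()
print(first, second, digest)
```

The same search against the full 256-bit output would need on the order of 2^128 attempts, which is why real hash functions use long digests.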
An Example of a Hash Function
Hash functions are mathematical one-way functions that map an input of any size to a fixed-size output. For example, the modulus-X function could be a (bad) hash function, because it only has X possible outputs. The hash function takes a number, divides it by X (in the example X = 30), and takes the remainder as the output (or hashed value). 30 divided by 30 gives remainder 0, 31 divided by 30 gives remainder 1, and so on:

| Function | Input | Output (hash) |
|---|---|---|
| Mod-30 | 30 | 0 |
| Mod-30 | 31 | 1 |
| Mod-30 | 32 | 2 |
| Mod-30 | … | … |
| Mod-30 | 59 | 29 |
| Mod-30 | 60 | 0 |
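The mod-30 mapping can be reproduced directly. This is a minimal sketch; the function name `mod_hash` is illustrative.

```python
def mod_hash(value: int, modulus: int = 30) -> int:
    """A (bad) hash function: the remainder after dividing by `modulus`."""
    return value % modulus


for n in (30, 31, 32, 59, 60):
    print(n, mod_hash(n))
# 30 0
# 31 1
# 32 2
# 59 29
# 60 0
```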
In this case, inputs 30 and 60 would form a collision. However, since the number of bits (or length) of a hash determines how many possible hashes there are, the higher the number, the less likely someone will find a collision. Let us see real-world hash functions:
| Algorithm | Bits | Input | Output (hash) |
|---|---|---|---|
| md5 | 128 | hello | 5d41402abc4b2a76b9719d911017c592 |
| md5 | 128 | hello. | d94c10e437d18531e122ed0b45badd2a |
| SHA-256 | 256 | hello | 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 |
| SHA-256 | 256 | hello. | 1589999b0ca6ef8814283026a9f166d51c70a910671c3d44049755f07f2eb910 |
| SHA3-512 | 512 | hello | 75d527c368f2efe848ecf6b073a36767800805e9eef2b1857d5f984f036eb6df891d75f72d9b154518c1cd58835286d1da9a38deba3de98b5a53e5ed78a84976 |
| SHA3-512 | 512 | hello. | a1a87c3f01f6739912d8e0ccf1d2994db0c4334be2b59d453bd50f2a1f9a6bbc4b209400a31f0de16b31f81213bba32e1536c4c54d88a543b09c486e1822e7ef |
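Digests like these can be computed with Python's standard `hashlib` module. The sketch below assumes UTF-8 encoded input and a Python build where `md5` is available; the helper name `digest` is illustrative.

```python
import hashlib


def digest(algorithm: str, text: str) -> str:
    """Hex digest of `text` under the named hashlib algorithm."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


for algorithm in ("md5", "sha256", "sha3_512"):
    for text in ("hello", "hello."):
        value = digest(algorithm, text)
        # Each hex character encodes 4 bits of the digest.
        print(algorithm, len(value) * 4, repr(text), value)
```

Adding a single "." to the input changes the whole digest, while the digest length depends only on the algorithm.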
Summary
That’s it! These are the basics of hash functions. You can experiment with them in this playground. Remember:
- Hash functions are one-way functions, meaning it is difficult to find an input given its output. It will be difficult to find the input that hashes to this 512-bit digest: “c6f81db0e9f8206c971c9e5826e3ba823ffbb1a3a900f8047652a8bf78ea98fdfc745855a3853a635675458eb6d1aaf1209e88ead2d192382b5c4cbdd6850e02”.
- Hash functions are supposed to avoid collisions. It will be difficult to find an input different from “hello” that hashes (using SHA-512) to “9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043”.
[3.3] Asymmetric Encryption
Asymmetric encryption uses a pair of keys to perform encryption and decryption: a public key and a private key (or secret key).
Typically, the public key is used to encrypt data, and the private key to decrypt data. However, the private key can also be used to encrypt data, in which case the public key would be used to decrypt it. It is important to keep the private key safe. If an attacker obtains the private key, impersonation can happen (essentially, two parties are using the same private key, so it is difficult to know who used it).
The public key can be accessed and seen by anyone: there is no need to assure the confidentiality of the public key. The party that wants to communicate needs to publish the public key (ensuring its integrity and authenticity). There is no need to distribute keys carefully, like in symmetric encryption.
The following image shows how Alice can send an encrypted message to Bob using public-private cryptography. First, Alice retrieves Bob’s public key (can be published on the Internet, in the open). Then, Alice encrypts her message M with Bob’s public key, generating a cryptogram. That cryptogram is then sent to Bob over the Internet. Bob retrieves the cryptogram. After that, he can use his private key to decrypt it, revealing the original message M.
Figure: Asymmetric encryption
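The Alice-and-Bob exchange can be sketched with textbook RSA and tiny numbers. The primes 61 and 53 and the exponent 17 are a common classroom example, not values from this course; real systems use keys of 2048 bits or more together with padding schemes, so treat this purely as an illustration.

```python
# Bob generates his key pair (toy sizes; requires Python 3.8+ for pow(e, -1, m)).
p, q = 61, 53
n = p * q                   # 3233, part of both keys
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent: 2753

bob_public = (e, n)         # published in the open
bob_private = (d, n)        # kept secret by Bob


def encrypt(message: int, public_key) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(cryptogram: int, private_key) -> int:
    exponent, modulus = private_key
    return pow(cryptogram, exponent, modulus)


M = 65                                   # Alice's message, encoded as a number < n
cryptogram = encrypt(M, bob_public)      # Alice only needs Bob's public key
print(cryptogram)                        # 2790
print(decrypt(cryptogram, bob_private))  # 65
```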
Typical examples of asymmetric cryptography include elliptic curve algorithms, Diffie-Hellman, and RSA. In particular, some elliptic curve ciphers are used to generate public-private keypairs for blockchain accounts, RSA is used to secure internet communications, and Diffie-Hellman to perform secure key exchanges.
Most ciphers are not totally safe, as an attacker can in principle brute force them (for example, by discovering the private key from the public key). However, in practice, we can consider them safe because an attacker would need an enormous amount of computational power to break the scheme.
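Why key size matters can be shown on a toy RSA public key. The values n = 3233 (= 61 × 53) and e = 17 are assumed for this illustration. With such a small modulus, trial division recovers the private exponent instantly; for a modulus of 2048 bits the same loop would not finish in any realistic amount of time.

```python
def recover_private_exponent(n: int, e: int) -> int:
    """Brute force a toy RSA key: factor n by trial division, then rebuild d.

    Requires Python 3.8+ for the modular inverse pow(e, -1, m).
    """
    p = next(c for c in range(2, n) if n % c == 0)  # smallest prime factor
    q = n // p
    return pow(e, -1, (p - 1) * (q - 1))


print(recover_private_exponent(3233, 17))  # 2753
```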
Summary
- In asymmetric encryption, two keys exist (a key pair): a public key (everyone sees it) and a private key (only the owner should have it).
- Anyone can ensure confidentiality of a message by using encryption with the recipient’s public key. The recipient can decrypt the cryptogram with its private key.
- However, the owner can also encrypt messages with its private key (used for creating Digital Signatures, to be studied in 3.5).
[3.4] Symmetric Encryption
In symmetric cryptography, a single key is used to cipher and decipher the message. When the cipher is run on a message, a cryptogram is obtained. When the cipher is run on the cryptogram, you will get the plain message. The following figure illustrates this process.
Figure: Symmetric encryption
Symmetric ciphers include the following algorithms: the Data Encryption Standard (DES), Triple DES, Twofish, the Advanced Encryption Standard (AES), and the Tiny Encryption Algorithm (TEA). These algorithms are typically more efficient than asymmetric encryption algorithms (but performance varies widely from algorithm to algorithm).
However, key exchange is a challenge: the party generating the key must deliver it confidentially. Since there is no shared secret yet, the key cannot be encrypted and sent over the internet. Typically, one would share the symmetric key with asymmetric encryption (the person wanting to share a symmetric key would encrypt it with the recipient’s public key) or by using Diffie-Hellman (a key exchange protocol allowing parties to agree on a shared key).
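A Diffie-Hellman exchange can be sketched with very small numbers. The values p = 23 and g = 5 and the two private values are a common textbook example, assumed here for illustration; real deployments use much larger parameters or elliptic curves.

```python
# Public parameters, known to everyone (including eavesdroppers).
p, g = 23, 5

# Private values, never transmitted.
alice_private = 4
bob_private = 3

# Each party sends only g^private mod p over the network.
alice_public = pow(g, alice_private, p)   # 4
bob_public = pow(g, bob_private, p)       # 10

# Each party combines the other's public value with its own private value.
alice_shared = pow(bob_public, alice_private, p)
bob_shared = pow(alice_public, bob_private, p)

print(alice_shared, bob_shared)  # 18 18
```

Both sides arrive at the same number without ever sending it, and that number can then be used to derive a symmetric key.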
Summary
- In symmetric encryption, a single key is used to both encrypt and decrypt data.
- Sharing a symmetric key securely needs to be done carefully (using asymmetric encryption or key exchange protocols). Otherwise, an attacker could intercept the symmetric key and use it to decrypt messages.
[3.5] Digital Signatures
Digital signatures are used to confirm a message actually came from who it says it does and that it hasn’t been tampered with in transit (assuring integrity). Signing protects trillions of pounds worth of payments every day.
Digital signatures utilize several techniques, such as asymmetric encryption and hashing, to deliver a message with integrity, authenticity, and non-repudiation. Integrity means that no one can change the contents of the message without others knowing about it. Authenticity allows us to uniquely identify the subject that signed the message. Non-repudiation means that the person who signed a message cannot later deny the signature over that message.
Imagine that Alice wants to send a piece of data to Bob over the internet. The following figure shows how Alice can create a simple, digitally signed document.
Figure: Digital signature creation
Alice first creates a message (step 1) and calculates its digest (or hash) using a hash function (step 2). After that, she encrypts the digest with her private key (step 3). Finally, the document to be sent consists of the original message and the encrypted digest (step 4). The combination of the encrypted digest and the original document can be sent over the internet.
How can Bob verify that the information coming from Alice is not tampered with?
Bob retrieves the message and encrypted hash. Bob then decrypts the digest of the message with Alice’s public key. If the decryption is successful, Bob knows that Alice signed the message (authenticity and non-repudiation), assuming that only Alice holds her private keys. The following figure shows this procedure:
Figure: Digital signature verification
After that, Bob recalculates the message hash and compares it with the digest. If they are the same, the information did not change - the message was not tampered with and was signed by Alice.
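The sign-and-verify flow can be sketched with SHA-256 and textbook RSA. The key below is built from two known Mersenne primes purely for illustration, and the digest is reduced modulo n so that it fits the toy key. Both are simplifying assumptions: production signatures use standardized padding and much larger keys.

```python
import hashlib

# Alice's toy key pair (requires Python 3.8+ for pow(e, -1, m)).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, known only to Alice


def digest_int(message: bytes) -> int:
    """SHA-256 digest as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: hash the message, then 'encrypt' the digest with her private key."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: 'decrypt' the signature with Alice's public key and compare digests."""
    return pow(signature, E, N) == digest_int(message)


message = b"pay Bob 10"
signature = sign(message)
print(verify(message, signature))          # True
print(verify(b"pay Bob 1000", signature))  # False
```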
Summary
- Digital signatures rely on hash functions and asymmetric encryption: hash functions create a digest of the message to be sent, and encryption assures the recipient that the digest comes from the sender.
- Anyone can verify the digitally signed message by recalculating the hash of the message and comparing it to the decrypted digest.
[3.6] Public Key Infrastructure
Why do we need Public Key Infrastructure (PKI)?
Much value is protected by asymmetric cryptography (also called public-key cryptography). This leads to attackers trying to steal private keys. Alternatively, attackers could generate a keypair and make a victim think the attacker’s public key belongs to someone else. The latter attack can lead to what is known as a man-in-the-middle attack.
It is then imperative to solve these problems. Securing private keys is often difficult in the long term, as there are a wide variety of attacks that well-motivated hackers can try to execute. How to solve the latter problem? How can we trust that public keys really belong to who we think they do, and how do we know they haven’t been stolen (compromised)?
We do this with “certificates” (or certs), that leverage public-key infrastructure. Certificates include public keys, wrapped in signatures that show us who trusts them, and for what purpose.
PKI is widely used by browsers for clients to establish secure connections to servers. Typically browsers offer an easy way to examine certificates (the little and hopefully green padlock near the URL field). The next figure illustrates the certificate associated with the developer.quant.network domain:
Figure: Quant dev portal certificate
One can see that the certificate was issued to developer.quant.network by Amazon and expires in February 2022. The certificate hash is also present for verification purposes (called the fingerprint, available for different hashing algorithms).
Certificates can be created by following these steps (as shown in this Figure):
The user (you) creates a new keypair (public key and private key). After that, we wrap the public key in a message called a Certificate Signing Request (CSR). This message contains assertions: what is the public key, what is the purpose of the certificate (payments, email, networks, etc), the expiry date (6-24 months, typically). This corresponds to step 1 of the figure.
After that, the CSR is sent to a certificate authority (CA) that can certify the message, for example Amazon, DigiCert, Comodo, etc (step 2).
For this, the CA runs a Know-Your-Customer process on us, and if it believes we are who we say we are, it signs our CSR with its private key and gives us a certificate (step 3). When we sign messages, we can now use the private key that corresponds to the public key in the certificate signed by the CA.
The receiver can then use the CA’s public key to validate the certificate. If the certificate is valid, it means the CA recognizes the included public key as ours. The verification of a message can now occur with the public key within the certificate (see the Digital Signatures section 3.5).
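The issue-and-validate steps can be sketched with toy RSA signatures. Everything in this sketch is an assumption made for illustration: the field names, the JSON encoding, the subject "example.org", and the keys built from known Mersenne primes. Real certificates follow the X.509 format and use standardized signature schemes.

```python
import hashlib
import json

E = 65537  # public exponent shared by both toy key pairs


def make_keypair(p: int, q: int):
    """Return (public_key, private_key) as (n, e) and (n, d). Python 3.8+."""
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))


def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest_int(data, n), d, n)


def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest_int(data, n)


def encode(fields: dict) -> bytes:
    """Deterministic byte encoding of the assertions being signed."""
    return json.dumps(fields, sort_keys=True).encode("utf-8")


ca_public, ca_private = make_keypair(2**107 - 1, 2**127 - 1)
user_public, user_private = make_keypair(2**61 - 1, 2**89 - 1)

# Step 1: the user wraps their public key and assertions in a CSR.
csr = {"subject": "example.org", "public_key": list(user_public), "purpose": "email"}

# Steps 2-3: the CA checks the requester, then signs the CSR with its private key.
certificate = {"fields": csr, "ca_signature": sign(encode(csr), ca_private)}


def certificate_is_valid(cert: dict, ca_key) -> bool:
    """Receiver: check the CA's signature using the CA's public key."""
    return verify(encode(cert["fields"]), cert["ca_signature"], ca_key)


forged = {"fields": dict(csr, subject="attacker.example"),
          "ca_signature": certificate["ca_signature"]}
print(certificate_is_valid(certificate, ca_public))  # True
print(certificate_is_valid(forged, ca_public))       # False
```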
This strategy solves the second problem: how do you make sure you are communicating with whom you think you are communicating? However, what happens if that party loses their private keys to an attacker (first problem)? In that case, we need to be able to revoke certificates.
Handling Revoked Certificates
Firstly, the organisation needs to realize it needs to revoke a certificate, either because the associated private key has been exposed or due to key rotation best practices. To revoke a certificate, they need to inform the CA.
The CA keeps a list of revoked certs, imaginatively called a Certificate Revocation List (CRL). The CRL has to be downloaded regularly, so that messages received can be verified correctly.
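A receiver-side CRL check can be sketched as a lookup against a locally cached list. The data layout (`downloaded_at`, `revoked_serials`), the serial numbers, and the 24-hour freshness limit are all hypothetical choices for this illustration; real CRLs are signed structures defined by the X.509 standards.

```python
from datetime import datetime, timedelta, timezone


def certificate_is_usable(serial: str, crl: dict, now: datetime,
                          max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if `serial` is not on the CRL.

    Refuses to answer from a stale list, since recently revoked
    certificates would be missing from it.
    """
    if now - crl["downloaded_at"] > max_age:
        raise ValueError("CRL is too old; download a fresh copy first")
    return serial not in crl["revoked_serials"]


now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
crl = {
    "downloaded_at": now - timedelta(hours=2),
    "revoked_serials": {"0A:1B", "7F:33"},
}
print(certificate_is_usable("0A:1B", crl, now))  # False (revoked)
print(certificate_is_usable("55:10", crl, now))  # True
```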
For highly secure systems, the worry is that certs revoked today might not be on the CRL yet. Also, organisations may not want to perform the CRL check themselves and may instead outsource it to someone else who has agreed to take on this important responsibility. For either of these cases, the CA can run an API that message receivers call to check the status of a cert before using it.
Figure: Revoked certificate checking process
Checking the status of a cert online is done via the Online Certificate Status Protocol (OCSP), against OCSP servers. OCSP checks can take up to 20 seconds, so aren’t good for real-time applications.
Summary
- PKI allows certificate authorities to issue certificates to requesters.
- Certificates provide an extra layer of security, by asserting that a public key belongs to a certain identity.
- Message senders can include the required certificates along with a message (and a signature of the message), so that the receiver can use the sender’s public key to verify the signature.