Password Hashing

Data in motion is a term used to label any digital information that is being transferred from one location to another. It is also commonly referred to as data in transit or data in flight. When the data is finally contained, saved or stored, it becomes data at rest.

Encryption techniques protect data in motion. Hashing protects data at rest.

Combining these strategies could, in theory, put a strong security boundary around critical assets. But both come with risks and benefits you should know about.

Encryption vs. Hashing

Some people use the terms encryption and hashing interchangeably. While it's true that they're both used to safeguard information, they do so in very different ways.

Consider these basic definitions:

Encryption scrambles data so that it can be decoded with a key. The intent is to pass the information to another party, and the recipient will use keys to decipher the data. The main example of this is HTTPS with TLS, where a certificate installed on the server allows a client and the server to agree on a shared symmetric session key for communication.
Hashing also scrambles data, but the intent is to prove its authenticity. Administrators can run a check on hashed data to determine the contents haven't been touched or altered while in storage. No deciphering key exists.

Both methods involve shielding something sensitive from prying eyes. But clearly, they have different goals and core functions.

What is hashing?

Storing passwords securely should be imperative for any credible engineer. Plain text passwords are extremely insecure - you shouldn't even consider storing them in a plain format. It's enough for someone to gain view privileges on a database for an entire userbase to be compromised.

Hashing involves scrambling data at rest to ensure it's not stolen or tampered with. Protection is the goal, but the technique isn't built with decoding in mind.

To prevent anyone from blatantly exploiting login credentials, you should always hash passwords before storing them in a database. That is the simplest, yet most effective way to prevent the unauthorized use of passwords stored in your database. Even if someone gets hold of the stored credentials, the hashes can't be used directly, since the format is unreadable for humans and hard to crack computationally.

By the way, hashing like this doesn't anonymize data, although plenty of people believe that it does. Instead, it's used to protect this data from those who might misuse or alter it.

Additionally - it's a question of ethical conduct. If a user signs up for your website - should you be able to find their password? Passwords are oftentimes used on multiple websites, contain personal information and/or could expose a side of the user that they wouldn't like to put out publicly. Neither you nor a malicious actor should be able to read a plain-text password at any point. This is why websites can't email you your password when you forget it - they don't know it. You have to reset it.

How does hashing work?

In its most basic form, hashing refers to converting one string to another (which is also called a hash) using a hash function. Regardless of the size of an input string, the hash will have a fixed size which is predefined in a hashing algorithm itself. The goal is that the hash doesn't look anything like the input string and that any change in the input string produces a change in the hash.

Additionally - hashing functions hash input in a one-way fashion. It's not a round trip and a hashed password cannot be unhashed. The only way to check whether an input password matches the one in the database is to hash the input password as well, and then compare the hashes. This way, we don't need to know what the actual password is to ascertain whether it's matching the one in the database or not.
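
The fixed output size and the hash-then-compare check described above can be sketched with Python's standard library. This is only an illustration: SHA-256 is used here because it ships with `hashlib`, not because it is suitable for storing passwords, and the helper names `toy_hash` and `matches` are made up for this example.

```python
import hashlib
import hmac


def toy_hash(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string (illustration only)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches(candidate: str, stored_hash: str) -> bool:
    """Hash the candidate and compare it with the stored hash in constant time."""
    return hmac.compare_digest(toy_hash(candidate), stored_hash)


# Inputs of very different lengths produce digests of the same length.
short_digest = toy_hash("a")
long_digest = toy_hash("a" * 10_000)
print(len(short_digest), len(long_digest))

# Only the hash is stored; a login attempt is checked by hashing it again.
stored = toy_hash("myPwd")
print(matches("myPwd", stored))
print(matches("mypwd", stored))
```

Note that the check never recovers the original password: it only answers whether two inputs hash to the same value.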

Every time you put something such as "myPwd" into the hashing algorithm you'll get the same exact output. But, if you change "myPwd" even a bit, the output will be changed beyond recognition.

That ensures that even similar input strings produce completely different hashes. If similar passwords produced similar hashes, cracking one simple password could help an attacker build a lookup table for related ones. On the other hand, since the same input always yields the same output, hashing is entirely predictable for anyone who can guess the input.
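
Both properties, identical output for identical input and a completely different output after a small change, can be observed directly. The sketch below again uses SHA-256 from `hashlib` purely for demonstration, and `differing_bits` is a helper invented for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex("myPwd")
again = sha256_hex("myPwd")
changed = sha256_hex("myPwe")  # only the last character differs

print(first == again)    # same input, same hash
print(first == changed)  # small change, different hash
print(differing_bits(first, changed), "of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip after a one-character change.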

Brute-forcing hashes

If someone knows what hashing function was used to hash a certain password (and there isn't a large list of hash functions in use), they can crack it by guessing all possible passwords, hashing them with the same hashing function, and comparing obtained hashes to the hash of the password that they want to crack. This type of attack is called a brute-force attack and the attack used to work extremely well for simple passwords, such as password123, 12345678, etc.
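
A minimal sketch of such an attack, assuming the attacker knows the hash function (unsalted SHA-256 here) and works through a list of candidate passwords. The "leaked" hash and the short guess list are made up for the example; real attacks use lists with millions of entries.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_password(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one whose hash matches."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical leaked hash of a weak password, and a tiny guess list.
leaked_hash = sha256_hex("password123")
common_passwords = ["123456", "qwerty", "12345678", "password123", "letmein"]

print(guess_password(leaked_hash, common_passwords))
```

The attacker never reverses the hash; they only need the right guess to be somewhere in the list.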

The easiest way to prevent brute-force attacks is to use a hashing function that is relatively slow to compute. That way the brute-force attack would take so much time to compute all possible hashes, that it's not even worth trying to perform.
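
To get a feeling for the difference, the sketch below times a single SHA-256 call against a deliberately slow function. PBKDF2 from the standard library stands in for the slow function only because it needs no extra packages; the iteration count of 200,000 is an arbitrary illustrative value, and absolute timings depend on the machine.

```python
import hashlib
import time


def time_call(func) -> float:
    """Return the wall-clock seconds a single call to `func` takes."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


password = b"myPwd"
salt = b"illustrative-salt"  # fixed here only to keep the sketch short

fast = time_call(lambda: hashlib.sha256(salt + password).digest())
slow = time_call(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000))

print(f"single SHA-256:          {fast:.6f} s")
print(f"PBKDF2, 200k iterations: {slow:.6f} s")
```

A delay of this size is barely noticeable for one login, but it is paid again for every single guess an attacker makes.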

Additionally, most web applications have built-in "timeouts" after a certain number of incorrect passwords, which makes brute-force guessing through a controlled UI unviable. This doesn't hold, though, if someone obtains a local copy of a hashed password.

Salting

Some companies strengthen hashes further with a technique called salting.

Companies that do this:

Add something. This involves adding a string of unique, random characters to the data they must protect.
Hash the whole string. The original data with the salt addition moves through the algorithm.
Store securely. Companies store the salt value alongside the hashed data.
Repeat. Companies can salt data more than once to offer deeper protection.

Salting is most effective, experts say, when companies use a different salt string for each data point. A password salt, for example, won't be as helpful if each password has the same set of random characters attached. As soon as a hacker figures out that code, all passwords are vulnerable.
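
A minimal sketch of per-password salting with the standard library. The function names, the 16-byte salt length and the iteration count are assumptions made for this example, and PBKDF2 is used only because it is available in `hashlib`.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_with_salt(password: str, salt: bytes) -> bytes:
    """Run the password together with its salt through PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_record(password: str) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, hash) for storage."""
    salt = os.urandom(16)
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_with_salt(password, salt), stored_hash)


# Two users who picked the same password still end up with different hashes.
salt_a, hash_a = create_record("p@ss~123")
salt_b, hash_b = create_record("p@ss~123")

print(hash_a == hash_b)                    # different salts, different hashes
print(verify("p@ss~123", salt_a, hash_a))  # correct password with its own salt
print(verify("p@ss~124", salt_a, hash_a))  # wrong password
```

Because the salt is needed again for verification, it is stored next to the hash and does not have to be secret.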

Thankfully - the entirety of this logic is typically abstracted away by security frameworks and modules that we can readily use in code.

Common hashing algorithms

All hashing algorithms work in a similar manner. Users input sensitive data, and the system churns through and renders that information illegible. But not all systems are created equal.

Common older hashing algorithms include:

MD5. MD5 is simple, quick, and free to use. It's among the most widely used hash algorithms available, but it's also ripe for hacking. Some experts encourage all companies to pick another method to protect data, but they say about a quarter of all major content systems continue to stick with MD5.
Secure Hash Algorithms (SHA). The National Institute of Standards and Technology published the first SHA algorithm in 1993. Each new release is followed by a number, such as SHA-0 and SHA-1. In general, the higher the number, the more secure the algorithm.

There are plenty of cryptographic functions to choose from, such as the SHA-2 family and the SHA-3 family. However, one design problem with the SHA families is that they were designed to be computationally fast. How fast a cryptographic function can calculate a hash has an immediate and significant bearing on how safe the password is.

Faster calculations mean faster brute-force attacks, for example. Modern hardware in the form of CPUs and GPUs could compute millions, or even billions, of SHA-256 hashes per second against a stolen database. Instead of a fast function, we need a function that is slow at hashing passwords to bring attackers almost to a halt. We also want this function to be adaptive so that we can compensate for future faster hardware by being able to make the function run slower and slower over time.

Bcrypt 🐡

bcrypt was designed by Niels Provos and David Mazières based on the Blowfish cipher 🐡: b for Blowfish and crypt for the name of the hashing function used by the UNIX password system.

You can play with Bcrypt here: https://bcrypt-generator.com/.

Storing Algorithm Settings + Salt + Hash Together

In many applications, frameworks and tools (e.g. in the database of WordPress sites), bcrypt-hashed passwords are stored together with the algorithm settings and the salt in a single string with a fixed format, consisting of several parts separated by the $ character. For example, the password p@ss~123 can be stored in the bcrypt format like this (several examples are given, to make the pattern apparent):

$2a$07$wHirdrK4OLB0vk9r3fiseeYjQaCZ0bIeKY9qLsNep/I2nZAXbOb7m
$2a$12$UqBxs0PN/u106Fio1.FnDOhSRJztLz364AwpGemp1jt8OnJYNsr.e
$2a$12$8Ov4lfmZZbv8O5YKrXXCu.mdH9Dq9r72C5GnhVZbGNsIzTr8dSUfm

bcrypt is able to mitigate brute-force attacks by combining the expensive key setup phase of Blowfish with a variable number of iterations to increase the workload and duration of hash calculations. The largest benefit of bcrypt is that, over time, the iteration count can be increased to make it slower allowing bcrypt to scale with computing power.
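
The stored strings shown earlier can be pulled apart with plain string handling. The sketch below assumes the usual bcrypt layout: a version identifier, a two-digit cost (the base-2 logarithm of the iteration count), and then 22 characters of salt followed by 31 characters of hash. The class and function names are invented for this example.

```python
from typing import NamedTuple


class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str
    checksum: str


def split_bcrypt(encoded: str) -> BcryptParts:
    """Split a `$`-separated bcrypt string into version, cost, salt and hash."""
    _, version, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters followed by 31 hash characters")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])


parts = split_bcrypt("$2a$07$wHirdrK4OLB0vk9r3fiseeYjQaCZ0bIeKY9qLsNep/I2nZAXbOb7m")

print(parts.version, parts.cost)
print("iterations:", 2 ** parts.cost)  # each +1 on the cost doubles the work
print("salt:", parts.salt)
print("hash:", parts.checksum)
```

Since the cost is stored inside every hash, it can be raised for new passwords while older hashes remain verifiable.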

Argon2 🧪

argon2 is a modern, ASIC-resistant and GPU-resistant secure key derivation algorithm. When configured correctly, it has even better password-cracking resistance than pbkdf2, bcrypt and scrypt (for similar configuration parameters for CPU and RAM usage).

In addition argon2 can also be used for Proof Of Work calculations, so it is used in cryptocurrencies.

argon2 has several variants:

argon2d – provides strong GPU resistance, but has potential side-channel attacks (possible in very special situations).
argon2i – provides less GPU resistance, but has no side-channel attacks.
argon2id – recommended (combines the argon2d and argon2i).

You can play with the argon2 password to key derivation function online here: https://argon2.online/.

argon2 has the following config parameters:

Password: the password (or message) to be hashed
Salt: random-generated salt (16 bytes recommended for password hashing)
Memory: amount of memory (in kilobytes) to use
Iterations: number of iterations to perform
Hash length: desired number of returned bytes
Parallelism: degree of parallelism (i.e. number of threads)

Storing Algorithm Settings + Salt + Hash Together

In many applications, frameworks and tools, argon2-hashed passwords are stored together with the algorithm settings and the salt in a single string consisting of several parts separated by the $ character. For example, the password p@ss~123 can be stored in the argon2 standard format like this (several examples are given, to make the pattern apparent):

$argon2d$v=19$m=1024,t=16,p=4$c2FsdDEyM3NhbHQxMjM$2dVtFVPCezhvjtyu2PaeXOeBR+RUZ6SqhtD/+QF4F1o
$argon2d$v=19$m=1024,t=16,p=4$YW5vdGhlcnNhbHRhbm90aGVyc2FsdA$KB7Nj7kK21YdGeEBQy7R3vKkYCz1cdR/I3QcArMhl/Q
$argon2i$v=19$m=8192,t=32,p=1$c21hbGxzYWx0$lmO1aPPy3x0CcvrKpFLi1TL/uSVJ/eO5hPHiWZFaWvY

All the above hashes hold the same password, but with different algorithm settings and different salts.
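
To make the pattern explicit, the sketch below splits the first example string into its parts using only the standard library. It assumes that `m`, `t` and `p` correspond to the memory, iterations and parallelism parameters listed earlier, and that the salt and hash are base64 text written without `=` padding. The class and function names are invented for this example.

```python
import base64
from typing import Dict, NamedTuple


class Argon2Parts(NamedTuple):
    variant: str
    version: int
    params: Dict[str, int]
    salt: bytes
    digest: bytes


def b64decode_unpadded(text: str) -> bytes:
    """Decode base64 text that was written without trailing '=' padding."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


def split_argon2(encoded: str) -> Argon2Parts:
    """Split a `$`-separated argon2 string into its settings, salt and hash."""
    _, variant, version, params, salt, digest = encoded.split("$")
    parsed = {key: int(value) for key, value in (p.split("=") for p in params.split(","))}
    return Argon2Parts(
        variant=variant,
        version=int(version.split("=")[1]),
        params=parsed,
        salt=b64decode_unpadded(salt),
        digest=b64decode_unpadded(digest),
    )


parts = split_argon2(
    "$argon2d$v=19$m=1024,t=16,p=4$c2FsdDEyM3NhbHQxMjM"
    "$2dVtFVPCezhvjtyu2PaeXOeBR+RUZ6SqhtD/+QF4F1o"
)

print(parts.variant, parts.version)
print(parts.params)       # memory (m), iterations (t), parallelism (p)
print(parts.salt)         # the raw salt bytes
print(len(parts.digest))  # hash length in bytes
```

Everything needed to verify a password later, apart from the password itself, travels inside this one string.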

Applying it to our application

We will be continuing from the application we made at the end of Implementing SQLAlchemy, which already included database access to save users and items but did not include any hashing.

If you need a fresh start, you can base your files off of this example.

Install `passlib`

PassLib is a great Python package to handle password hashes. It supports many secure hashing algorithms and utilities to work with them.

Using this we can add support for the argon2 algorithm while deprecating bcrypt. Meaning that new passwords will be hashed with argon2 but old passwords hashed using bcrypt are still supported.

So, install passlib with bcrypt and argon2:

pip install "passlib[bcrypt,argon2]"

Alternatively, install the packages separately:

pip install passlib
pip install argon2_cffi
pip install bcrypt

Note

Be sure to also edit your requirements.txt if you have one!

Hash passwords at user creation

Create a new file in your project that will hold everything security related: auth.py. Start by importing the tools we need from passlib:

from passlib.context import CryptContext

Create a PassLib "context". This is what will be used to hash and verify passwords. As stated before we are adding support for the argon2 algorithm while deprecating bcrypt:

from passlib.context import CryptContext


pwd_context = CryptContext(schemes=["argon2", "bcrypt"], deprecated="auto")

Create a utility function to hash a password coming from the user.

from passlib.context import CryptContext


pwd_context = CryptContext(schemes=["argon2", "bcrypt"], deprecated="auto")


def get_password_hash(password):
    return pwd_context.hash(password)

Then go to the crud.py file and edit the create_user function to use our new hashing function:

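def create_user(db: Session, user: schemas.UserCreate):
    # `auth` is the new auth.py module and has to be imported in crud.py.
    # Hash the plain-text password first, so only the hash reaches the database.
    hashed_password = auth.get_password_hash(user.password)
    db_user = models.User(email=user.email, hashed_password=hashed_password)
    db.add(db_user)
    db.commit()
    db.refresh(db_user)
    return db_user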

If we now create a user in our application, their password will be hashed using argon2. The example below comes from a database where we first created two users with our old, unsecured application, then created one user with CryptContext(schemes=["bcrypt"], deprecated="auto"), and finally created one with our current set-up:

[Screenshot: users table showing the stored password values for these four users]
