John the Ripper and Key Derivation Functions (KDFs)

Kali LinuxBeginner
Practice Now

Introduction

In this lab, you will explore Key Derivation Functions (KDFs), which are cryptographic algorithms designed to make brute-force attacks on passwords more difficult. Unlike simple hashing functions, KDFs intentionally introduce computational cost, making it much slower to test many password guesses. You will learn about common KDFs such as PBKDF2, bcrypt, and scrypt, understand how they generate unique password hashes, and observe how a powerful password cracking tool like John the Ripper interacts with these functions. Finally, you will gain practical experience in implementing KDFs for secure password storage, reinforcing best practices in cybersecurity.

Understand KDFs (e.g., PBKDF2, bcrypt, scrypt)

In this step, you will gain a foundational understanding of Key Derivation Functions (KDFs). KDFs are cryptographic algorithms that derive one or more secret keys from a secret value such as a master key or a password. Their primary purpose in password storage is to make brute-force and dictionary attacks computationally expensive, even if an attacker obtains the hashed passwords. This is achieved by intentionally slowing down the hashing process through iterative computations and the use of a "salt."

A "salt" is a random string of data that is unique for each password. When a password is hashed, the salt is combined with the password before hashing. This prevents attackers from using pre-computed rainbow tables to crack passwords and ensures that two identical passwords will produce different hashes if their salts are different.

Let's explore some common KDFs:

  • PBKDF2 (Password-Based Key Derivation Function 2): This function applies a pseudorandom function (like HMAC-SHA256) to the input password along with a salt and repeats the process many times (iterations) to increase the computational cost.
  • bcrypt: Based on the Blowfish cipher, bcrypt is designed to be adaptive, meaning its computational cost can be increased over time to keep up with increasing processor speeds. It's known for its resistance to GPU-based attacks.
  • scrypt: This KDF was specifically designed to be resistant to hardware attacks (ASICs and FPGAs) by requiring significant amounts of memory, in addition to computational power.

To begin, let's ensure John the Ripper is installed, as we will use it to demonstrate KDFs later.

john --version

You should see output similar to this, indicating John the Ripper is installed:

John the Ripper password cracker, version 1.9.0-jumbo-1+bleeding-e7022e5 64-bit
Copyright (c) 1996-2020 by Solar Designer
...

Next, let's create a simple Python script to demonstrate how PBKDF2 can be used to hash a password.

Create a file named kdf_demo.py:

nano kdf_demo.py

Add the following Python code to the file:

import hashlib
import os

def pbkdf2_hash(password, salt=None, iterations=100000):
    if salt is None:
        salt = os.urandom(16) ## Generate a random 16-byte salt

    ## PBKDF2 with HMAC-SHA256
    hashed_password = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        iterations
    )
    return salt, hashed_password

password = "mysecretpassword"
salt, hashed = pbkdf2_hash(password)

print(f"Password: {password}")
print(f"Salt (hex): {salt.hex()}")
print(f"Hashed Password (hex): {hashed.hex()}")
print(f"Iterations: 100000")

Save the file by pressing Ctrl+X, then Y, then Enter.

Now, run the Python script:

python3 kdf_demo.py

You will see output showing the original password, the generated salt, and the PBKDF2 hash. Notice that the salt is a random hexadecimal string, and the hash is also a hexadecimal string. Each time you run the script, a new salt will be generated, resulting in a different hash for the same password.

Password: mysecretpassword
Salt (hex): <random_hex_string>
Hashed Password (hex): <random_hex_string>
Iterations: 100000

This demonstrates the core concept of KDFs: combining a password with a unique salt and applying many iterations of a hashing function to produce a computationally expensive hash.

Identify Hashes Generated by KDFs

In this step, you will learn to identify the format of hashes generated by different KDFs. While the raw output of a KDF might be a binary string, when stored in systems, they are often encoded (e.g., Base64 or hexadecimal) and prefixed with identifiers that indicate the KDF used, the salt, and sometimes the iteration count or cost factor. This standardized format allows tools like John the Ripper to recognize and process them correctly.

Let's look at common hash formats for PBKDF2, bcrypt, and scrypt.

PBKDF2: PBKDF2 hashes often appear in a format that includes the algorithm, iterations, salt, and the derived key. For example, in /etc/shadow files on Linux, PBKDF2 (specifically SHA512) hashes might look like this: $6$rounds=5000$<salt>$<hash> Here, $6$ indicates SHA-512, rounds= specifies iterations, followed by the salt and the actual hash.

bcrypt: bcrypt hashes are easily recognizable by their $2a$, $2b$, or $2y$ prefix, followed by the cost factor (e.g., 10), the salt, and the hash. Example: $2a$10$<salt><hash>

scrypt: scrypt hashes typically start with $7$ or $scrypt$, followed by parameters like ln, r, p (logarithmic cost, block size, and parallelization factor), the salt, and the hash. Example: $7$<ln>$<r>$<p>$<salt><hash>

To demonstrate, let's create a file containing a few example KDF hashes. We will use a tool called mkpasswd (which is part of the whois package) to generate a bcrypt hash, and then manually construct a PBKDF2 hash for demonstration.

First, install the whois package to get mkpasswd:

sudo apt install -y whois

Now, let's generate a bcrypt hash for the password "password123" with a cost factor of 10.

mkpasswd -m bcrypt -S $(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16) -s 10 password123

This command generates a bcrypt hash. The -S option provides a random salt, and -s 10 sets the cost factor to 10. The output will be a bcrypt hash string.

$2a$10$<random_salt_string><hash_string>

Now, let's create a file named kdf_hashes.txt that John the Ripper can read. We'll include a bcrypt hash and a manually crafted PBKDF2 hash.

nano kdf_hashes.txt

Add the following content to the file. Replace <YOUR_GENERATED_BCRYPT_HASH> with the actual bcrypt hash you generated in the previous step.

user1:$2a$10$<YOUR_GENERATED_BCRYPT_HASH>
user2:$pbkdf2-sha256$100000$c0ffee$a1b2c3d4e5f678901234567890abcdef1234567890abcdef1234567890abcdef

Note: The PBKDF2 hash for user2 is a placeholder for demonstration purposes. It's not a real hash of a known password, but it follows the format John the Ripper expects for PBKDF2-SHA256. The format is $pbkdf2-sha256$<iterations>$<salt_hex>$<hash_hex>.

Save the file by pressing Ctrl+X, then Y, then Enter.

Now, let's use John the Ripper to identify the hash types in kdf_hashes.txt:

john --format=raw-md5 --show kdf_hashes.txt

Note: We use --format=raw-md5 as a dummy format because John requires a format to be specified for --show, even if it's just identifying. John will automatically detect the actual KDF formats.

The output will show John identifying the hash types:

0 password hashes cracked, 2 left

John correctly identifies that there are two hashes, and it will recognize their KDF types when it attempts to crack them. This step primarily focuses on recognizing the hash formats.

Observe John the Ripper's Support for KDFs

In this step, you will observe how John the Ripper supports cracking hashes generated by KDFs. John the Ripper has built-in support for a wide range of hash types, including various KDFs like PBKDF2, bcrypt, and scrypt. When you provide John with a file containing these hashes, it automatically detects the hash type and applies the appropriate cracking algorithms.

We will use the kdf_hashes.txt file created in the previous step. We will attempt to crack the user1 hash (bcrypt) with a simple wordlist.

First, let's create a small wordlist file named wordlist.txt containing common passwords, including "password123".

nano wordlist.txt

Add the following content to wordlist.txt:

test
123456
password
password123
qwerty

Save the file by pressing Ctrl+X, then Y, then Enter.

Now, let's use John the Ripper to crack the hashes in kdf_hashes.txt using our wordlist.txt.

john kdf_hashes.txt --wordlist=wordlist.txt

John will start the cracking process. Since "password123" is in our wordlist and the bcrypt hash for user1 was generated from it, John should quickly crack it.

Using default input encoding: UTF-8
Loaded 2 password hashes with 2 different salts to test (bcrypt [Blowfish])
Will run 4 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
password123      (user1)
1g 0:00:00:00 DONE (2023-10-27 08:30) 100% <random_speed>c/s <random_speed>p/s <random_speed>L/s <random_speed>PC/s user1
Session completed.

You can see that John successfully cracked user1 and identified the password as password123. The user2 hash (PBKDF2) was not cracked because its "password" was not in our wordlist (it was a placeholder hash).

To view the cracked passwords, you can use the --show option:

john --show kdf_hashes.txt

The output will display the cracked password for user1:

user1:password123

1 password hash cracked, 1 left

This demonstrates John the Ripper's ability to automatically detect and crack KDF hashes, highlighting the importance of using strong, unique passwords even with KDFs.

Understand the Computational Cost of KDFs

In this step, you will gain a deeper understanding of the computational cost associated with KDFs and why it's a crucial security feature. The primary goal of KDFs is to make password cracking computationally expensive, thereby increasing the time and resources an attacker needs to guess passwords. This cost is controlled by parameters like the number of iterations (for PBKDF2) or the cost factor (for bcrypt and scrypt).

Let's revisit our kdf_demo.py script and modify it to include bcrypt and scrypt hashing, and observe the time taken for each.

First, install the passlib library, which provides implementations for various KDFs:

pip install passlib

Now, modify kdf_demo.py to include bcrypt and scrypt hashing.

nano kdf_demo.py

Replace the existing content with the following code:

import time
from passlib.hash import pbkdf2_sha256, bcrypt, scrypt

password = "mysecretpassword"

print("--- PBKDF2-SHA256 ---")
start_time = time.time()
## Default iterations for pbkdf2_sha256 in passlib is 29000
pbkdf2_hash = pbkdf2_sha256.hash(password)
end_time = time.time()
print(f"Hash: {pbkdf2_hash}")
print(f"Time taken: {end_time - start_time:.4f} seconds")
print(f"Iterations: {pbkdf2_sha256.identify(pbkdf2_hash).get('rounds')}")
print("-" * 20)

print("--- bcrypt ---")
start_time = time.time()
## Default rounds for bcrypt in passlib is 12
bcrypt_hash = bcrypt.hash(password)
end_time = time.time()
print(f"Hash: {bcrypt_hash}")
print(f"Time taken: {end_time - start_time:.4f} seconds")
print(f"Cost factor (rounds): {bcrypt.identify(bcrypt_hash).get('rounds')}")
print("-" * 20)

print("--- scrypt ---")
start_time = time.time()
## Default parameters for scrypt in passlib are N=2^14, r=8, p=1
scrypt_hash = scrypt.hash(password)
end_time = time.time()
print(f"Hash: {scrypt_hash}")
print(f"Time taken: {end_time - start_time:.4f} seconds")
print(f"N (CPU/Memory cost): {scrypt.identify(scrypt_hash).get('N')}")
print(f"r (Block size): {scrypt.identify(scrypt_hash).get('r')}")
print(f"p (Parallelization): {scrypt.identify(scrypt_hash).get('p')}")
print("-" * 20)

Save the file by pressing Ctrl+X, then Y, then Enter.

Now, run the updated Python script:

python3 kdf_demo.py

Observe the output. You will see the hash generated by each KDF and the time it took to generate it.

--- PBKDF2-SHA256 ---
Hash: $pbkdf2-sha256$<iterations>$<salt>$<hash>
Time taken: <time> seconds
Iterations: <iterations_value>
--------------------
--- bcrypt ---
Hash: $2a$<cost_factor>$<salt_and_hash>
Time taken: <time> seconds
Cost factor (rounds): <cost_factor_value>
--------------------
--- scrypt ---
Hash: $scrypt$<N>$<r>$<p>$<salt_and_hash>
Time taken: <time> seconds
N (CPU/Memory cost): <N_value>
r (Block size): <r_value>
p (Parallelization): <p_value>
--------------------

You'll notice that even for a single hash, it takes a measurable amount of time (e.g., milliseconds). This is the intentional computational cost. If an attacker tries to crack a password by guessing millions or billions of passwords, this small delay per hash adds up significantly, making brute-force attacks impractical.

For example, if a KDF takes 0.1 seconds to hash a password, an attacker trying 1000 guesses per second would only be able to test 10 passwords per second. This drastically slows down the cracking process compared to simple, fast hashing algorithms.

The parameters (iterations, cost factor, N, r, p) can be adjusted to increase or decrease this computational cost. As computing power increases over time, these parameters should be increased to maintain the same level of security.

Implement KDFs for Secure Password Storage

In this step, you will learn how to implement KDFs for secure password storage in a practical scenario. The goal is to store password hashes, not the passwords themselves, and to use KDFs to make these hashes resistant to cracking. This involves generating a unique salt for each password, hashing the password with the chosen KDF and salt, and storing the resulting hash (which includes the salt and KDF parameters) in a database or file.

We will continue using Python and the passlib library to simulate a simple user registration and login system.

First, ensure you are in the ~/project directory.

cd ~/project

Now, create a new Python script named secure_auth.py:

nano secure_auth.py

Add the following Python code to the file. This script will allow you to register a new user with a password (which will be hashed using bcrypt) and then verify a login attempt.

from passlib.hash import bcrypt

## Simulate a user database
## In a real application, this would be a database (e.g., SQLite, PostgreSQL)
user_db = {}

def register_user(username, password):
    """Hashes the password using bcrypt and stores it."""
    if username in user_db:
        print(f"Error: User '{username}' already exists.")
        return False

    ## Hash the password using bcrypt. Passlib handles salt generation and cost factor.
    hashed_password = bcrypt.hash(password)
    user_db[username] = hashed_password
    print(f"User '{username}' registered successfully.")
    print(f"Stored hash: {hashed_password}")
    return True

def verify_login(username, password):
    """Verifies a password against the stored hash."""
    if username not in user_db:
        print(f"Login failed: User '{username}' not found.")
        return False

    stored_hash = user_db[username]

    ## Verify the password using bcrypt.verify().
    ## This function automatically extracts salt and cost from the hash.
    if bcrypt.verify(password, stored_hash):
        print(f"Login successful for user '{username}'.")
        return True
    else:
        print(f"Login failed: Incorrect password for user '{username}'.")
        return False

## --- Demonstration ---
print("--- Registering Users ---")
register_user("alice", "securepassword123")
register_user("bob", "anothersecret")
register_user("alice", "duplicateuser") ## Attempt to register existing user

print("\n--- Attempting Logins ---")
verify_login("alice", "securepassword123") ## Correct password
verify_login("bob", "wrongpassword")      ## Incorrect password
verify_login("charlie", "anypassword")    ## Non-existent user
verify_login("bob", "anothersecret")     ## Correct password

Save the file by pressing Ctrl+X, then Y, then Enter.

Now, run the secure_auth.py script:

python3 secure_auth.py

Observe the output. You will see the registration process, including the generated bcrypt hashes, and the results of the login attempts.

--- Registering Users ---
User 'alice' registered successfully.
Stored hash: $2a$<cost_factor>$<salt_and_hash>
User 'bob' registered successfully.
Stored hash: $2a$<cost_factor>$<salt_and_hash>
Error: User 'alice' already exists.

--- Attempting Logins ---
Login successful for user 'alice'.
Login failed: Incorrect password for user 'bob'.
Login failed: User 'charlie' not found.
Login successful for user 'bob'.

This script demonstrates the fundamental principles of secure password storage using KDFs:

  1. Hashing at Registration: When a user registers, their plaintext password is never stored. Instead, it's immediately hashed using a KDF (bcrypt in this case), and the resulting hash is stored. The bcrypt.hash() function automatically handles salt generation and applies the default cost factor.
  2. Verification at Login: When a user attempts to log in, their provided password is again hashed using the same KDF and parameters (salt and cost factor) extracted from the stored hash. The newly generated hash is then compared to the stored hash. If they match, the login is successful. The bcrypt.verify() function simplifies this process.

By using KDFs, even if an attacker gains access to your user_db (or a real database), they will only have access to the computationally expensive hashes, making it significantly harder and slower to recover the original passwords.

Summary

In this lab, you gained a comprehensive understanding of Key Derivation Functions (KDFs) and their critical role in secure password storage. You learned about popular KDFs like PBKDF2, bcrypt, and scrypt, recognizing their unique hash formats and the parameters that control their computational cost. You observed how John the Ripper, a powerful password cracking tool, supports these KDFs, highlighting the importance of strong passwords even when KDFs are used. Finally, you implemented a basic secure password storage system using Python and the passlib library, demonstrating the practical application of KDFs for hashing and verifying passwords. This lab reinforced the principle that KDFs are essential for making brute-force and dictionary attacks computationally infeasible, significantly enhancing the security of user credentials.