Use John the Ripper to Crack Office Documents (DOCX, XLSX, PPTX)

Introduction

In this lab, you will explore the process of cracking password-protected Microsoft Office documents (DOCX, XLSX, PPTX) using the powerful password cracking tool, John the Ripper. We will use office2john.py to extract the password hash from the Office files and then feed this hash to John the Ripper for cracking. This lab will demonstrate the steps involved in a practical scenario, helping you understand the security implications of weak passwords on Office documents.

Create a Password-Protected DOCX File

In this step, you will create a simple DOCX file and protect it with a password. This file will then be used in subsequent steps to extract and crack its password hash.

First, let's install libreoffice-writer to create the DOCX file.

sudo apt install -y libreoffice-writer

Once installed, open LibreOffice Writer.

libreoffice --writer &

A new LibreOffice Writer window will open.

  1. Type some text, for example, "This is a test document."
  2. Go to File -> Save As....
  3. In the Save As dialog, navigate to /home/labex/project/.
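  4. Enter secret.docx as the file name and make sure the file type is set to the Word (.docx) format rather than ODF. If LibreOffice later asks whether to keep this format, keep the Word format.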
  5. Check the Save with password box.
  6. Click Save.
  7. In the Set Password dialog, enter password123 as the password in both Enter password and Confirm password fields.
  8. Click OK.
  9. Close LibreOffice Writer.

Verify that the file secret.docx exists in your ~/project directory.

ls -l ~/project/secret.docx

-rw-r--r-- 1 labex labex XXXX Month XX XX:XX /home/labex/project/secret.docx
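
As an optional sanity check, you can confirm that the saved file is really encrypted. A password-protected Open XML document is wrapped in an OLE compound file, which begins with the bytes D0 CF 11 E0 A1 B1 1A E1, whereas an unprotected DOCX is a plain ZIP archive beginning with PK. The following is a minimal sketch under that assumption; the names classify_header and looks_encrypted are illustrative and not part of John the Ripper or LibreOffice.

```python
from pathlib import Path

OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE compound file signature
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header signature


def classify_header(header: bytes) -> str:
    """Classify a file by its first bytes: 'ole', 'zip', or 'unknown'."""
    if header.startswith(OLE_MAGIC):
        return "ole"   # encrypted Open XML (or a legacy .doc/.xls/.ppt file)
    if header.startswith(ZIP_MAGIC):
        return "zip"   # ordinary, unencrypted Open XML package
    return "unknown"


def looks_encrypted(path: str) -> bool:
    """Return True if the file at `path` starts with the OLE signature."""
    with open(Path(path).expanduser(), "rb") as handle:
        return classify_header(handle.read(8)) == "ole"
```

Calling looks_encrypted("~/project/secret.docx") should return True for the file saved above. A result of False suggests the Save with password box was not checked or the file was saved in a different format.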

Extract Hash from DOCX using office2john

In this step, you will use the office2john.py script, which is part of the John the Ripper suite, to extract the password hash from the secret.docx file you created. This hash is what John the Ripper will attempt to crack.

First, locate the office2john.py script. It's usually found in the /usr/share/john/ directory.

find /usr/share/john/ -name office2john.py

/usr/share/john/office2john.py

Now, use office2john.py to extract the hash from secret.docx and save it to a file named hash.txt.

python3 /usr/share/john/office2john.py ~/project/secret.docx > ~/project/hash.txt

Display the content of hash.txt to see the extracted hash.

cat ~/project/hash.txt

secret.docx:$office$*2013*100000*256*16*XXXXXXXXXXXXXXX*XXXXXXXXXXXXXXX*XXXXXXXXXXXXXXX

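The output will show a long string starting with secret.docx:$office$.... Strictly speaking, this is not a hash of your password itself. It is the set of encryption parameters (format version, iteration count, key size, salt) and encrypted verifier values that John the Ripper needs in order to test candidate passwords.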
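
To see what the individual fields of such a line are, the sketch below splits it on the * separator. The field names are an assumption based on the usual office2john.py layout for Office 2010/2013 style hashes (version, iteration count, key size in bits, salt size, then hex-encoded values); for Office 2007 hashes the second number may have a different meaning. parse_office_hash is a hypothetical helper, not part of John the Ripper.

```python
def parse_office_hash(line: str) -> dict:
    """Split one office2john output line into labelled fields."""
    label, _, rest = line.strip().partition(":")
    # Defensive: ignore anything after a further ':' in case extra fields follow.
    fields = rest.split(":", 1)[0].split("*")
    if len(fields) < 6 or fields[0] != "$office$":
        raise ValueError("not an $office$ hash line")
    return {
        "label": label,                 # file name printed by office2john.py
        "version": int(fields[1]),      # e.g. 2007, 2010, 2013
        "spin_count": int(fields[2]),   # assumed meaning for 2010/2013 hashes
        "key_bits": int(fields[3]),
        "salt_size": int(fields[4]),
        "blobs": fields[5:],            # hex fields: salt and verifier data
    }
```

For example, passing the first line of hash.txt to parse_office_hash returns a dictionary whose label is secret.docx, which makes it easy to check the iteration count and key size at a glance.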

Crack DOCX Hash with John the Ripper

Now that you have extracted the hash, you will use John the Ripper to crack it. We will use a simple wordlist for this demonstration, as our password, "password123", is a very common password.

John the Ripper comes with a default wordlist located at /usr/share/john/password.lst. Let's use this wordlist.

john --wordlist=/usr/share/john/password.lst ~/project/hash.txt

John the Ripper will start processing the hash. If the password is in the wordlist, it will quickly find it.


Using default input encoding: UTF-8
Loaded 1 password hash (Office, 2007/2010/2013/2016 [MD5/SHA1/SHA256/SHA512 RC4/AES])
Will run till all hashes are cracked, by default
Press 'q' or Ctrl-C to abort, almost any other key for status
password123      (secret.docx)
1g 0:00:00:00 DONE (20XX-XX-XX XX:XX) 100.0g/s 100.0p/s 100.0c/s 100.0C/s password123
Session completed.
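
A dictionary attack can only succeed if the password appears in the wordlist. If John finishes without reporting a password, one quick thing to check is whether the candidate is in the list at all. The sketch below is one way to do that; it assumes that lines starting with #!comment: are header comments, as in the wordlist bundled with John. wordlist_contains is a hypothetical helper name.

```python
from typing import Iterable


def wordlist_contains(lines: Iterable[str], candidate: str) -> bool:
    """Return True if `candidate` appears as a complete line in the wordlist."""
    for line in lines:
        word = line.rstrip("\r\n")
        if word.startswith("#!comment:"):
            continue  # header comment, not a candidate password
        if word == candidate:
            return True
    return False


# Example use against the wordlist from this lab (path taken from the text above):
# with open("/usr/share/john/password.lst", encoding="utf-8", errors="replace") as f:
#     print(wordlist_contains(f, "password123"))
```

The function takes any iterable of lines, so it works on an open file as well as on a plain list of strings.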

After cracking, you can view the cracked passwords using the --show option.

john --show ~/project/hash.txt

secret.docx:password123

1 password hash cracked, 0 left

This output confirms that John the Ripper successfully cracked the password for secret.docx as password123.
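
If you want to use the result in a script, the --show output can be turned into a simple mapping. The sketch below assumes the label:password layout shown above and splits each line at the first colon only, so a password that itself contains a colon stays intact. parse_show_output is an illustrative name.

```python
def parse_show_output(text: str) -> dict:
    """Map each label to its cracked password from `john --show` output."""
    cracked = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip the blank line before the summary
        label, sep, password = line.partition(":")
        if sep:  # the "N password hash(es) cracked" summary has no colon
            cracked[label] = password
    return cracked
```

One way to feed it is subprocess.run(["john", "--show", "hash.txt"], capture_output=True, text=True).stdout, which captures the same text you see in the terminal.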

Repeat for XLSX and PPTX Files

In this step, you will apply the same process to XLSX (Excel) and PPTX (PowerPoint) files to demonstrate that office2john.py and John the Ripper work similarly across different Office document types.

First, install libreoffice-calc and libreoffice-impress.

sudo apt install -y libreoffice-calc libreoffice-impress

Create a Password-Protected XLSX File:

  1. Open LibreOffice Calc:
    libreoffice --calc &
    
  2. Type some text, e.g., "Spreadsheet data."
  3. Go to File -> Save As....
  4. Navigate to /home/labex/project/.
  5. Enter secret.xlsx as the file name and make sure the file type is set to the Excel (.xlsx) format rather than ODF.
  6. Check Save with password.
  7. Click Save.
  8. Set the password to password123 and confirm.
  9. Click OK and close LibreOffice Calc.

Extract Hash from XLSX:

python3 /usr/share/john/office2john.py ~/project/secret.xlsx >> ~/project/hash.txt

Create a Password-Protected PPTX File:

  1. Open LibreOffice Impress:
    libreoffice --impress &
    
  2. Add a title, e.g., "Presentation Title."
  3. Go to File -> Save As....
  4. Navigate to /home/labex/project/.
  5. Enter secret.pptx as the file name and make sure the file type is set to the PowerPoint (.pptx) format rather than ODF.
  6. Check Save with password.
  7. Click Save.
  8. Set the password to password123 and confirm.
  9. Click OK and close LibreOffice Impress.

Extract Hash from PPTX:

python3 /usr/share/john/office2john.py ~/project/secret.pptx >> ~/project/hash.txt
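
Because the commands above append with >>, running them twice leaves duplicate lines in hash.txt. As an alternative, the extraction for several documents can be scripted so the file is rebuilt in one pass. The sketch below is only an illustration: extract_hashes and its run parameter are hypothetical (run exists so the function can be exercised without John installed), and the default script path is the one used in this lab.

```python
import subprocess
import sys


def extract_hashes(documents, script="/usr/share/john/office2john.py",
                   run=subprocess.run):
    """Run office2john.py on each document and collect its output lines."""
    lines = []
    for doc in documents:
        # check=True raises CalledProcessError if the script exits with an error.
        result = run([sys.executable, script, str(doc)],
                     capture_output=True, text=True, check=True)
        lines.extend(l for l in result.stdout.splitlines() if l.strip())
    return lines
```

The returned list can then be written out in one step, for example with open("hash.txt", "w") and "\n".join(lines), replacing any earlier content instead of appending to it.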

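Crack All Hashes: Now, run John the Ripper on the updated hash.txt file, which now contains hashes for DOCX, XLSX, and PPTX. John skips hashes whose passwords are already stored in its pot file, so the DOCX entry cracked earlier may not be reported again during this run; john --show still lists it.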

john --wordlist=/usr/share/john/password.lst ~/project/hash.txt

Using default input encoding: UTF-8
Loaded 3 password hashes (Office, 2007/2010/2013/2016 [MD5/SHA1/SHA256/SHA512 RC4/AES])
Will run till all hashes are cracked, by default
Press 'q' or Ctrl-C to abort, almost any other key for status
password123      (secret.xlsx)
password123      (secret.pptx)
password123      (secret.docx)
3g 0:00:00:00 DONE (20XX-XX-XX XX:XX) 100.0g/s 100.0p/s 100.0c/s 100.0C/s password123
Session completed.

Verify all cracked passwords:

john --show ~/project/hash.txt

secret.xlsx:password123
secret.pptx:password123
secret.docx:password123

3 password hashes cracked, 0 left

Understand Office Document Encryption

In this step, we will briefly discuss the encryption mechanisms used by Microsoft Office documents and why tools like John the Ripper are effective.

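Modern Microsoft Office documents (DOCX, XLSX, PPTX) use XML-based formats (Open XML) packaged as ZIP archives. When a password is set, the whole package is encrypted, typically with AES, and stored inside an OLE compound file. The encryption key is derived from the user's password with a salted, iterated hash (SHA-1 or SHA-512, with tens of thousands of rounds depending on the Office version). This serves the same purpose as a standard key derivation function (KDF) such as PBKDF2 (Password-Based Key Derivation Function 2): it makes every password guess expensive.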

office2john.py works by reading the encryption metadata stored alongside the encrypted content (the EncryptionInfo stream) and extracting the necessary parameters, such as the salt, iteration count, and the encrypted verifier and verifier hash. These parameters, along with the format version (e.g., 2007, 2010, or 2013), form the "hash" string that John the Ripper understands.

John the Ripper then performs a brute-force or dictionary attack. For each word in its wordlist (or each combination in a brute-force attack), it applies the same key derivation with the extracted parameters to generate a candidate key, uses that key to decrypt the verifier, and checks the result against the stored verifier hash. If they match, the password is found.
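
To make the cost of each guess concrete, the sketch below shows the salted, iterated hashing step as described in Microsoft's MS-OFFCRYPTO specification for Agile encryption with SHA-512. Which variant your file uses is an assumption, and the sketch stops before the later steps (mixing in a block key and decrypting the verifier with AES), so it illustrates the expensive part of a guess rather than a complete password check. iterated_hash is an illustrative name.

```python
import hashlib
import struct


def iterated_hash(password: str, salt: bytes, spin_count: int = 100000) -> bytes:
    """Salted, iterated SHA-512 over the UTF-16LE encoded password."""
    # H0 = SHA512(salt + password)
    digest = hashlib.sha512(salt + password.encode("utf-16-le")).digest()
    # Hn = SHA512(32-bit little-endian counter + H(n-1)), repeated spin_count times
    for i in range(spin_count):
        digest = hashlib.sha512(struct.pack("<I", i) + digest).digest()
    return digest
```

With the default of 100,000 rounds, every candidate password costs 100,001 SHA-512 computations, which is why a short wordlist finishes quickly while exhaustive search does not.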

The strength of the encryption depends heavily on the password's complexity and length. Weak or common passwords, like "password123", are easily cracked using dictionary attacks, as demonstrated in this lab. Strong passwords, which are long, random, and contain a mix of characters, significantly increase the time and computational resources required for cracking, making them much more secure.
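
A rough calculation shows why length and character variety matter. The sketch below counts the passwords an exhaustive search would have to try and converts that into a worst-case duration. The guess rate is a placeholder you supply yourself, not a measured figure for John the Ripper or any particular hardware.

```python
def keyspace_size(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_seconds(alphabet_size: int, length: int,
                       guesses_per_second: float) -> float:
    """Time to try every password of that length at a given guess rate."""
    return keyspace_size(alphabet_size, length) / guesses_per_second
```

For instance, eight lowercase letters give 26 ** 8 = 208,827,064,576 candidates. At a hypothetical 1,000 guesses per second, trying all of them would take roughly 6.6 years, and each additional character multiplies that by 26.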

This exercise highlights the importance of using strong, unique passwords for sensitive documents to protect them from unauthorized access.

Summary

In this lab, you successfully learned how to extract password hashes from password-protected Microsoft Office documents (DOCX, XLSX, PPTX) using office2john.py. You then used John the Ripper to crack these extracted hashes, demonstrating the vulnerability of documents protected with weak passwords. This hands-on experience provided insight into the process of password cracking and reinforced the importance of using strong, complex passwords for securing your digital assets.