Scan for Specific File Extensions in Gobuster

Introduction

Gobuster is a tool for brute-forcing directories and files on web servers. Although it is most often used to discover hidden directories, it can also find files with specific extensions. This matters in penetration testing and security assessments because it helps identify entry points, sensitive files, and misconfigurations that are not otherwise obvious. For example, .php files suggest a PHP-based web application, while .bak or .old files may be backup copies of sensitive data.

In this lab, you will learn how to effectively use Gobuster's -x flag to target specific file extensions during your web enumeration process. You will start by identifying common and relevant file extensions, construct a basic Gobuster command, and then enhance it to include extension-specific scanning. Finally, you will execute the scan and analyze the results to understand how to interpret the output. This hands-on experience will equip you with a valuable skill for more targeted and efficient web reconnaissance.

Identify Target File Extensions (e.g., .php, .html)

In this step, you will learn to identify common and relevant file extensions that you might want to scan for during a web enumeration. The choice of extensions often depends on the target technology stack (e.g., .php for PHP applications, .aspx for ASP.NET, .jsp for Java applications) or common file types that might contain sensitive information (e.g., .txt, .bak, .zip, .sql).

For this lab, we will focus on three common web-related extensions, .php, .html, and .txt, plus .bak as an example of a backup file. These are frequently encountered and serve as good examples for demonstrating Gobuster's capabilities.

You don't need to run any commands in this step. Deciding which file types you are looking for is the first step in any targeted scan.

Consider the following common extensions:

  • .php: PHP scripts
  • .html, .htm: HTML pages
  • .txt: Text files, often containing notes or logs
  • .js: JavaScript files
  • .css: Cascading Style Sheets
  • .xml: XML files
  • .json: JSON data files
  • .bak, .old, .orig: Backup files
  • .zip, .tar.gz: Archive files
  • .sql: Database dumps

Knowing which extensions to target helps narrow down your search and makes your enumeration more efficient.
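
One way to keep these choices organized is to group extensions by purpose and join them into the comma-separated form that Gobuster's -x flag expects. The sketch below is illustrative only: the function name exts_for_stack and its groupings are assumptions made for this example, not part of Gobuster.

```shell
#!/bin/sh
# Hypothetical helper (not a Gobuster feature): choose the extension for a
# technology stack and append file types that often hold sensitive data.
exts_for_stack() {
    case "$1" in
        php)    stack="php" ;;
        aspnet) stack="aspx" ;;
        java)   stack="jsp" ;;
        *)      stack="html" ;;   # fallback: static content only
    esac
    sensitive="txt,bak,zip,sql"
    printf '%s,%s\n' "$stack" "$sensitive"
}

# Print a comma-separated list suitable for passing to -x.
exts_for_stack php
```

The printed list could be pasted after -x in a scan command. Trimming it to the extensions that fit the target keeps the number of requests down.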

Construct the Base gobuster dir Command

In this step, you will construct the basic gobuster dir command. Gobuster's dir mode brute-forces directories and files. Before adding extension-specific options, it helps to understand the basic command structure.

The essential components of a gobuster dir command are:

  • gobuster dir: Specifies the mode (directory/file brute-forcing).
  • -u <URL>: Specifies the target URL. For this lab, our target will be http://localhost:8000, which is serving files from /tmp/web_root.
  • -w <wordlist>: Specifies the wordlist to use for brute-forcing. A common wordlist for web enumeration is common.txt or directory-list-2.3-medium.txt. For simplicity and speed in this lab, we will use a small custom wordlist that includes index, about, notes, admin, and config.

Let's create a simple wordlist in your ~/project directory.

echo -e "index\nabout\nnotes\nadmin\nconfig" > ~/project/small_wordlist.txt
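
Not every shell interprets echo -e the same way (some print the -e literally), so a printf-based variant can be more predictable. The sketch below writes the same five words and then checks the result. The WORDLIST variable is an assumption introduced for this example. It defaults to a temporary file so the sketch can be tried without touching the lab's wordlist.

```shell
#!/bin/sh
# Assumption: WORDLIST is a variable introduced for this sketch. It defaults to
# a temporary file; set it to ~/project/small_wordlist.txt to write the lab file.
WORDLIST="${WORDLIST:-$(mktemp)}"

# printf reuses the format for each argument, producing one word per line.
printf '%s\n' index about notes admin config > "$WORDLIST"

# Sanity checks: total line count, then count of non-empty lines.
wc -l < "$WORDLIST"
grep -c . "$WORDLIST"
```

Both counts should match the number of words written. A mismatch would point to blank lines in the wordlist.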

Now, let's construct the base command without extensions. This command will attempt to find directories or files matching the entries in small_wordlist.txt.

gobuster dir -u http://localhost:8000 -w ~/project/small_wordlist.txt

You will see output similar to this, showing the directories/files found without specific extensions:

...
/index (Status: 200)
/about (Status: 200)
/notes (Status: 200)
/admin (Status: 200)
/config (Status: 200)
...

This output shows that Gobuster found entries matching the wordlist, but it doesn't tell us their specific file types yet.

Add the -x Flag to Specify Extensions

In this step, you will learn how to use the -x flag in Gobuster to specify the file extensions you want to scan for. This is the core of finding files with specific types.

The -x flag takes a comma-separated list of extensions. For example, to scan for .php and .html files, you would use -x php,html. Gobuster will then append these extensions to each entry in your wordlist and attempt to find files like index.php, about.html, etc.

Let's modify the previous command to include the extensions .php, .html, .txt, and .bak.

gobuster dir -u http://localhost:8000 -w ~/project/small_wordlist.txt -x php,html,txt,bak

This command tells Gobuster to request each wordlist entry on its own (index, about, and so on) and again with every listed extension appended: index.php, index.html, index.txt, index.bak, about.php, about.html, about.txt, about.bak, and so on.
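
To make the expansion concrete, the following sketch prints the candidate paths for this lab's wordlist and extension list. It only generates names and sends no requests. The candidates function is a hypothetical helper written for this illustration, not Gobuster's own code.

```shell
#!/bin/sh
# Words and extensions used in this lab.
words="index about notes admin config"
exts="php html txt bak"

# Hypothetical helper: list every path a scan of this shape would try.
candidates() {
    for w in $words; do
        printf '/%s\n' "$w"                # the bare word is still tried
        for e in $exts; do
            printf '/%s.%s\n' "$w" "$e"    # the word with each extension
        done
    done
}

candidates | head -n 5    # first few candidates
candidates | wc -l        # total number of candidates
```

With 5 words and 4 extensions, the total is 5 × (1 + 4) = 25 candidates. Each extra extension adds one more request per wordlist entry, which is why long extension lists slow down scans against large wordlists.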

Execute the command and observe the output. You should now see results that explicitly include the file extensions.

...
/index.php (Status: 200)
/about.html (Status: 200)
/notes.txt (Status: 200)
/admin.php (Status: 200)
/config.bak (Status: 200)
...

Notice how the output now clearly shows the file names with their respective extensions, indicating that Gobuster successfully found these files.

Run the Scan Against the Target

In this step, you will execute the full Gobuster command with the specified extensions against our simulated target. This is the practical application of what you've learned.

We will use the command from the previous step, which includes the target URL, the custom wordlist, and the -x flag with .php, .html, .txt, and .bak extensions.

gobuster dir -u http://localhost:8000 -w ~/project/small_wordlist.txt -x php,html,txt,bak

When you run this command, Gobuster will start iterating through the wordlist, appending each specified extension, and making requests to the http://localhost:8000 server.

The output will show the files it discovers along with their HTTP status codes. A 200 OK status code indicates that the file was found successfully.

===============================================================
Gobuster v3.6
by OJ (https://github.com/OJ/gobuster)
===============================================================
[+] Url:                     http://localhost:8000
[+] Method:                  GET
[+] Threads:                 10
[+] Wordlist:                /home/labex/project/small_wordlist.txt
[+] Extensions:              php,html,txt,bak
[+] Timeout:                 10s
===============================================================
2024/01/01 12:00:00 Starting gobuster in directory enumeration mode
===============================================================
/index.php            (Status: 200)
/about.html           (Status: 200)
/notes.txt            (Status: 200)
/admin.php            (Status: 200)
/config.bak           (Status: 200)
===============================================================
2024/01/01 12:00:00 Finished
===============================================================

This output confirms that Gobuster successfully identified index.php, about.html, notes.txt, admin.php, and config.bak on the target server.
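
For longer scans, it can help to reduce the output to just the paths that were found. The sketch below applies an awk filter to the sample result lines shown above. They are stored in a shell variable so the example runs without Gobuster or the lab server. The variable names results and found are assumptions for this illustration.

```shell
#!/bin/sh
# Sample result lines copied from the scan output above.
results='/index.php            (Status: 200)
/about.html           (Status: 200)
/notes.txt            (Status: 200)
/admin.php            (Status: 200)
/config.bak           (Status: 200)'

# Keep lines that start with "/" and report status 200, then print the path only.
found=$(printf '%s\n' "$results" | awk '/^\// && /\(Status: 200\)/ {print $1}')
printf '%s\n' "$found"
```

With a real scan, the same filter could read from a saved results file, for example one written with Gobuster's -o option.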

Review the Results for Files with Specific Extensions

In this final step, you will review and interpret the results obtained from your Gobuster scan. Understanding the output is crucial for identifying valuable information during a security assessment.

The output from the previous step clearly lists the files found along with their HTTP status codes.

Here's what each line means:

  • /index.php (Status: 200): Indicates that a file named index.php was found at the root of the web server, and the server responded with a 200 OK status, meaning the request was successful.
  • /about.html (Status: 200): Similarly, about.html was found.
  • /notes.txt (Status: 200): A text file named notes.txt was found. This could potentially contain sensitive information.
  • /admin.php (Status: 200): An admin.php file was found. This might be an administrative interface, which is often a target for further investigation.
  • /config.bak (Status: 200): A backup file named config.bak was found. Backup files often contain sensitive configuration details or source code.

By reviewing these results, you can identify files that might be interesting for further investigation. For example, you might try to access /admin.php in a web browser or download /config.bak to examine its contents.
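
A first pass over the results can be as simple as labeling each path by what its extension suggests. The sketch below does this for the five files found in this lab. The triage function and its labels are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical triage helper: map a path's extension to a suggested next step.
triage() {
    case "$1" in
        *.bak|*.old|*.orig) echo "backup: download and inspect" ;;
        *.txt)              echo "text: read for notes or logs" ;;
        *.php)              echo "script: open in a browser" ;;
        *)                  echo "other: review manually" ;;
    esac
}

# Paths taken from the scan results in this lab.
for path in /index.php /about.html /notes.txt /admin.php /config.bak; do
    printf '%-12s %s\n' "$path" "$(triage "$path")"
done
```

The categories are a starting point. On a real assessment, the priority of each file type depends on the target and the scope of the engagement.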

This targeted scanning for extensions helps in discovering hidden or forgotten files that could lead to vulnerabilities or information disclosure.

Summary

In this lab, you have successfully learned how to use Gobuster to scan for specific file extensions on a target web server. You started by understanding the importance of identifying relevant file extensions, then constructed a basic gobuster dir command, and finally enhanced it using the -x flag to target .php, .html, .txt, and .bak files.

You executed the scan against a simulated web server and interpreted the results, identifying various files with their respective extensions and HTTP status codes. This skill is invaluable for web enumeration during penetration testing, allowing you to discover hidden files that might contain sensitive information, reveal the technology stack, or expose potential entry points.

With the -x flag, you can make your reconnaissance more targeted and efficient. Continue practicing with different wordlists and extension combinations to build your web enumeration skills.