Use Different Wordlists for Directory Scanning in Gobuster

Introduction

In this lab, you will explore the crucial role of wordlists in directory scanning using Gobuster, a popular tool for web enumeration. Directory scanning is a fundamental step in web application penetration testing, helping to discover hidden directories and files that might contain sensitive information or provide attack vectors. The effectiveness and efficiency of this process heavily depend on the wordlist used. You will learn how to locate default wordlists in Kali Linux, perform scans with different wordlist sizes, and analyze the impact on scan time and the number of discovered entries. This will provide a practical understanding of the trade-offs involved in choosing the right wordlist for your enumeration tasks.

Locate the Default Wordlists in Kali

In this step, you will locate the wordlists available in Kali Linux, which are essential for tools like Gobuster. This lab uses the SecLists collection, installed at /usr/share/seclists (on Kali it is provided by the seclists package, which may need to be installed separately). These wordlists are categorized for various purposes, including directory enumeration, password cracking, and fuzzing.

First, list the contents of the seclists directory. The common path for these wordlists is /usr/share/seclists.

ls -l /usr/share/seclists

You will see various subdirectories. For web enumeration, the Discovery directory is particularly relevant. Inside Discovery, you'll find Web-Content, which contains wordlists specifically designed for finding web directories and files.

ls -l /usr/share/seclists/Discovery/Web-Content/

Among these, common.txt is a relatively small wordlist, and directory-list-2.3-medium.txt is a larger, more comprehensive one. You will use these two wordlists in the subsequent steps to observe the impact of wordlist size on Gobuster scans.

ls -l /usr/share/seclists/Discovery/Web-Content/common.txt
ls -l /usr/share/seclists/Discovery/Web-Content/directory-list-2.3-medium.txt

You can also view the first few lines of common.txt to get an idea of its content:

head /usr/share/seclists/Discovery/Web-Content/common.txt

Example output (the exact entries depend on the installed SecLists version):

.git
.svn
.DS_Store
.htaccess
.htpasswd
.bash_history
.bash_logout
.bashrc
.profile
.ssh
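
To put numbers on "small" and "large", the entries in each wordlist can be counted before scanning. The following is a minimal sketch: count_entries is a hypothetical helper name, and it assumes that lines starting with # are comments (as in the license header of the directory-list files) and should not be counted.

```shell
#!/usr/bin/env bash
# Count the usable entries in a wordlist, skipping blank lines and
# lines that start with '#' (assumed to be comments).
count_entries() {
  grep -cvE '^[[:space:]]*(#|$)' "$1"
}

for wl in common.txt directory-list-2.3-medium.txt; do
  path="/usr/share/seclists/Discovery/Web-Content/$wl"
  if [ -r "$path" ]; then
    printf '%s: %s entries\n' "$wl" "$(count_entries "$path")"
  else
    printf '%s: not found at %s\n' "$wl" "$path"
  fi
done
```

The ratio between the two counts gives a first rough idea of how much longer the larger scan is likely to take.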

Perform a Scan with a Small Wordlist (common.txt)

In this step, you will perform a directory scan using Gobuster with the smaller common.txt wordlist. This demonstrates a fast scan that may miss less common directories.

The target for our scan will be a simple Python web server running locally on port 8000. The URL will be http://127.0.0.1:8000.
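
Before scanning, it can help to confirm that something is actually listening on the target port. This is a minimal sketch that assumes Bash, since it relies on Bash's /dev/tcp redirection; port_open is a hypothetical helper name and is not part of Gobuster.

```shell
#!/usr/bin/env bash
# Return success if a TCP connection to host ($1) and port ($2) can be opened.
# Uses Bash's /dev/tcp pseudo-device; the subshell closes the socket on exit.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 8000; then
  echo "Target is listening on 127.0.0.1:8000"
else
  echo "Nothing is listening on 127.0.0.1:8000; start the web server first"
fi
```

A check like this only confirms that the port accepts connections. It does not verify that the server returns the expected content.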

Use the following command to run Gobuster:

gobuster dir -u http://127.0.0.1:8000 -w /usr/share/seclists/Discovery/Web-Content/common.txt -o ~/project/gobuster_common_results.txt

The parts of this command are:

  • dir: Selects directory and file brute-forcing mode.
  • -u: Specifies the target URL.
  • -w: Specifies the path to the wordlist.
  • -o: Specifies an output file to save the results.

The scan will run and display its progress. Once completed, you can view the results saved in ~/project/gobuster_common_results.txt.

cat ~/project/gobuster_common_results.txt

Expected output (entries depend on the common.txt content and the server setup; recent Gobuster versions may also append a size field to each line):

/admin                (Status: 200)
/backup               (Status: 200)
/common.html          (Status: 200)

Note the time it took for the scan to complete. This will be compared with the scan using a larger wordlist in the next step.
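
Rather than estimating the duration by eye, the scan can be wrapped in a small timer. The sketch below uses a hypothetical helper named timed that reports wall-clock seconds for any command; prefixing the command with the shell's time keyword is a simpler alternative. The scan itself only runs if gobuster and the wordlist are present.

```shell
#!/usr/bin/env bash
# Run a command, then report its wall-clock duration in whole seconds on stderr.
# The command's own exit status is preserved.
timed() {
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s" >&2
  return "$status"
}

wordlist=/usr/share/seclists/Discovery/Web-Content/common.txt
if command -v gobuster >/dev/null 2>&1 && [ -r "$wordlist" ]; then
  timed gobuster dir -u http://127.0.0.1:8000 -w "$wordlist" \
    -o ~/project/gobuster_common_results.txt
fi
```

The same wrapper can be reused unchanged for the larger wordlist, which makes the two durations directly comparable.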

Perform a Scan with a Larger Wordlist (directory-list-2.3-medium.txt)

In this step, you will repeat the directory scan using Gobuster, but this time with the much larger directory-list-2.3-medium.txt wordlist. This will demonstrate how a more comprehensive wordlist can find more entries but at the cost of increased scan time.

gobuster dir -u http://127.0.0.1:8000 -w /usr/share/seclists/Discovery/Web-Content/directory-list-2.3-medium.txt -o ~/project/gobuster_medium_results.txt

This scan will take significantly longer than the previous one due to the size of the wordlist. Be patient while it runs. Once it completes, examine the results.

cat ~/project/gobuster_medium_results.txt

Expected output (will include more entries than common.txt scan, including /admin, /backup, /common.html and potentially many others):

/admin                (Status: 200)
/backup               (Status: 200)
/common.html          (Status: 200)
/css                  (Status: 200)
/js                   (Status: 200)
/images               (Status: 200)
... (many more entries)

Observe the difference in the number of discovered entries and the total time taken compared to the previous scan.

Compare the Time and Results of Both Scans

In this step, you will explicitly compare the results and the approximate time taken for both scans. The results files do not record how long a scan took, so rely on the duration you observed during execution, or rerun a scan prefixed with the shell's time keyword to measure it. The output files themselves tell you how many entries each scan found.

First, let's compare the number of lines (which corresponds to the number of discovered entries) in both output files:

echo "Results from common.txt scan:"
wc -l ~/project/gobuster_common_results.txt

echo "Results from directory-list-2.3-medium.txt scan:"
wc -l ~/project/gobuster_medium_results.txt

Expected output (line counts will vary):

Results from common.txt scan:
3 /home/labex/project/gobuster_common_results.txt
Results from directory-list-2.3-medium.txt scan:
X /home/labex/project/gobuster_medium_results.txt (where X is a much larger number)

You should observe that the scan with directory-list-2.3-medium.txt found significantly more entries.
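
Line counts show how many entries each scan found, but not which ones are new. As a rough sketch, the two results files can be compared with comm to list the paths that only the larger scan discovered. The helper name new_paths is hypothetical, and the sketch assumes that the path is the first whitespace-separated field on each results line, as in the sample output above.

```shell
#!/usr/bin/env bash
# Print paths that appear in the second results file but not in the first.
# The path is taken to be the first field on each non-empty line.
new_paths() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  awk 'NF {print $1}' "$1" | sort -u > "$a"
  awk 'NF {print $1}' "$2" | sort -u > "$b"
  comm -13 "$a" "$b"   # column 3 suppressed too: keep lines unique to file 2
  rm -f "$a" "$b"
}

small=~/project/gobuster_common_results.txt
large=~/project/gobuster_medium_results.txt
if [ -r "$small" ] && [ -r "$large" ]; then
  new_paths "$small" "$large"
fi
```

Swapping the two arguments lists anything the small wordlist found that the larger one missed, which can happen because the lists are not strict subsets of each other.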

Regarding time, you will have noticed that the scan with common.txt completed very quickly (likely within a few seconds), while the scan with directory-list-2.3-medium.txt took much longer (potentially minutes, depending on system resources and, for remote targets, network latency). Each wordlist entry results in at least one HTTP request, so scan duration grows roughly in proportion to wordlist size.
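
The relationship between size and duration can be turned into a back-of-the-envelope estimate: the number of entries divided by the request rate. The sketch below uses a hypothetical helper, estimate_seconds, and the figures passed to it are illustrative assumptions rather than measurements from this lab.

```shell
#!/usr/bin/env bash
# Estimate scan time in seconds: entries ($1) divided by requests per
# second ($2), rounded up to the next whole second.
estimate_seconds() {
  local entries=$1 rate=$2
  echo $(( (entries + rate - 1) / rate ))
}

# Hypothetical figures: a 220000-entry wordlist at 500 requests per second.
estimate_seconds 220000 500   # prints 440
```

A realistic request rate can be obtained by timing a short scan and dividing the number of wordlist entries by the elapsed seconds.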

Understand the Trade-off Between Wordlist Size and Scan Time

In this final step, you will summarize the key takeaways from the previous scans, focusing on the trade-off between wordlist size, scan time, and the comprehensiveness of the results.

Larger Wordlists:

  • Pros: More likely to discover hidden directories and files, leading to a more thorough enumeration. This is crucial for finding obscure or non-standard paths that might contain vulnerabilities or sensitive data.
  • Cons: Significantly increases scan time, consumes more system resources, and generates a larger volume of output, which can be challenging to analyze. It might also generate more noise (false positives or irrelevant entries).

Smaller Wordlists:

  • Pros: Much faster scans, lower resource usage, and more manageable output. Ideal for quick checks or when you have a good idea of common paths.
  • Cons: May miss less common or custom-named directories and files, leading to an incomplete enumeration and potentially overlooking critical information.

Choosing the Right Wordlist:
The choice of wordlist depends on your objective and the time available.

  • For initial reconnaissance or quick checks, a smaller, targeted wordlist like common.txt is often sufficient.
  • For a comprehensive and in-depth assessment, a larger wordlist like directory-list-2.3-medium.txt or an even larger one (e.g., directory-list-2.3-big.txt) is usually the better choice.
  • Sometimes, a combination approach is best: start with a small wordlist for speed, then follow up with a larger one if initial findings warrant a deeper dive.
  • Custom wordlists tailored to the specific target or technology stack can also be highly effective.
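
One way to act on the last two points is to merge a standard wordlist with a short list of target-specific names into a single deduplicated file. This is a minimal sketch: build_wordlist is a hypothetical helper, custom_words.txt is a hypothetical file you would create yourself, and lines starting with # are assumed to be comments.

```shell
#!/usr/bin/env bash
# Merge wordlists into one file ($1), dropping blank lines, comment lines,
# and duplicates while keeping the first occurrence of each entry in order.
build_wordlist() {
  local out=$1
  shift
  awk '!/^[ \t]*(#|$)/ && !seen[$0]++' "$@" > "$out"
}

common=/usr/share/seclists/Discovery/Web-Content/common.txt
extra=~/project/custom_words.txt   # hypothetical target-specific names
if [ -r "$common" ] && [ -r "$extra" ]; then
  build_wordlist ~/project/combined.txt "$common" "$extra"
  wc -l ~/project/combined.txt
fi
```

The resulting file can be passed to Gobuster with -w like any other wordlist. Because duplicates are removed, no path is requested twice.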

This understanding is vital for efficient and effective web application penetration testing.

Summary

In this lab, you successfully explored the impact of different wordlist sizes on Gobuster directory scanning. You learned how to locate default wordlists in Kali Linux, performed scans using both a small (common.txt) and a larger (directory-list-2.3-medium.txt) wordlist, and compared their results and approximate scan times. You gained a practical understanding of the trade-offs involved: larger wordlists offer more comprehensive results but require significantly more time, while smaller wordlists provide quicker scans at the risk of missing less common entries. This knowledge is crucial for making informed decisions during web enumeration tasks in penetration testing.