Create and Manage Custom Wordlists

Introduction

In this lab, you will explore the fundamental techniques for creating and managing custom wordlists using standard Linux command-line utilities. Wordlists are crucial in various cybersecurity and data processing tasks, from brute-forcing passwords to fuzzing applications and analyzing text data. You will learn how to generate simple wordlists, combine multiple lists, remove duplicate entries, sort them alphabetically, and filter them based on length. By the end of this lab, you will have a solid understanding of how to manipulate text files efficiently to create tailored wordlists for your specific needs.

Generate a Simple Custom Wordlist

In this step, you will learn how to create a basic custom wordlist using the echo command and redirection. This is the simplest way to generate a file containing a few words, each on a new line.

First, navigate to your project directory if you are not already there:

cd ~/project

Now, create a simple wordlist named my_wordlist.txt with a few words:

echo -e "apple\nbanana\norange\ngrape" > my_wordlist.txt

The -e option enables the interpretation of backslash escapes, and \n creates a new line. The > redirects the output to the specified file, creating it if it doesn't exist or overwriting it if it does.
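The behavior of echo -e varies between shells; some, such as dash, do not treat -e as an option and print it literally. A more portable sketch uses printf instead. The scratch directory and the file name demo_wordlist.txt below are illustrative and not part of the lab.

```shell
# Work in a throwaway directory so the lab files in ~/project are untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# printf reuses the format '%s\n' for every argument: one word per line.
printf '%s\n' apple banana orange grape > demo_wordlist.txt

# Count the lines to confirm that four words were written.
wc -l < demo_wordlist.txt
```

Because each word is a separate argument, this form also avoids embedding \n escapes inside a quoted string.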

To verify the content of your newly created wordlist, use the cat command:

cat my_wordlist.txt

You should see the following output:

apple
banana
orange
grape

Next, let's add more words to the same wordlist without overwriting its existing content. We will use the >> operator for appending.

echo -e "kiwi\nstrawberry\nblueberry" >> my_wordlist.txt

Verify the updated content:

cat my_wordlist.txt

The output should now include the newly added words:

apple
banana
orange
grape
kiwi
strawberry
blueberry

This method is useful for quickly generating small, custom wordlists or adding entries to existing ones.

Combine Multiple Wordlists

In this step, you will learn how to combine the contents of multiple wordlists into a single, consolidated wordlist. This is a common task when you have different sources of words that you want to merge.

First, let's create another small wordlist named additional_words.txt:

echo -e "melon\npeach\nplum" > additional_words.txt

Verify its content:

cat additional_words.txt

You should see:

melon
peach
plum

Now, we will combine my_wordlist.txt and additional_words.txt into a new file called combined_wordlist.txt. We will use the cat command to concatenate the files and redirect the output.

cat my_wordlist.txt additional_words.txt > combined_wordlist.txt

Inspect the content of the combined_wordlist.txt:

cat combined_wordlist.txt

The output will show all words from both files, in the order they were concatenated:

apple
banana
orange
grape
kiwi
strawberry
blueberry
melon
peach
plum

This technique is very flexible and can be used to combine any number of wordlists.
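As a rough illustration of merging more than two lists, the sketch below concatenates three small files and checks the resulting line count. The file names list_a.txt, list_b.txt, list_c.txt, and merged.txt are hypothetical and exist only in a scratch directory.

```shell
# Use a scratch directory so nothing in ~/project is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three small hypothetical source lists.
printf '%s\n' apple banana > list_a.txt
printf '%s\n' melon peach > list_b.txt
printf '%s\n' kiwi > list_c.txt

# cat accepts any number of files and writes them out in argument order.
cat list_a.txt list_b.txt list_c.txt > merged.txt

# The merged line count should equal the sum of the inputs (2 + 2 + 1 = 5).
wc -l < merged.txt
```

Avoid redirecting the output into one of the input files: the shell truncates the target before cat reads it, so that file's original content is lost.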

Remove Duplicates from a Wordlist

Wordlists often contain duplicate entries, especially after combining multiple sources. In this step, you will learn how to remove these duplicates using the sort and uniq commands. The uniq command only detects adjacent duplicate lines, so it's crucial to sort the file first.
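The effect of adjacency can be seen in a small sketch that feeds the same three words through uniq with and without sorting first. The variable names are illustrative only.

```shell
# uniq alone: the two 'apple' lines are not adjacent, so both survive.
unsorted_result=$(printf '%s\n' apple banana apple | uniq)

# sort first: the duplicates become adjacent and uniq collapses them.
sorted_result=$(printf '%s\n' apple banana apple | sort | uniq)

printf 'uniq only:\n%s\n\n' "$unsorted_result"
printf 'sort | uniq:\n%s\n' "$sorted_result"
```

The first result still has three lines, while the second has two.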

Let's intentionally add some duplicate entries to our combined_wordlist.txt to demonstrate this.

echo -e "apple\nbanana\nmelon" >> combined_wordlist.txt

Now, view the content of combined_wordlist.txt to see the duplicates:

cat combined_wordlist.txt

You will notice that apple, banana, and melon each appear twice.

To remove duplicates, we first sort the file and then pipe the output to uniq. We'll save the result to a new file, unique_wordlist.txt.

sort combined_wordlist.txt | uniq > unique_wordlist.txt

Now, inspect the unique_wordlist.txt:

cat unique_wordlist.txt

The output should now contain only unique entries, sorted alphabetically:

apple
banana
blueberry
grape
kiwi
melon
orange
peach
plum
strawberry

This is a powerful combination of commands for cleaning up wordlists.
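Two common variations are sketched below, using a small inline word sequence rather than the lab files. sort -u sorts and deduplicates in one step, while the awk idiom '!seen[$0]++' drops repeats without reordering the input, which can matter when a list is ranked by likelihood. The variable names are illustrative.

```shell
# A short sample with repeated entries, held in a variable for the demo.
words=$(printf '%s\n' orange apple orange banana apple)

# sort -u sorts and deduplicates in one step (same result as sort | uniq).
sorted_unique=$(printf '%s\n' "$words" | sort -u)

# awk keeps a line only the first time it is seen, preserving input order.
ordered_unique=$(printf '%s\n' "$words" | awk '!seen[$0]++')

printf '%s\n' "$sorted_unique"
printf '%s\n' '---'
printf '%s\n' "$ordered_unique"
```

The first list comes out as apple, banana, orange; the second keeps the original order: orange, apple, banana.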

Sort a Wordlist Alphabetically

Sorting a wordlist alphabetically is often useful for organization, easier readability, and as a prerequisite for other operations like removing duplicates (as seen in the previous step). In this step, you will explicitly sort a wordlist.

We will use the sort command on our unique_wordlist.txt and save the sorted output to sorted_wordlist.txt. Although unique_wordlist.txt is already sorted, this step demonstrates the sort command independently.

sort unique_wordlist.txt > sorted_wordlist.txt

Now, view the content of sorted_wordlist.txt:

cat sorted_wordlist.txt

The output will be the words in alphabetical order:

apple
banana
blueberry
grape
kiwi
melon
orange
peach
plum
strawberry

The sort command has many options, such as -r for reverse alphabetical order, or -n for numerical sorting. For example, to sort in reverse order:

sort -r unique_wordlist.txt

This would output:

strawberry
plum
peach
orange
melon
kiwi
grape
blueberry
banana
apple

For this lab, we will stick to the default alphabetical sort.
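As a brief aside on the -n option mentioned above, the following sketch contrasts the default text comparison with numeric sorting, which is relevant when a wordlist holds numeric entries such as PINs. The sample values are made up for illustration.

```shell
# Default sort compares text character by character, so "10" sorts before "9".
lexical=$(printf '%s\n' 10 9 100 2 | sort)

# -n compares by numeric value instead.
numeric=$(printf '%s\n' 10 9 100 2 | sort -n)

printf 'default:\n%s\n\n' "$lexical"
printf 'numeric:\n%s\n' "$numeric"
```

The default order is 10, 100, 2, 9, while sort -n gives 2, 9, 10, 100.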

Filter a Wordlist by Length

Sometimes you need to filter a wordlist based on the length of the words. For example, you might only want words between 5 and 7 characters long, perhaps to match a known password policy. In this step, you will use the awk command to filter words by their length.

We will filter sorted_wordlist.txt to include only words that have a length between 5 and 7 characters (inclusive). The length($0) function in awk returns the length of the current line (word).

awk 'length($0) >= 5 && length($0) <= 7' sorted_wordlist.txt > filtered_wordlist.txt

Now, inspect the content of filtered_wordlist.txt:

cat filtered_wordlist.txt

The output should contain only words meeting the length criteria:

apple
banana
grape
melon
orange
peach

Let's break down the awk command:

  • awk: The command-line utility for text processing.
  • 'length($0) >= 5 && length($0) <= 7': This is the awk program.
    • length($0): Returns the length of the entire line ($0 refers to the whole line).
    • >= 5: Checks if the length is greater than or equal to 5.
    • &&: Logical AND operator.
    • <= 7: Checks if the length is less than or equal to 7.
    • If the condition is true, awk prints the line by default.
  • sorted_wordlist.txt: The input file.
  • > filtered_wordlist.txt: Redirects the output to a new file.
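The length bounds do not have to be hard-coded. One possible variation, sketched below, passes them into awk with the -v option so the same program can be reused for other ranges. The shell variables min_len and max_len and the sample words are assumptions made for this example.

```shell
# Hypothetical bounds; change these to filter a different range.
min_len=5
max_len=7

# -v copies each shell value into an awk variable before the program runs.
filtered=$(printf '%s\n' apple kiwi blueberry orange plum |
  awk -v min="$min_len" -v max="$max_len" 'length($0) >= min && length($0) <= max')

printf '%s\n' "$filtered"
```

With these bounds only apple and orange pass: kiwi and plum are too short, and blueberry is too long.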

This filtering capability is very powerful for refining wordlists for specific purposes.

Summary

In this lab, you have successfully learned how to create and manage custom wordlists using various essential Linux command-line tools. You started by generating simple wordlists with echo and redirection, then combined multiple lists using cat. You mastered the crucial technique of removing duplicate entries by combining sort and uniq, and practiced sorting wordlists alphabetically. Finally, you used awk to filter wordlists based on specific length criteria. These skills are fundamental for anyone working with text data, especially in cybersecurity for tasks like password cracking, fuzzing, and data analysis. You now have a solid foundation for manipulating and refining wordlists to suit diverse requirements.
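The individual steps can also be chained into a single pipeline. The sketch below is one way to do that, using two hypothetical input files (first.txt and second.txt) in a scratch directory rather than the files created in the lab.

```shell
# Use a scratch directory so the lab files are left alone.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small hypothetical source lists with overlapping entries.
printf '%s\n' apple banana kiwi strawberry > first.txt
printf '%s\n' melon apple plum banana > second.txt

# Merge, sort and deduplicate, then keep words of 5 to 7 characters.
cat first.txt second.txt | sort -u | awk 'length($0) >= 5 && length($0) <= 7' > final_wordlist.txt

cat final_wordlist.txt
```

Here the final list contains apple, banana, and melon.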