How to Check If a Collection Contains Duplicates in Java


Introduction

In this lab, you will learn how to efficiently check for duplicate elements within a Java collection. We will explore the use of HashSet, a powerful tool from the Java Collections Framework, to identify duplicates.

Through hands-on steps, you will first learn how to leverage the unique element property of HashSet to detect duplicates. Then, you will discover an alternative method by comparing the sizes of the original collection and a HashSet created from it. Finally, we will examine how to handle null elements when checking for duplicates.



Use HashSet for Duplicate Check

In this step, we will explore how to use a HashSet in Java to efficiently check for duplicate elements within a collection. HashSet is part of the Java Collections Framework and is particularly useful here because it stores only unique elements and its add() and contains() operations run in constant time on average.

First, let's create a new Java file named DuplicateCheck.java in your ~/project directory. You can do this directly in the WebIDE File Explorer by right-clicking in the file list area and selecting "New File", then typing DuplicateCheck.java.

Now, open the DuplicateCheck.java file in the Code Editor and add the following code:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    public static void main(String[] args) {
        // Create a list with some duplicate elements
        List<String> names = new ArrayList<>();
        names.add("Alice");
        names.add("Bob");
        names.add("Alice"); // Duplicate
        names.add("Charlie");
        names.add("Bob"); // Duplicate

        System.out.println("Original List: " + names);

        // Use a HashSet to find duplicates
        Set<String> uniqueNames = new HashSet<>();
        Set<String> duplicates = new HashSet<>();

        for (String name : names) {
            if (!uniqueNames.add(name)) {
                // If add returns false, the element is already in the set
                duplicates.add(name);
            }
        }

        System.out.println("Duplicates found: " + duplicates);
    }
}

Let's break down the new parts of this code:

  • import java.util.ArrayList;, import java.util.HashSet;, import java.util.List;, import java.util.Set;: These lines import the necessary classes from the Java utility library to work with lists and sets.
  • List<String> names = new ArrayList<>();: This creates a List called names that can hold String objects. We use ArrayList as a specific implementation of the List interface.
  • names.add(...): This adds elements to our names list. Notice that "Alice" and "Bob" are added twice.
  • Set<String> uniqueNames = new HashSet<>();: This creates a Set called uniqueNames using the HashSet implementation. A Set guarantees that it will only contain unique elements.
  • Set<String> duplicates = new HashSet<>();: This creates another Set to store the duplicate elements we find.
  • for (String name : names): This is a for-each loop that iterates through each name in the names list.
  • if (!uniqueNames.add(name)): The add() method of a HashSet returns true if the element was successfully added (meaning it was not already in the set), and false if the element was already present. The ! negates the result, so the code inside the if block runs only when add() returns false, indicating a duplicate.
  • duplicates.add(name);: If a duplicate is found, we add it to our duplicates set.

Save the DuplicateCheck.java file (Ctrl+S or Cmd+S).

Now, open the Terminal at the bottom of the WebIDE. Make sure you are in the ~/project directory. You can confirm this by typing pwd and pressing Enter. The output should be /home/labex/project.

Compile the Java program using the javac command:

javac DuplicateCheck.java

If there are no errors, you should see no output. This means the compilation was successful and a DuplicateCheck.class file has been created in the ~/project directory. You can verify this by running the ls command.

Finally, run the compiled Java program using the java command:

java DuplicateCheck

You should see output similar to this:

Original List: [Alice, Bob, Alice, Charlie, Bob]
Duplicates found: [Alice, Bob]

The order of elements in the Duplicates found output might vary because HashSet does not maintain insertion order.
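If you want the duplicates reported in a predictable order, one option is to collect them in a LinkedHashSet, which keeps elements in the order they were first inserted. The following is a minimal sketch, not part of the lab file; the class name OrderedDuplicates and the helper findDuplicates are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this sketch
class OrderedDuplicates {

    // Returns each repeated element once, in the order its first repeat is encountered
    static <T> Set<T> findDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicates = new LinkedHashSet<>(); // keeps insertion order
        for (T item : items) {
            if (!seen.add(item)) {
                // add() returned false, so this element was seen before
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Alice", "Charlie", "Bob");
        System.out.println("Duplicates found: " + findDuplicates(names));
        // prints: Duplicates found: [Alice, Bob]
    }
}
```

The seen set can remain a plain HashSet because its iteration order is never printed; only the set that is displayed needs to preserve order.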

You have successfully used a HashSet to identify duplicate elements in a list!

Compare Collection and Set Sizes

In the previous step, we used a HashSet to find duplicate elements. A key characteristic of a Set is that it only stores unique elements. This means that if you add duplicate elements to a Set, only one instance of each element will be kept. This property is very useful for tasks like removing duplicates from a list.

In this step, we will modify our DuplicateCheck.java program to demonstrate this property by comparing the size of the original list (which can contain duplicates) with the size of a HashSet created from that list (which will only contain unique elements).

Open the DuplicateCheck.java file in the WebIDE Code Editor.

Update the file so that it looks like this (only the main method changes):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    public static void main(String[] args) {
        // Create a list with some duplicate elements
        List<String> names = new ArrayList<>();
        names.add("Alice");
        names.add("Bob");
        names.add("Alice"); // Duplicate
        names.add("Charlie");
        names.add("Bob"); // Duplicate
        names.add("David");

        System.out.println("Original List: " + names);
        System.out.println("Size of Original List: " + names.size());

        // Create a HashSet from the list
        Set<String> uniqueNamesSet = new HashSet<>(names);

        System.out.println("Set created from List: " + uniqueNamesSet);
        System.out.println("Size of Set: " + uniqueNamesSet.size());

        // The difference in size tells us how many duplicates were removed
        int duplicatesCount = names.size() - uniqueNamesSet.size();
        System.out.println("Number of duplicates (excluding first occurrence): " + duplicatesCount);
    }
}

Here's what we've added or changed:

  • We added another name, "David", to the names list to have a slightly larger list.
  • System.out.println("Size of Original List: " + names.size());: We print the size of the original list using the size() method.
  • Set<String> uniqueNamesSet = new HashSet<>(names);: This is a convenient way to create a HashSet directly from another Collection (like our ArrayList). When you do this, the HashSet automatically adds all elements from the list, and because it's a Set, it will discard any duplicates.
  • System.out.println("Size of Set: " + uniqueNamesSet.size());: We print the size of the HashSet. This size represents the number of unique elements.
  • int duplicatesCount = names.size() - uniqueNamesSet.size();: We calculate the difference between the list size and the set size. This difference tells us how many elements were duplicates (beyond their first appearance).

Save the modified DuplicateCheck.java file.

Now, compile the program again in the Terminal:

javac DuplicateCheck.java

If the compilation is successful, run the program:

java DuplicateCheck

You should see output similar to this:

Original List: [Alice, Bob, Alice, Charlie, Bob, David]
Size of Original List: 6
Set created from List: [Alice, Bob, Charlie, David]
Size of Set: 4
Number of duplicates (excluding first occurrence): 2

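Notice that the size of the original list is 6, but the size of the HashSet created from it is 4. The difference (6 - 4 = 2) correctly indicates that two elements were repeats: "Alice" and "Bob" each appeared once more after their first occurrence. As before, the order in which the set's elements are printed may differ, because HashSet does not guarantee any iteration order.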

This demonstrates how easily you can use a HashSet to find the number of unique elements or the number of duplicates in a collection.
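If all you need is a yes-or-no answer to "does this collection contain duplicates?", both techniques seen so far can be wrapped in small helper methods. The sketch below is one possible way to do it; the class and method names (HasDuplicatesSketch, hasDuplicatesBySize, hasDuplicatesEarlyExit) are hypothetical and not part of the lab file.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, used only for this sketch
class HasDuplicatesSketch {

    // Size comparison: a set built from the collection is smaller if anything repeats
    static <T> boolean hasDuplicatesBySize(Collection<T> items) {
        return new HashSet<T>(items).size() < items.size();
    }

    // Early exit: stop at the first element that add() rejects
    static <T> boolean hasDuplicatesEarlyExit(Collection<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (!seen.add(item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicatesBySize(Arrays.asList("Alice", "Bob", "Alice")));    // true
        System.out.println(hasDuplicatesEarlyExit(Arrays.asList("Alice", "Bob", "Alice"))); // true
        System.out.println(hasDuplicatesBySize(Arrays.asList("Alice", "Bob")));             // false
    }
}
```

The size comparison is shorter to write but always copies every element into the set. The loop version can return as soon as it meets the first repeated element, which may matter for large collections.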

Test with Null Elements

In the previous steps, we've seen how HashSet handles duplicate non-null elements. Now, let's explore how HashSet behaves when we try to add null elements. Understanding how collections handle null is important because it can sometimes lead to unexpected behavior or errors if not handled carefully.

A HashSet allows one null element. If you try to add null multiple times, the set still contains a single null, and add() returns false for every attempt after the first.

Let's modify our DuplicateCheck.java program one more time to test this.

Open the DuplicateCheck.java file in the WebIDE Code Editor.

Modify the main method to include null values in the list:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    public static void main(String[] args) {
        // Create a list with some duplicate and null elements
        List<String> names = new ArrayList<>();
        names.add("Alice");
        names.add("Bob");
        names.add(null); // Add null
        names.add("Alice"); // Duplicate
        names.add("Charlie");
        names.add(null); // Add null again
        names.add("Bob"); // Duplicate
        names.add("David");
        names.add(null); // Add null a third time

        System.out.println("Original List: " + names);
        System.out.println("Size of Original List: " + names.size());

        // Create a HashSet from the list
        Set<String> uniqueNamesSet = new HashSet<>(names);

        System.out.println("Set created from List: " + uniqueNamesSet);
        System.out.println("Size of Set: " + uniqueNamesSet.size());

        // The difference in size tells us how many duplicates were removed
        // Note: repeated nulls are counted as duplicates too (here it would be 9 - 5 = 4).
        // The calculation is commented out to keep the focus on how the set handles null.
        // int duplicatesCount = names.size() - uniqueNamesSet.size();
        // System.out.println("Number of duplicates (excluding first occurrence): " + duplicatesCount);
    }
}

Here, we've added null to the names list multiple times. We've also commented out the duplicate count calculation so that we can focus on the set's behavior with null. The calculation itself still works: with this list it would report 4, because the two extra null values are counted along with the repeated "Alice" and "Bob".

Save the modified DuplicateCheck.java file.

Compile the program in the Terminal:

javac DuplicateCheck.java

If the compilation is successful, run the program:

java DuplicateCheck

You should see output similar to this:

Original List: [Alice, Bob, null, Alice, Charlie, null, Bob, David, null]
Size of Original List: 9
Set created from List: [null, Alice, Bob, Charlie, David]
Size of Set: 5

Observe the output:

  • The Original List shows all the elements, including the multiple null values. Its size is 9.
  • The Set created from List contains only unique elements. Notice that null appears only once in the set, even though it was added multiple times to the list. The size of the set is 5 (Alice, Bob, Charlie, David, and null).

This confirms that HashSet allows one null element and treats subsequent null additions as duplicates, just like any other element.

You have now successfully tested how HashSet handles null elements. This concludes our exploration of using HashSet for duplicate checking and understanding its behavior with unique and null elements.
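To see the null behavior in isolation, here is a short sketch that prints the return value of add(null) and applies the size comparison from the previous step to the same list. The class name NullDuplicateSketch and the helper countExtraOccurrences are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this sketch
class NullDuplicateSketch {

    // Number of elements beyond the first occurrence of each value (null included)
    static int countExtraOccurrences(List<String> items) {
        return items.size() - new HashSet<String>(items).size();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add(null)); // true: the first null is stored
        System.out.println(set.add(null)); // false: null is already present

        List<String> names = Arrays.asList(
                "Alice", "Bob", null, "Alice", "Charlie", null, "Bob", "David", null);
        System.out.println(countExtraOccurrences(names)); // 4
    }
}
```

Note that this applies to HashSet specifically. Other Set implementations may reject null elements, so check the documentation of the implementation you use.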

Summary

In this lab, we learned how to efficiently check for duplicate elements within a Java collection using a HashSet. We created a List with duplicate elements and iterated through it, attempting to add each element to a HashSet. By checking the return value of the add() method, which returns false if the element is already present, we identified the duplicate elements and collected them into a separate HashSet. We then built a HashSet directly from the list and compared the two sizes to count how many elements were repeats. Finally, we saw that a HashSet stores at most one null and treats further null values as duplicates, just like any other element.