The uniq (unique) command is an essential tool for text processing in Linux. It lets you detect, count, and filter duplicate lines in a text file, but you need to understand how it works to use it effectively.
Basic Duplicate Removal
The primary function of the uniq command is to collapse runs of adjacent duplicate lines into a single occurrence. Imagine you have a file named reading.txt with the following content:
book
book
paper
paper
article
article
magazine
To remove the repeated lines, you can run the uniq command:
$ uniq reading.txt
book
paper
article
magazine
As you can see, uniq writes a filtered version of the file to standard output, with each run of adjacent duplicate lines reduced to a single line. The original file itself is not modified.
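Like most Unix filters, uniq also reads standard input when no file is named, so it slots directly into pipelines. A minimal sketch, using printf to supply the sample lines:

```shell
# With no file argument, uniq filters whatever arrives on standard input.
printf 'book\nbook\npaper\n' | uniq
# book
# paper
```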
Advanced Filtering Options
The uniq command also provides several options for more detailed analysis.
To count the occurrences of each line, use the -c (count) flag:
$ uniq -c reading.txt
      2 book
      2 paper
      2 article
      1 magazine
To display only the lines that are not repeated (i.e., are unique), use the -u (unique) flag:
$ uniq -u reading.txt
magazine
Conversely, to display only the lines that are repeated, use the -d (duplicated) flag:
$ uniq -d reading.txt
book
paper
article
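These flags can also be combined. For example, -c together with -d prints a count for just the repeated lines. A small sketch, piping the sample data in with printf so it runs on its own:

```shell
# Count only the lines that occur more than once (-c combined with -d).
# magazine appears once, so it is omitted from the output entirely.
printf 'book\nbook\npaper\npaper\narticle\narticle\nmagazine\n' | uniq -cd
```

Each surviving line is prefixed with its count, exactly as with plain -c.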
The Importance of Sorting
A critical detail about the uniq command is that it only detects duplicate lines that are directly adjacent to each other. If the duplicates are scattered throughout the file, uniq will not identify them.
Consider this version of reading.txt where duplicates are not adjacent:
book
paper
book
paper
article
magazine
article
Running uniq on this file will produce a surprising result:
$ uniq reading.txt
book
paper
book
paper
article
magazine
article
No lines were removed because no two identical lines were next to each other. To solve this, you must first sort the file's contents. By piping the output of sort into uniq, you ensure that all identical lines become adjacent, allowing uniq to work correctly. This combination is a powerful and common pattern in shell scripting.
$ sort reading.txt | uniq
article
book
magazine
paper
This command first sorts the lines alphabetically, then uniq filters out the duplicates, giving you a clean list of unique entries.
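Building on the same pattern, adding -c and a numeric reverse sort turns the pipeline into a frequency table, with the most common lines first. A sketch using the unsorted reading.txt from above, recreated here so the snippet is self-contained:

```shell
# Recreate the unsorted sample file from the example above.
printf 'book\npaper\nbook\npaper\narticle\nmagazine\narticle\n' > reading.txt

# sort groups the duplicates, uniq -c counts each group, and
# sort -rn orders the counts from highest to lowest.
sort reading.txt | uniq -c | sort -rn
```

The three lines that occur twice come first, followed by magazine with a count of 1. And when all you need is the deduplicated list itself, sort -u reading.txt produces the same result as sort reading.txt | uniq in a single step.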