The sort and uniq commands work together to filter out duplicate lines from a text file. Here's how they function in combination:
- Sorting: The sort command arranges the lines of a file in a specified order (alphabetically by default). This matters because uniq only compares each line with the one directly before it, so it removes only adjacent duplicates. Sorting the file first ensures that all identical lines end up next to each other.
- Removing Duplicates: The uniq command then reads the sorted output and collapses each run of identical lines into a single line, so every distinct line appears exactly once.
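To make the adjacency point concrete, here is a minimal sketch that runs uniq with and without a preceding sort. The sample lines and the variable names (input, unsorted_result, sorted_result) are illustrative assumptions, not part of any particular script.

```shell
# Sample input in which no two identical lines are adjacent (illustrative data).
input=$(printf '%s\n' apple banana apple orange banana kiwi)

# uniq alone: every line differs from its predecessor, so nothing is removed.
unsorted_result=$(printf '%s\n' "$input" | uniq)

# sort first: identical lines become adjacent, and uniq collapses them.
sorted_result=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq only:\n%s\n\n' "$unsorted_result"
printf 'sort | uniq:\n%s\n' "$sorted_result"
```

Without the sort step all six lines come through unchanged; with it, only the four distinct lines remain.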
Example Usage
Suppose you have a file named data.txt with the following content:
apple
banana
apple
orange
banana
kiwi
You can use the following command to sort the file and remove duplicates:
sort data.txt | uniq
Output
The output will be:
apple
banana
kiwi
orange
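When only the deduplicated list is needed, sort's own -u option should give the same result for whole-line comparisons like this one, without a separate uniq process. The sketch below checks that on the sample data; the temporary directory and the variable names are assumptions made for the example.

```shell
# Recreate the sample file in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' apple banana apple orange banana kiwi > "$workdir/data.txt"

# Two ways to get the sorted, deduplicated list.
via_pipeline=$(sort "$workdir/data.txt" | uniq)
via_sort_u=$(sort -u "$workdir/data.txt")

if [ "$via_pipeline" = "$via_sort_u" ]; then
    echo "sort -u matches sort | uniq"
fi

rm -r "$workdir"
```

The pipeline form is still the one to reach for when uniq's own options, such as -c, are needed.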
Counting Occurrences
You can also count how many times each unique line appears by using the -c option with uniq:
sort data.txt | uniq -c
Output
The output shows each distinct line prefixed by the number of times it occurs (some implementations pad the count with leading spaces):
2 apple
2 banana
1 kiwi
1 orange
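The counts can be fed into further steps. As a rough sketch, the pipeline below ranks lines by frequency with a second, numeric reverse sort, and uses uniq -d to list only the lines that occur more than once. The temporary file and the variable names (ranked, repeated) are assumptions for illustration, and the order of lines with equal counts may differ between sort implementations.

```shell
# Recreate the sample file in a temporary directory (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' apple banana apple orange banana kiwi > "$workdir/data.txt"

# Count each distinct line, then order by count, highest first.
ranked=$(sort "$workdir/data.txt" | uniq -c | sort -rn)

# List only the lines that appear more than once.
repeated=$(sort "$workdir/data.txt" | uniq -d)

printf 'By frequency:\n%s\n\n' "$ranked"
printf 'Repeated lines:\n%s\n' "$repeated"

rm -r "$workdir"
```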
This combination is a common building block for text processing: it deduplicates a file and, with the -c option, summarizes how often each line occurs.
