How do `sort` and `uniq` work together?

Piping the output of sort into uniq removes duplicate lines from text input. Each command handles one half of the job:

Sorting: The sort command arranges the lines of a file in a specified order (by default the locale's collation order, which is alphabetical for plain lowercase words). This matters because uniq compares each line only with the line immediately before it, so it removes only adjacent duplicates. Sorting first guarantees that identical lines end up next to each other.
Removing Duplicates: The uniq command then reads the sorted output and collapses each run of identical lines into a single copy, so every distinct line appears exactly once.
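
The adjacency requirement is easy to check directly. The sketch below uses a made-up three-line input (a, b, a), chosen only for illustration, and runs uniq with and without sorting first:

```shell
# Hypothetical three-line input: the two "a" lines are not adjacent.
input=$(printf 'a\nb\na\n')

# uniq alone compares each line only with the one before it,
# so the second "a" is not recognized as a duplicate.
without_sort=$(printf '%s\n' "$input" | uniq)

# Sorting first places both "a" lines next to each other,
# so uniq can collapse them into a single line.
with_sort=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq only:\n%s\n\n' "$without_sort"
printf 'sort | uniq:\n%s\n' "$with_sort"
```

Without the sort step all three lines come back unchanged; with it, only a and b remain.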

Suppose you have a file named data.txt with the following content:

apple
banana
apple
orange
banana
kiwi

You can use the following command to sort the file and remove duplicates:

sort data.txt | uniq

The output will be:

apple
banana
kiwi
orange
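
For this plain deduplication case, sort's own -u option should give the same result in a single step. The following is a minimal sketch rather than a definitive equivalence: it recreates the sample data.txt in a temporary directory and sets LC_ALL=C (an assumption added here, not part of the original command) so that both commands compare lines byte by byte regardless of locale.

```shell
# Recreate the sample data.txt in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' apple banana apple orange banana kiwi > "$workdir/data.txt"

# LC_ALL=C forces plain byte-order comparison in both sort and uniq,
# so "equal lines" means the same thing to each command.
two_step=$(LC_ALL=C sort "$workdir/data.txt" | LC_ALL=C uniq)
one_step=$(LC_ALL=C sort -u "$workdir/data.txt")

if [ "$two_step" = "$one_step" ]; then
    echo "sort | uniq and sort -u agree"
fi

rm -r "$workdir"
```

The two-command form is still needed whenever you want one of uniq's own options, such as counting.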

You can also count how many times each distinct line appears by using the -c option with uniq:

sort data.txt | uniq -c

Each output line is prefixed with the number of times it occurred (the exact amount of leading padding varies between implementations):

  2 apple
  2 banana
  1 kiwi
  1 orange
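
A common follow-up is to rank lines by frequency by piping the counts through a second sort. This is a sketch built on the same sample data, recreated in a temporary directory; the variable names are illustrative only.

```shell
# Recreate the sample data.txt in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' apple banana apple orange banana kiwi > "$workdir/data.txt"

# First sort groups identical lines, uniq -c prefixes each with its count,
# and the second sort orders by that count:
#   -n compares the leading number numerically, -r puts the largest first.
ranked=$(sort "$workdir/data.txt" | uniq -c | sort -rn)

printf '%s\n' "$ranked"

rm -r "$workdir"
```

The lines with a count of 2 come before those with a count of 1. The relative order of lines that share the same count can differ between implementations, so no exact output is shown here.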

This pairing is a staple of shell pipelines for text processing: with two standard commands you can deduplicate a file or tally how often each line occurs.