Linux uniq Command with Practical Examples

Introduction

In this lab, you will learn how to use the uniq command in Linux to filter out repeated lines and count how often each line occurs. The uniq command compares adjacent lines only, so it is usually paired with sort when cleaning up or analyzing text data. You will start with the purpose and syntax of the command, and then apply it to practical examples: removing duplicate lines and counting the occurrences of each line.

The lab covers the following steps:

Understand the Purpose and Syntax of the uniq Command
Remove Duplicate Lines from a File
Count the Occurrences of Unique Lines

Understand the Purpose and Syntax of the uniq Command

In this step, you will learn about the purpose and syntax of the uniq command in Linux. The uniq command filters repeated lines out of a file or input stream. It compares each line only with the line immediately before it, so only adjacent duplicates are collapsed.

The basic syntax of the uniq command is:

uniq [OPTION]... [INPUT_FILE [OUTPUT_FILE]]
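The three ways of supplying input and output implied by this syntax can be sketched as follows. This is a minimal illustration, assuming a POSIX-style shell; the names workdir, input.txt, and output.txt are made up for the example and do not come from the lab.

```shell
# Use a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)

printf 'a\na\nb\n' > "$workdir/input.txt"

# Form 1: read from a file, write to standard output.
from_file=$(uniq "$workdir/input.txt")

# Form 2: no file argument, so uniq reads standard input (here, a pipe).
from_stdin=$(printf 'a\na\nb\n' | uniq)

# Form 3: read from the first file and write the result to the second.
uniq "$workdir/input.txt" "$workdir/output.txt"

echo "$from_file"
cat "$workdir/output.txt"
```

All three forms should produce the same two lines, a and b, because the two a lines are adjacent.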

Here's a breakdown of commonly used options:

-c: Prefix each line with the number of times it occurred
-d: Only print duplicate lines, one for each group
-u: Only print lines that are not repeated
-i: Ignore case when comparing lines
-f N: Skip the first N fields when comparing lines
-s N: Skip the first N characters when comparing lines
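The options above, other than -c, can be tried on small inputs. The following is a sketch under assumptions: it presumes GNU coreutils uniq (the -i option is not part of POSIX), and the fruit and log data are invented for illustration.

```shell
# Hypothetical sample data; identical lines are already next to each other.
fruits=$(printf 'apple\napple\nApple\nbanana\ncherry\ncherry')

repeated=$(printf '%s\n' "$fruits" | uniq -d)  # one copy of each repeated line
singles=$(printf '%s\n' "$fruits" | uniq -u)   # lines with no identical neighbor
nocase=$(printf '%s\n' "$fruits" | uniq -i)    # "Apple" is treated like "apple"

# Hypothetical log lines: a date field followed by an event name.
log=$(printf '2024-01-01 login\n2024-01-02 login\n2024-01-03 logout')

by_field=$(printf '%s\n' "$log" | uniq -f 1)   # skip the date field when comparing
by_chars=$(printf '%s\n' "$log" | uniq -s 11)  # skip the first 11 characters instead

echo "-d:";   echo "$repeated"
echo "-u:";   echo "$singles"
echo "-i:";   echo "$nocase"
echo "-f 1:"; echo "$by_field"
echo "-s 11:"; echo "$by_chars"
```

With -f 1 and -s 11 the two login lines compare as equal even though their dates differ, and uniq keeps the first line of the group.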

Let's start by creating a sample file with some duplicate lines:

echo -e "apple\norange\napple\nbanana\norange" > sample.txt

The command prints nothing because its output is redirected into the file. sample.txt now contains:

apple
orange
apple
banana
orange

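Now, let's run the uniq command on the file:

uniq sample.txt

Example output:

apple
orange
apple
banana
orange

In this example, the output is identical to the input. The uniq command only collapses duplicates that are adjacent, and no two neighboring lines in sample.txt are the same, so the repeated "apple" and "orange" lines are all kept.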

Remove Duplicate Lines from a File

In this step, you will learn how to use the uniq command to remove duplicate lines from a file.

First, let's create a sample file with some duplicate lines:

echo -e "apple\norange\napple\nbanana\norange\napple" > sample.txt

The command prints nothing because its output is redirected into the file. sample.txt now contains:

apple
orange
apple
banana
orange
apple

Let's try the uniq command on its own first:

uniq sample.txt

Example output:

apple
orange
apple
banana
orange
apple

Nothing was removed. The uniq command compares each line only with the one immediately before it, so it removes consecutive duplicates only. In sample.txt no two identical lines are next to each other, so every line is kept.
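The difference between scattered and adjacent duplicates can be checked directly. The following is a small sketch; the variable names are illustrative, and the data is the same set of fruit names arranged in two different orders.

```shell
# Same six lines, two different orders.
scattered=$(printf 'apple\norange\napple\nbanana\norange\napple' | uniq)
adjacent=$(printf 'apple\napple\napple\norange\norange\nbanana' | uniq)

echo "Duplicates scattered:"
echo "$scattered"   # all six lines survive

echo "Duplicates adjacent:"
echo "$adjacent"    # each run collapses to a single line
```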

To remove all duplicate lines, regardless of their position, we can use the sort command in combination with uniq:

sort sample.txt | uniq

Example output:

apple
banana
orange

The sort command arranges the lines in alphabetical order, which ensures that the duplicate lines are adjacent. Then, the uniq command can remove the duplicates.
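Two common alternatives are worth knowing, although the lab itself does not cover them. sort -u combines sorting and deduplication in one step, and a short awk idiom removes duplicates while keeping the original line order. The sketch below assumes standard sort and awk are available; the seen array name is arbitrary.

```shell
data=$(printf 'apple\norange\napple\nbanana\norange\napple')

# sort -u: sort and drop duplicates in a single command.
sorted_unique=$(printf '%s\n' "$data" | sort -u)

# awk keeps a line only the first time it is seen, so input order is preserved.
first_seen=$(printf '%s\n' "$data" | awk '!seen[$0]++')

echo "sort -u:"
echo "$sorted_unique"
echo "awk, original order:"
echo "$first_seen"
```

The sort-based forms reorder the lines, which is fine for word lists but not for data where position matters. The awk form is an option in that case, at the cost of holding every distinct line in memory.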

Count the Occurrences of Unique Lines

In this step, you will learn how to use the uniq command to count the occurrences of unique lines in a file.

Let's start by creating a sample file with some duplicate lines:

echo -e "apple\norange\napple\nbanana\norange\napple" > sample.txt

The command prints nothing because its output is redirected into the file. sample.txt now contains:

apple
orange
apple
banana
orange
apple

To count the occurrences of each line, we can use the -c option with the uniq command. Because uniq only counts runs of adjacent identical lines, the file has to be sorted first:

sort sample.txt | uniq -c

Example output:

   3 apple
   1 banana
   2 orange

In this output, the number before each line represents the count of occurrences for that unique line.
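Because uniq -c counts runs of adjacent identical lines, the order of the input changes the result. The sketch below compares unsorted and sorted input. The trailing awk step is an addition made only to normalize the leading padding, which differs between uniq implementations.

```shell
data=$(printf 'apple\norange\napple\nbanana\norange\napple')

# Unsorted input: every run has length 1, so every count is 1.
run_counts=$(printf '%s\n' "$data" | uniq -c | awk '{print $1, $2}')

# Sorted input: identical lines are adjacent, so the counts are totals.
total_counts=$(printf '%s\n' "$data" | sort | uniq -c | awk '{print $1, $2}')

echo "Without sort:"
echo "$run_counts"
echo "With sort:"
echo "$total_counts"
```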

To order the result by count, sort the file, count with uniq -c, and then pipe the counts to sort -n:

sort sample.txt | uniq -c | sort -n

Example output:

   1 banana
   2 orange
   3 apple

The -n option makes sort compare the counts numerically, so the lines with the lowest count come first.
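A frequent variation is to list the most common lines first. This can be sketched by adding -r to reverse the numeric sort and head to keep only the top entry. The awk step is again an addition that only normalizes padding, and the variable names are illustrative.

```shell
data=$(printf 'apple\norange\napple\nbanana\norange\napple')

# Count each line, then sort by count in descending order.
ranked=$(printf '%s\n' "$data" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

# Keep only the most frequent line.
top=$(printf '%s\n' "$ranked" | head -n 1)

echo "$ranked"
echo "Most frequent: $top"
```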

Summary

In this lab, you learned about the purpose and syntax of the uniq command in Linux, which filters repeated lines out of a file or input stream. You reviewed its main options, such as counting occurrences, printing only duplicate lines, and ignoring case when comparing lines. You then ran uniq on a sample file and saw that it only removes consecutive duplicates. To remove all duplicate lines, you combined sort with uniq so that identical lines are adjacent before they are filtered. Finally, you counted the occurrences of each line with the -c option and ordered the counts with sort -n.
