Linux Common Line Comparison

Introduction

In the Linux environment, comparing files is a common task for system administrators and developers. The comm command compares two sorted text files line by line and reports which lines are unique to each file and which lines appear in both.

This lab will guide you through using the comm command to analyze text files. You will learn how to create test files, compare their contents, and extract specific information from the comparison results. By the end of this lab, you will have a solid understanding of how to use this versatile command for file comparison tasks in Linux.


Prepare Your Text Files

Before we can use the comm command, we need to create some sample text files to work with. In this step, we will create two text files containing lists of common Linux commands.

First, let's create a working directory to organize our files:

mkdir -p ~/project/comm-lab
cd ~/project/comm-lab

Now, let's create our first text file named commands1.txt with a list of Linux commands:

echo -e "ls\ncd\npwd\nmkdir\ntouch\ncomm\nsed\nawk" | sort > commands1.txt

This command does the following:

  • echo -e enables interpretation of backslash escapes, so each \n in the string starts a new line
  • The list of commands is piped (|) to the sort command to alphabetically sort the items
  • The sorted output is then redirected (>) to a file named commands1.txt
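The echo -e form relies on bash's built-in echo. In some other shells, such as dash (often /bin/sh on Debian and Ubuntu), echo does not accept -e and prints it literally, so a stray "-e ls" line would end up in the file. A minimal, more portable sketch, assuming only a POSIX shell, builds the same two files with printf '%s\n', which prints each argument on its own line. It writes into a scratch directory from mktemp -d (an assumption of this example) so the files in ~/project/comm-lab are left alone:

```shell
# Scratch directory so the lab's own files are not touched.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# printf '%s\n' prints each argument on its own line in any POSIX shell,
# so no escape-sequence handling is needed.
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
printf '%s\n' ls cd pwd comm grep find sed       | sort > "$dir/commands2.txt"

cat "$dir/commands1.txt"
```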

Let's create a second text file named commands2.txt with a slightly different list of commands:

echo -e "ls\ncd\npwd\ncomm\ngrep\nfind\nsed" | sort > commands2.txt

To verify that our files were created correctly, we can use the cat command to view their contents:

cat commands1.txt

You should see the following output:

awk
cd
comm
ls
mkdir
pwd
sed
touch

Now let's check the content of the second file:

cat commands2.txt

You should see:

cd
comm
find
grep
ls
pwd
sed

Notice that five commands (cd, comm, ls, pwd, and sed) appear in both files, while the others are unique to one file. This overlap will allow us to demonstrate various features of the comm command in the next steps.
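Since comm assumes sorted input, it can be worth confirming that a file really is sorted before comparing it. The standard sort -c option checks a single file without rewriting it: it exits with status 0 if the file is already in order and with a non-zero status (plus a message on standard error) otherwise. The minimal sketch below uses a scratch directory and a deliberately unsorted file named unsorted.txt, both assumptions of this example:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
# Deliberately left unsorted ("ls" comes before "cd").
printf '%s\n' ls cd pwd comm grep find sed > "$dir/unsorted.txt"

# sort -c only checks the order; it does not modify the file.
for f in "$dir/commands1.txt" "$dir/unsorted.txt"; do
  if sort -c "$f" 2>/dev/null; then
    echo "$(basename "$f"): sorted"
  else
    echo "$(basename "$f"): NOT sorted"
  fi
done
```

Because sort -c compares lines using the current locale, run it under the same locale settings you will use for comm.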

Using the Basic comm Command

Now that we have our sorted text files ready, we can explore the basic usage of the comm command. The comm command compares two sorted files line by line and outputs three columns:

  1. Lines unique to the first file
  2. Lines unique to the second file
  3. Lines common to both files

Let's run the basic comm command to compare our two files:

cd ~/project/comm-lab
comm commands1.txt commands2.txt

You should see output similar to this:

awk
		cd
		comm
	find
	grep
		ls
mkdir
		pwd
		sed
touch

The output might look confusing at first, but it follows a specific format:

  • Column 1 (no tabs at the beginning of the line): Lines only in commands1.txt (awk, mkdir, touch)
  • Column 2 (one tab at the beginning): Lines only in commands2.txt (find, grep)
  • Column 3 (two tabs at the beginning): Lines common to both files (cd, comm, ls, pwd, sed)
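Because the leading tabs are invisible in a terminal, it can help to make them visible. This sketch recreates the two files in a scratch directory (an assumption of the example) and pipes the comm output through tr to replace every tab with a | character:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
printf '%s\n' ls cd pwd comm grep find sed       | sort > "$dir/commands2.txt"

# Each tab becomes '|': no '|' = column 1, one '|' = column 2,
# two '|' = column 3.
comm "$dir/commands1.txt" "$dir/commands2.txt" | tr '\t' '|'
```

With the lab's files, awk, mkdir, and touch have no |, find and grep have one, and cd, comm, ls, pwd, and sed have two.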

This default output allows you to see all differences and similarities at once, but it can be hard to read because of the tab formatting. In the next step, we'll learn how to make this output more useful using the comm command options.
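To make the default output easier to read without hiding any column, the tab structure can be turned into explicit labels. In this minimal sketch, label_comm is a hypothetical helper name, and it assumes the input lines contain no tab characters and no empty lines (either would confuse the field count). With tab as awk's field separator, a column-1 line has one field, a column-2 line has two (an empty field before the tab), and a column-3 line has three, while $NF is always the line's text:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
printf '%s\n' ls cd pwd comm grep find sed       | sort > "$dir/commands2.txt"

# Hypothetical helper: run comm and tag each line with its column.
label_comm() {
  comm "$1" "$2" |
    awk -F'\t' '{
      if (NF == 1)      label = "only in file 1"
      else if (NF == 2) label = "only in file 2"
      else              label = "in both"
      printf "%-15s %s\n", label, $NF
    }'
}

label_comm "$dir/commands1.txt" "$dir/commands2.txt"
```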

Suppressing Columns with comm Options

The default output of the comm command can be difficult to read because of its column format. Fortunately, comm provides options to suppress specific columns, which makes it easier to extract just the information you need.

The options are:

  • -1 : Suppress column 1 (lines unique to first file)
  • -2 : Suppress column 2 (lines unique to second file)
  • -3 : Suppress column 3 (lines common to both files)

These options can be combined to display only the data you're interested in.
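One combination not covered below is -3 on its own. It hides only the common column, so the output lists every line that differs between the files: column 1 entries stay unindented and column 2 entries keep one leading tab. Piping through tr -d '\t' flattens this into a single list, assuming the lines themselves contain no tabs. A minimal sketch in a scratch directory:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
printf '%s\n' ls cd pwd comm grep find sed       | sort > "$dir/commands2.txt"

# Suppress only column 3: lines present in exactly one of the files.
comm -3 "$dir/commands1.txt" "$dir/commands2.txt"

# Same lines as one flat list (removes the column-2 tab).
comm -3 "$dir/commands1.txt" "$dir/commands2.txt" | tr -d '\t'
```

With the lab's files, the flattened list is awk, find, grep, mkdir, and touch.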

Finding Lines Unique to First File

To display only the lines that are unique to the first file (commands1.txt), we use options -2 and -3 to suppress columns 2 and 3:

cd ~/project/comm-lab
comm -23 commands1.txt commands2.txt

Output:

awk
mkdir
touch

These are the commands that appear only in commands1.txt.
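As a cross-check, grep can compute the same difference without needing sorted input: -F treats patterns as fixed strings, -x requires whole-line matches, -v inverts the match, and -f reads the patterns from a file. This is an alternative technique rather than part of comm, and unlike comm it does not pair duplicate lines one-for-one. A minimal sketch using a scratch directory:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
printf '%s\n' ls cd pwd comm grep find sed       | sort > "$dir/commands2.txt"

echo "comm -23:"
comm -23 "$dir/commands1.txt" "$dir/commands2.txt"

# Lines of commands1.txt that match no whole line of commands2.txt.
echo "grep -Fxv:"
grep -Fxv -f "$dir/commands2.txt" "$dir/commands1.txt"
```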

Finding Lines Unique to Second File

Similarly, to display only the lines that are unique to the second file (commands2.txt), we use options -1 and -3:

comm -13 commands1.txt commands2.txt

Output:

find
grep

These are the commands that appear only in commands2.txt.

Finding Common Lines

To display only the lines that are common to both files, we use options -1 and -2:

comm -12 commands1.txt commands2.txt

Output:

cd
comm
ls
pwd
sed

These are the commands that appear in both files.
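One detail worth knowing about common lines is how comm treats duplicates: it pairs them one-for-one. If a line appears twice in the first file and once in the second, one copy is reported as common and the extra copy as unique to the first file. This minimal sketch shows the behavior with two small hypothetical files, a.txt and b.txt, in a scratch directory, and then removes the duplicate with sort -u:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' cd cd ls | sort > "$dir/a.txt"   # cd appears twice
printf '%s\n' cd ls    | sort > "$dir/b.txt"   # cd appears once

echo "common:";      comm -12 "$dir/a.txt" "$dir/b.txt"   # cd, ls
echo "only in a:";   comm -23 "$dir/a.txt" "$dir/b.txt"   # the extra cd

# Deduplicate first if you only care whether a line is present at all.
sort -u "$dir/a.txt" > "$dir/a_unique.txt"
echo "only in a (after sort -u):"; comm -23 "$dir/a_unique.txt" "$dir/b.txt"
```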

Saving Results to Files

It's often useful to save these results to separate files for future reference or processing. Let's do that:

comm -23 commands1.txt commands2.txt > unique_to_file1.txt
comm -13 commands1.txt commands2.txt > unique_to_file2.txt
comm -12 commands1.txt commands2.txt > common_lines.txt

Let's verify the contents of these new files:

echo "Contents of unique_to_file1.txt:"
cat unique_to_file1.txt
echo "Contents of unique_to_file2.txt:"
cat unique_to_file2.txt
echo "Contents of common_lines.txt:"
cat common_lines.txt

The output will show the lines unique to each file and the common lines, just as we saw in our previous commands.

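These techniques are useful whenever line order does not matter, for example when comparing lists of configuration entries, installed packages, or records shared between datasets. Because comm needs sorted input, it is less suited to comparing versions of a file where the order of lines carries meaning.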
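If you run this kind of comparison often, the three commands can be wrapped in a shell function. split_lists below is a hypothetical helper, not a standard command: it sorts copies of both inputs (so the originals may be unsorted) and writes the three comm columns to files in an output directory, reusing the file names from above. A minimal sketch, assuming a POSIX shell:

```shell
# Hypothetical helper. Usage: split_lists FILE1 FILE2 OUTDIR
split_lists() {
  out=$3
  mkdir -p "$out"
  # Sort copies of the inputs; the originals are left unchanged.
  sort "$1" > "$out/sorted1.tmp"
  sort "$2" > "$out/sorted2.tmp"
  comm -23 "$out/sorted1.tmp" "$out/sorted2.tmp" > "$out/unique_to_file1.txt"
  comm -13 "$out/sorted1.tmp" "$out/sorted2.tmp" > "$out/unique_to_file2.txt"
  comm -12 "$out/sorted1.tmp" "$out/sorted2.tmp" > "$out/common_lines.txt"
  rm -f "$out/sorted1.tmp" "$out/sorted2.tmp"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Unsorted inputs are fine because split_lists sorts them itself.
printf '%s\n' ls cd pwd mkdir touch comm sed awk > "$dir/list1.txt"
printf '%s\n' ls cd pwd comm grep find sed       > "$dir/list2.txt"

split_lists "$dir/list1.txt" "$dir/list2.txt" "$dir/results"
wc -l "$dir/results"/*.txt
```

Sorting inside the function means callers cannot forget comm's sorted-input requirement. Using sort -u instead of sort would also drop duplicate lines, which changes how duplicates are counted.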

Practical Examples of Using comm

Now that you understand the basic usage of the comm command, let's explore some practical examples that demonstrate its utility in real-world scenarios.

Example 1: Finding New Entries

Imagine you have two lists of users, one from last week and one from today, and you want to identify which users are new (added since last week).

Let's create these files:

cd ~/project/comm-lab
echo -e "user1\nuser2\nuser3\nuser4\nuser5" | sort > users_last_week.txt
echo -e "user1\nuser3\nuser5\nuser6\nuser7\nuser8" | sort > users_today.txt

To find the new users (in users_today.txt but not in users_last_week.txt):

comm -13 users_last_week.txt users_today.txt

Output:

user6
user7
user8

Example 2: Finding Removed Entries

Using the same files, let's find which users have been removed since last week:

comm -23 users_last_week.txt users_today.txt

Output:

user2
user4
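The two previous results can be combined into a single change report. This minimal sketch, using a scratch directory and a hypothetical output file named user_changes.txt, prefixes new users with + and removed users with -, loosely similar to diff output:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' user1 user2 user3 user4 user5       | sort > "$dir/users_last_week.txt"
printf '%s\n' user1 user3 user5 user6 user7 user8 | sort > "$dir/users_today.txt"

{
  # Added: only in today's list.
  comm -13 "$dir/users_last_week.txt" "$dir/users_today.txt" | sed 's/^/+ /'
  # Removed: only in last week's list.
  comm -23 "$dir/users_last_week.txt" "$dir/users_today.txt" | sed 's/^/- /'
} > "$dir/user_changes.txt"

cat "$dir/user_changes.txt"
```

With the example data, the report lists + user6, + user7, and + user8, followed by - user2 and - user4.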

Example 3: Combining comm with Other Commands

The comm command can be combined with other commands for more complex operations. For example, to count how many commands our two original files have in common:

comm -12 commands1.txt commands2.txt | wc -l

This pipes the common lines to the wc -l command, which counts the number of lines.

Output:

5

This indicates there are 5 commands common to both files.
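Running comm with different options once per column works, but all three counts can also be taken from a single comm run. In this minimal sketch, awk uses tab as its field separator, so the number of fields on each line (1, 2, or 3) identifies the column; it assumes the lines contain no tabs or empty lines. The files are recreated in a scratch directory:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > "$dir/commands1.txt"
printf '%s\n' ls cd pwd comm grep find sed       | sort > "$dir/commands2.txt"

# n[1], n[2], n[3] count lines in columns 1, 2 and 3 respectively.
summary=$(comm "$dir/commands1.txt" "$dir/commands2.txt" |
  awk -F'\t' '{ n[NF]++ }
    END { printf "only in file 1: %d\nonly in file 2: %d\nin both: %d\n", n[1], n[2], n[3] }')

printf '%s\n' "$summary"
```

With the lab's files, it reports 3 lines only in the first file, 2 only in the second, and 5 in both.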

Example 4: Using comm with Unsorted Files

The comm command reads both files in a single pass and assumes they are already sorted, so unsorted input can produce incorrect results. Let's demonstrate this:

echo -e "cat\nls\npwd\ncd" > unsorted1.txt
echo -e "ls\ncat\ngrep\npwd" > unsorted2.txt

If we try to use comm directly:

comm unsorted1.txt unsorted2.txt

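The output is misleading because the files aren't sorted. For example, cat appears in both files, yet comm reports it once as unique to the first file and once as unique to the second. GNU comm typically also prints a warning on standard error that a file is not in sorted order. The correct approach is to sort the files first: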

comm <(sort unsorted1.txt) <(sort unsorted2.txt)

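This uses bash process substitution (<(...)) to sort each file on the fly before comparing them, without creating temporary files. Now cat, ls, and pwd appear in the common column, cd is listed as unique to unsorted1.txt, and grep as unique to unsorted2.txt.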
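Sorting is only half of the requirement: comm compares lines using the current locale's collation order (LC_COLLATE), so both files should be sorted under the same locale that comm runs in. A simple way to guarantee that is to run sort and comm with LC_ALL=C, which compares raw bytes. This minimal sketch needs bash (or another shell with process substitution) and recreates the unsorted files in a scratch directory:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' cat ls pwd cd   > "$dir/unsorted1.txt"
printf '%s\n' ls cat grep pwd > "$dir/unsorted2.txt"

# Use one collation (byte order) for both sort and comm.
export LC_ALL=C
comm <(sort "$dir/unsorted1.txt") <(sort "$dir/unsorted2.txt")
```

The C locale is chosen here for predictability; any locale works as long as sort and comm use the same one.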

These examples demonstrate the versatility of the comm command for comparing text files in various scenarios such as tracking changes, finding differences, and filtering data.

Summary

In this lab, you learned how to use the comm command in Linux to compare text files and identify differences between them. Here's a summary of what you accomplished:

  1. Created sorted text files for comparison using basic Linux commands
  2. Used the basic comm command to compare two files and understand its three-column output format
  3. Applied column suppression options (-1, -2, -3) to extract specific information:
    • Lines unique to the first file
    • Lines unique to the second file
    • Lines common to both files
  4. Saved comparison results to separate files for future reference
  5. Explored practical examples of using comm in real-world scenarios:
    • Finding new entries in updated lists
    • Identifying removed entries
    • Combining comm with other commands for more complex operations
    • Handling unsorted files appropriately

The comm command is a compact and efficient tool for comparing sorted text files in Linux. It lets system administrators, developers, and data analysts quickly identify the differences and overlap between lists, which is useful for tasks such as auditing user accounts, checking configuration entries, and cleaning up datasets.

Understanding how to effectively use comm and its options will enhance your productivity when working with text files in the Linux command line environment.