Introduction
In the Linux environment, comparing files is a common task for system administrators and developers. The comm command is a powerful tool that allows users to compare two sorted text files line by line and identify the unique and common lines between them.
This lab will guide you through using the comm command to analyze text files. You will learn how to create test files, compare their contents, and extract specific information from the comparison results. By the end of this lab, you will have a solid understanding of how to use this versatile command for file comparison tasks in Linux.
Prepare Your Text Files
Before we can use the comm command, we need to create some sample text files to work with. In this step, we will create two text files containing lists of common Linux commands.
First, let's create a working directory to organize our files:
mkdir -p ~/project/comm-lab
cd ~/project/comm-lab
Now, let's create our first text file named commands1.txt with a list of Linux commands:
echo -e "ls\ncd\npwd\nmkdir\ntouch\ncomm\nsed\nawk" | sort > commands1.txt
This command does the following:
- echo -e outputs the text with interpretation of backslash escapes (\n creates new lines)
- The list of commands is piped (|) to the sort command to alphabetically sort the items
- The sorted output is then redirected (>) to a file named commands1.txt
Let's create a second text file named commands2.txt with a slightly different list of commands:
echo -e "ls\ncd\npwd\ncomm\ngrep\nfind\nsed" | sort > commands2.txt
To verify that our files were created correctly, we can use the cat command to view their contents:
cat commands1.txt
You should see the following output:
awk
cd
comm
ls
mkdir
pwd
sed
touch
Now let's check the content of the second file:
cat commands2.txt
You should see:
cd
comm
find
grep
ls
pwd
sed
Notice that some commands appear in both files (like cd, ls, pwd, comm, sed), while others are unique to each file. This setup will allow us to demonstrate various features of the comm command in the next steps.
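Because comm relies on both inputs being sorted, it can help to confirm that before comparing. The following is a minimal sketch, assuming a POSIX shell. It recreates the two sample files in a temporary directory (the directory and the variable names are illustrative, not part of the lab) and uses sort -c, which exits with status 0 only when a file is already in sorted order.

```shell
#!/bin/sh
# Sketch: confirm that both sample files are sorted before handing them to comm.
# The temporary directory is an assumption made so the example is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# printf is used instead of echo -e because it behaves the same in every POSIX shell.
printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > commands1.txt
printf '%s\n' ls cd pwd comm grep find sed | sort > commands2.txt

# sort -c prints nothing and returns 0 when the file is already sorted.
status=ok
for f in commands1.txt commands2.txt; do
    if sort -c "$f"; then
        echo "$f is sorted"
    else
        echo "$f is NOT sorted"
        status=unsorted
    fi
done
echo "overall: $status"

# Remove the scratch directory again.
cd / && rm -rf "$workdir"
```

If a file fails the check, sorting it again with the same sort command (and the same locale settings) used for the other file is usually enough.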
Using the Basic comm Command
Now that we have our sorted text files ready, we can explore the basic usage of the comm command. The comm command compares two sorted files line by line and outputs three columns:
- Lines unique to the first file
- Lines unique to the second file
- Lines common to both files
Let's run the basic comm command to compare our two files:
cd ~/project/comm-lab
comm commands1.txt commands2.txt
You should see output similar to this:
awk
		cd
		comm
	find
	grep
		ls
mkdir
		pwd
		sed
touch
The output might look confusing at first, but it follows a specific format:
- Column 1 (no tabs at the beginning of the line): Lines only in commands1.txt (awk, mkdir, touch)
- Column 2 (one tab at the beginning): Lines only in commands2.txt (find, grep)
- Column 3 (two tabs at the beginning): Lines common to both files (cd, comm, ls, pwd, sed)
This default output allows you to see all differences and similarities at once, but it can be hard to read because of the tab formatting. In the next step, we'll learn how to make this output more useful using the comm command options.
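One way to make the tab-based columns easier to read is to turn the leading tabs into explicit labels. The sketch below is only an illustration, assuming a POSIX shell and awk, and assuming that no line in the input files contains a tab of its own. The label texts and the temporary directory are made up for this example.

```shell
#!/bin/sh
# Sketch: label each line of comm's output by the column it belongs to.
# Assumes no line in the input files contains a tab character itself.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > commands1.txt
printf '%s\n' ls cd pwd comm grep find sed | sort > commands2.txt

# With a tab as the field separator, the column is the first non-empty field:
#   no leading tab   -> field 1 is set  (only in the first file)
#   one leading tab  -> field 2 is set  (only in the second file)
#   two leading tabs -> field 3 is set  (in both files)
labeled=$(comm commands1.txt commands2.txt | awk -F '\t' '
    $1 != "" { print "only in file 1: " $1; next }
    $2 != "" { print "only in file 2: " $2; next }
             { print "in both files: " $3 }
')
echo "$labeled"

# Remove the scratch directory again.
cd / && rm -rf "$workdir"
```

The labels are applied in the same order as comm's output, so the result still reads as one merged, sorted list.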
Suppressing Columns with comm Options
The default output of the comm command can be difficult to read because of its column format. Fortunately, comm provides options to suppress specific columns, which makes it easier to extract just the information you need.
The options are:
- -1: Suppress column 1 (lines unique to the first file)
- -2: Suppress column 2 (lines unique to the second file)
- -3: Suppress column 3 (lines common to both files)
These options can be combined to display only the data you're interested in.
Finding Lines Unique to First File
To display only the lines that are unique to the first file (commands1.txt), we use options -2 and -3 to suppress columns 2 and 3:
cd ~/project/comm-lab
comm -23 commands1.txt commands2.txt
Output:
awk
mkdir
touch
These are the commands that appear only in commands1.txt.
Finding Lines Unique to Second File
Similarly, to display only the lines that are unique to the second file (commands2.txt), we use options -1 and -3:
comm -13 commands1.txt commands2.txt
Output:
find
grep
These are the commands that appear only in commands2.txt.
Finding Common Lines
To display only the lines that are common to both files, we use options -1 and -2:
comm -12 commands1.txt commands2.txt
Output:
cd
comm
ls
pwd
sed
These are the commands that appear in both files.
Saving Results to Files
It's often useful to save these results to separate files for future reference or processing. Let's do that:
comm -23 commands1.txt commands2.txt > unique_to_file1.txt
comm -13 commands1.txt commands2.txt > unique_to_file2.txt
comm -12 commands1.txt commands2.txt > common_lines.txt
Let's verify the contents of these new files:
echo "Contents of unique_to_file1.txt:"
cat unique_to_file1.txt
echo "Contents of unique_to_file2.txt:"
cat unique_to_file2.txt
echo "Contents of common_lines.txt:"
cat common_lines.txt
The output will show the lines unique to each file and the common lines, just as we saw in our previous commands.
These techniques are useful for comparing configuration files, finding differences between versions of a file, or identifying shared elements between datasets.
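A quick sanity check on the saved results: every line of the first file lands either in the "unique to first file" output or in the "common" output, so the line counts should add up, and likewise for the second file. The sketch below assumes a POSIX shell; the count helper and the temporary directory are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: check that the three result files account for every input line.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' ls cd pwd mkdir touch comm sed awk | sort > commands1.txt
printf '%s\n' ls cd pwd comm grep find sed | sort > commands2.txt

comm -23 commands1.txt commands2.txt > unique_to_file1.txt
comm -13 commands1.txt commands2.txt > unique_to_file2.txt
comm -12 commands1.txt commands2.txt > common_lines.txt

# count is a hypothetical helper: print the number of lines in a file.
# tr strips the padding that some wc implementations add.
count() { wc -l < "$1" | tr -d ' '; }

only1=$(count unique_to_file1.txt)
only2=$(count unique_to_file2.txt)
both=$(count common_lines.txt)
total1=$(count commands1.txt)
total2=$(count commands2.txt)

echo "file 1: $only1 unique + $both common = $((only1 + both)) (file has $total1 lines)"
echo "file 2: $only2 unique + $both common = $((only2 + both)) (file has $total2 lines)"

# Remove the scratch directory again.
cd / && rm -rf "$workdir"
```

If the sums do not match the file sizes, the most likely cause is that one of the inputs was not sorted.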
Practical Examples of Using comm
Now that you understand the basic usage of the comm command, let's explore some practical examples that demonstrate its utility in real-world scenarios.
Example 1: Finding New Entries
Imagine you have two lists of users - one from last week and one from today. You want to identify which users are new (added since last week).
Let's create these files:
cd ~/project/comm-lab
echo -e "user1\nuser2\nuser3\nuser4\nuser5" | sort > users_last_week.txt
echo -e "user1\nuser3\nuser5\nuser6\nuser7\nuser8" | sort > users_today.txt
To find the new users (in users_today.txt but not in users_last_week.txt):
comm -13 users_last_week.txt users_today.txt
Output:
user6
user7
user8
Example 2: Finding Removed Entries
Using the same files, let's find which users have been removed since last week:
comm -23 users_last_week.txt users_today.txt
Output:
user2
user4
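Examples 1 and 2 can be combined into a single change report. The following is a rough sketch, assuming a POSIX shell; report_changes is a hypothetical helper name, and the "+" and "- " prefixes are simply a convention chosen for this illustration.

```shell
#!/bin/sh
# Sketch: print added entries with "+ " and removed entries with "- ".
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' user1 user2 user3 user4 user5 | sort > users_last_week.txt
printf '%s\n' user1 user3 user5 user6 user7 user8 | sort > users_today.txt

# report_changes OLD NEW (hypothetical helper)
# Both files must already be sorted.
report_changes() {
    comm -13 "$1" "$2" | sed 's/^/+ /'   # only in NEW: added
    comm -23 "$1" "$2" | sed 's/^/- /'   # only in OLD: removed
}

report=$(report_changes users_last_week.txt users_today.txt)
echo "$report"

# Remove the scratch directory again.
cd / && rm -rf "$workdir"
```

Note that comm expects the plain collation order that sort produces by default, so a name such as user10 would sort before user2. Sorting numerically (sort -n) would break that assumption.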
Example 3: Combining comm with Other Commands
The comm command can be combined with other commands for more complex operations. For example, if we want to count how many common commands there are in our original files:
comm -12 commands1.txt commands2.txt | wc -l
This pipes the common lines to the wc -l command, which counts the number of lines.
Output:
5
This indicates there are 5 commands common to both files.
Example 4: Using comm with Unsorted Files
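The comm command requires sorted input files. It reads both files in parallel and assumes each line is in order, so with unsorted input a line that exists in both files can be reported as unique. GNU comm typically also prints a warning that a file is not in sorted order. Let's demonstrate this: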
echo -e "cat\nls\npwd\ncd" > unsorted1.txt
echo -e "ls\ncat\ngrep\npwd" > unsorted2.txt
If we try to use comm directly:
comm unsorted1.txt unsorted2.txt
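The output is misleading because the files aren't sorted. For example, cat appears in both files, yet it is listed once as unique to unsorted1.txt and once as unique to unsorted2.txt. The correct approach is to sort the files first: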
comm <(sort unsorted1.txt) <(sort unsorted2.txt)
This uses process substitution to sort the files on the fly before comparing them. You should see a properly formatted result with the correct columns.
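Process substitution is a feature of shells such as bash, zsh, and ksh, and is not available in a plain POSIX sh. Where it is missing, the same effect can be had with temporary sorted copies. The sketch below is one possible approach under that assumption; comm_sorted is a hypothetical helper name, and the ".sorted" suffix is an arbitrary choice for the scratch files.

```shell
#!/bin/sh
# Sketch: compare two unsorted files without process substitution.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' cat ls pwd cd > unsorted1.txt
printf '%s\n' ls cat grep pwd > unsorted2.txt

# comm_sorted OPTION FILE1 FILE2 (hypothetical helper)
# Sorts copies of both files, runs comm with the given option, then cleans up.
comm_sorted() {
    sort "$2" > "$2.sorted"
    sort "$3" > "$3.sorted"
    comm "$1" "$2.sorted" "$3.sorted"
    rm -f "$2.sorted" "$3.sorted"
}

common=$(comm_sorted -12 unsorted1.txt unsorted2.txt)
only_first=$(comm_sorted -23 unsorted1.txt unsorted2.txt)
only_second=$(comm_sorted -13 unsorted1.txt unsorted2.txt)

echo "common:      $(echo "$common" | tr '\n' ' ')"
echo "only first:  $only_first"
echo "only second: $only_second"

# Remove the scratch directory again.
cd / && rm -rf "$workdir"
```

The original files are left untouched, which matters when their existing order carries meaning.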
These examples demonstrate the versatility of the comm command for comparing text files in various scenarios such as tracking changes, finding differences, and filtering data.
Summary
In this lab, you learned how to use the comm command in Linux to compare text files and identify differences between them. Here's a summary of what you accomplished:
- Created sorted text files for comparison using basic Linux commands
- Used the basic comm command to compare two files and understand its three-column output format
- Applied column suppression options (-1, -2, -3) to extract specific information:
  - Lines unique to the first file
  - Lines unique to the second file
  - Lines common to both files
- Saved comparison results to separate files for future reference
- Explored practical examples of using comm in real-world scenarios:
  - Finding new entries in updated lists
  - Identifying removed entries
  - Combining comm with other commands for more complex operations
  - Handling unsorted files appropriately
The comm command is a powerful tool for text file comparison in Linux. It allows system administrators, developers, and data analysts to efficiently identify differences and similarities between files, which is essential for tasks such as configuration management, version control, and data analysis.
Understanding how to effectively use comm and its options will enhance your productivity when working with text files in the Linux command line environment.



