Introduction
In this lab, we will explore the diff command, an essential tool for software developers and system administrators working with Linux. The diff command is used to compare the contents of two files and highlight the differences between them. This skill is particularly valuable when managing code versions, reviewing changes in configuration files, or identifying discrepancies in text-based data.
We'll simulate a software development scenario where you'll use the diff command to compare different versions of files, helping you understand how this command can be applied in real-world situations.
Understanding the Basic Usage of diff
Let's start by comparing two simple text files to understand the basic output of the diff command.
First, navigate to the project directory:
cd /home/labex/project
Now, let's use the diff command to compare two files:
diff file1.txt file2.txt
You should see output similar to this:
1,2c1,2
< This is version 1 of the file.
< It contains some initial content.
---
> This is version 2 of the file.
> It contains updated content.
4c4
< This is the fourth line.
---
> This is a modified fourth line.
Let's break down this output:
- The numbers (like 1,2c1,2) indicate the line numbers in both files where changes occur.
- The letter c means "change". Other possible letters are a for "add" and d for "delete".
- Lines starting with < are from the first file (file1.txt).
- Lines starting with > are from the second file (file2.txt).
- The --- separates the content from the first file and the second file.
This output tells us that:
- Lines 1 and 2 in both files are different.
- Line 4 in both files is different.
- Line 3 (not shown in the output) is identical in both files.
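The output above only contains c commands. As a minimal sketch of the a and d commands, the following creates two throwaway files in a temporary directory (old.txt and new.txt are hypothetical names, not part of the lab setup) so that one line is deleted and one is added:

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'  > "$workdir/old.txt"
printf 'alpha\ngamma\ndelta\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps a
# script that runs under "set -e" from stopping at this line.
normal_output=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)
printf '%s\n' "$normal_output"
# Prints:
# 2d1
# < beta
# 3a3
# > delta

rm -rf "$workdir"
```

Here 2d1 means line 2 of the first file was deleted (it would have followed line 1 of the second file), and 3a3 means that after line 3 of the first file, line 3 of the second file was added.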
Comparing Python Scripts
Now, let's apply the diff command to a more realistic scenario. Imagine you're working on a Python script and want to compare two versions.
First, let's view the contents of both script versions:
cat script_v1.py
You should see:
def greet(name):
    print("Hello, " + name + "!")

def main():
    name = input("Enter your name: ")
    greet(name)

if __name__ == "__main__":
    main()
Now, let's look at the second version:
cat script_v2.py
You should see:
def greet(name):
    print(f"Hello, {name.capitalize()}!")

def main():
    name = input("Enter your name: ")
    greet(name)
    print("Thank you for using this script!")

if __name__ == "__main__":
    main()
Now, let's use diff to compare these scripts:
diff script_v1.py script_v2.py
You should see output similar to this:
2c2
<     print("Hello, " + name + "!")
---
>     print(f"Hello, {name.capitalize()}!")
6a7
>     print("Thank you for using this script!")
This output tells us:
- Line 2 has been changed. The greeting now uses an f-string and capitalizes the name.
- A new line (Line 7 in the new version) has been added with a thank you message.
Using the Unified Format
The unified format (-u option) provides a more readable output, especially for larger files or when context is important.
Compare the Python scripts using the unified format:
diff -u script_v1.py script_v2.py
You should see output similar to this:
--- script_v1.py 2023-12-28 10:00:00.000000000 +0000
+++ script_v2.py 2023-12-28 10:05:00.000000000 +0000
@@ -1,8 +1,9 @@
 def greet(name):
-    print("Hello, " + name + "!")
+    print(f"Hello, {name.capitalize()}!")
 
 def main():
     name = input("Enter your name: ")
     greet(name)
+    print("Thank you for using this script!")
 
 if __name__ == "__main__":
Let's break down this output:
- The first two lines show the files being compared and their timestamps.
- Lines starting with - are from the first file (script_v1.py).
- Lines starting with + are from the second file (script_v2.py).
- Lines without - or + provide context and are unchanged between the files.
- The @@ -1,8 +1,9 @@ line indicates that this hunk covers 8 lines starting at line 1 of the first file and 9 lines starting at line 1 of the second file.
This format is often preferred because it provides more context around the changes.
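To see how a hunk header relates to the lines beneath it, here is a small sketch that uses two hypothetical scratch files (old.txt and new.txt are illustrative names, not lab files). It drops the two header lines, because their paths and timestamps change from run to run:

```shell
# Hypothetical scratch files, created only for this illustration.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'  > "$workdir/old.txt"
printf 'alpha\ngamma\ndelta\n' > "$workdir/new.txt"

# sed '1,2d' removes the "---" and "+++" header lines. The pipeline's exit
# status is sed's, so diff's status of 1 (files differ) is not an error here.
unified_hunk=$(diff -u "$workdir/old.txt" "$workdir/new.txt" | sed '1,2d')
printf '%s\n' "$unified_hunk"
# Prints:
# @@ -1,3 +1,3 @@
#  alpha
# -beta
#  gamma
# +delta

rm -rf "$workdir"
```

The header -1,3 +1,3 matches the body: three lines come from the first file (alpha, beta, gamma) and three from the second (alpha, gamma, delta), both starting at line 1.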
Ignoring Whitespace Changes
Sometimes, differences in whitespace (spaces, tabs) are not significant. The -w option tells diff to ignore these changes.
Let's create a new version of our script with some whitespace changes:
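Note: Whitespace differences do not survive copy-pasting, so after creating the file, edit script_v3.py by hand and add some extra spaces or tabs inside existing lines (avoid adding blank lines, which -w does not ignore).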
cat > script_v3.py << EOF
def greet(name):
    print(f"Hello, {name.capitalize()}!")

def main():
    name = input("Enter your name: ")
    greet(name)
    print("Thank you for using this script!")

if __name__ == "__main__":
    main()
EOF
Now, let's compare script_v2.py and script_v3.py, first without and then with the -w option:
diff script_v2.py script_v3.py
You might see some differences due to whitespace. Now try:
diff -w script_v2.py script_v3.py
You should see no output, indicating no differences when ignoring whitespace.
This is useful when you want to focus on content changes rather than formatting differences.
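The effect of -w can be checked in a script through diff's exit status (0 when no differences are found, 1 when differences are found). The sketch below uses two hypothetical one-line files (tight.py and spaced.py are illustrative names) and a small helper function, report, which is also an assumption of this example. It includes the related -b option for contrast:

```shell
# Hypothetical scratch files that differ only in whitespace.
workdir=$(mktemp -d)
printf 'greet(name)\n'   > "$workdir/tight.py"
printf 'greet( name )\n' > "$workdir/spaced.py"

# report [OPTIONS...]: print "same" or "different" based on diff's exit status.
report() {
    if diff "$@" "$workdir/tight.py" "$workdir/spaced.py" > /dev/null; then
        echo same
    else
        echo different
    fi
}

plain_result=$(report)     # different: the spaces count as a change
w_result=$(report -w)      # same: -w ignores all whitespace
b_result=$(report -b)      # different: -b only ignores changes in the amount
                           # of existing whitespace, not added whitespace
printf 'plain: %s\n-w:    %s\n-b:    %s\n' "$plain_result" "$w_result" "$b_result"

rm -rf "$workdir"
```

If the goal is to treat "a  b" and "a b" as equal but still catch "ab" versus "a b", -b is the narrower choice. -w is the broader one used in this lab.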
Comparing Directories
The diff command can also compare entire directories. Let's create two directories with some files and compare them.
Create the directories and files:
mkdir -p dir1 dir2
echo "This is a file in dir1" > dir1/file.txt
echo "This is a file in dir2" > dir2/file.txt
echo "This file is unique to dir1" > dir1/unique1.txt
echo "This file is unique to dir2" > dir2/unique2.txt
Now, compare the directories:
diff -r dir1 dir2
You should see output similar to this:
diff -r dir1/file.txt dir2/file.txt
1c1
< This is a file in dir1
---
> This is a file in dir2
Only in dir1: unique1.txt
Only in dir2: unique2.txt
This output tells us:
- dir1 has a file called unique1.txt that doesn't exist in dir2.
- dir2 has a file called unique2.txt that doesn't exist in dir1.
- file.txt exists in both directories but has different content.
The -r option makes diff recursively compare subdirectories as well, which is useful for comparing complex directory structures.
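For large directory trees, the full line-by-line output can be overwhelming. As a sketch, combining -r with -q (report only whether files differ) gives a one-line summary per file. The directory names left and right below are hypothetical and created only for this example:

```shell
# Hypothetical scratch directories, created only for this illustration.
workdir=$(mktemp -d)
mkdir "$workdir/left" "$workdir/right"
echo "shared, version A" > "$workdir/left/file.txt"
echo "shared, version B" > "$workdir/right/file.txt"
echo "only here"         > "$workdir/left/unique1.txt"

# Run diff from inside the scratch directory so the reported paths stay short.
# "|| true" absorbs diff's exit status of 1 (differences found).
summary=$(cd "$workdir" && diff -rq left right || true)
printf '%s\n' "$summary"
# Prints (with GNU diff):
# Files left/file.txt and right/file.txt differ
# Only in left: unique1.txt

rm -rf "$workdir"
```

Once the summary shows which files differ, a follow-up diff -u on an individual pair gives the details.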
Summary
In this lab, we explored the Linux diff command in a software development context. We learned how to:
- Compare two text files and interpret the basic diff output
- Compare different versions of Python scripts
- Use the unified format for more readable output
- Ignore whitespace changes in comparisons
- Compare entire directories recursively
Additional diff options not covered in this lab include:
- -y: Side-by-side comparison
- -i: Ignore case differences
- -b: Ignore changes in the amount of whitespace
- -B: Ignore changes whose lines are all blank
- -q: Report only when files differ, without showing the differences
These options can be combined for more specific comparisons.
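As one illustration of combining options, the sketch below compares plain -u with -u -w on two hypothetical config files (old.cfg and new.cfg are illustrative names). The helper count_changes is an assumption of this example. It counts the added and removed lines in the unified output after dropping the two header lines:

```shell
# Hypothetical scratch files: line 1 differs only in spacing, line 2 in content.
workdir=$(mktemp -d)
printf 'x = 1\ny = 2\n'   > "$workdir/old.cfg"
printf 'x  =  1\ny = 3\n' > "$workdir/new.cfg"

# count_changes OPTIONS...: number of "+" and "-" lines in the diff body.
count_changes() {
    diff "$@" "$workdir/old.cfg" "$workdir/new.cfg" | sed '1,2d' | grep -c '^[-+]'
}

plain_count=$(count_changes -u)        # 4: both lines are removed and re-added
combined_count=$(count_changes -u -w)  # 2: only the y line is reported
printf 'diff -u:    %s changed lines\ndiff -u -w: %s changed lines\n' \
    "$plain_count" "$combined_count"

rm -rf "$workdir"
```

With -w added, the spacing-only change on the x line becomes context, leaving just the real content change on the y line.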



