Linux diff Command: File Comparing

Introduction

In this lab, we will explore the diff command, an essential tool for software developers and system administrators working with Linux. The diff command is used to compare the contents of two files and highlight the differences between them. This skill is particularly valuable when managing code versions, reviewing changes in configuration files, or identifying discrepancies in text-based data.

We'll simulate a software development scenario where you'll use the diff command to compare different versions of files, helping you understand how this command can be applied in real-world situations.


Understanding the Basic Usage of diff

Let's start by comparing two simple text files to understand the basic output of the diff command.

First, navigate to the project directory:

cd /home/labex/project

Now, let's use the diff command to compare two files:

diff file1.txt file2.txt

You should see output similar to this:

1,2c1,2
< This is version 1 of the file.
< It contains some initial content.
---
> This is version 2 of the file.
> It contains updated content.
4c4
< This is the fourth line.
---
> This is a modified fourth line.

Let's break down this output:

  • The numbers (like 1,2c1,2) are line ranges: those before the letter refer to the first file, and those after it refer to the second file.
  • The letter c means "change". Other possible letters are a for "add" and d for "delete".
  • Lines starting with < are from the first file (file1.txt).
  • Lines starting with > are from the second file (file2.txt).
  • The --- separates the content from the first file and the second file.

This output tells us that:

  1. Lines 1 and 2 differ between the two files.
  2. Line 4 differs between the two files.
  3. Line 3 (not shown in the output) is identical in both files.
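
The a and d change commands mentioned above do not appear in that output. As a minimal sketch, assuming two throwaway files named old.txt and new.txt (hypothetical names and contents, not part of the lab), the following shows both. It also captures diff's exit status, which is 0 when the files are identical and 1 when they differ:

```shell
#!/bin/sh
# Hypothetical demo files; the names and contents are assumptions for illustration.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old.txt"
printf 'alpha\ngamma\ndelta\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so record the status
# rather than letting it abort a script that runs under `set -e`.
status=0
output=$(diff "$workdir/old.txt" "$workdir/new.txt") || status=$?

printf '%s\n' "$output"
echo "exit status: $status"

rm -rf "$workdir"
```

Here 2d1 should report that line 2 of old.txt (beta) was deleted, and 3a3 should report that delta was added after line 3 of old.txt as line 3 of new.txt.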

Comparing Python Scripts

Now, let's apply the diff command to a more realistic scenario. Imagine you're working on a Python script and want to compare two versions.

First, let's view the contents of both script versions:

cat script_v1.py

You should see:

def greet(name):
    print("Hello, " + name + "!")

def main():
    name = input("Enter your name: ")
    greet(name)

if __name__ == "__main__":
    main()

Now, let's look at the second version:

cat script_v2.py

You should see:

def greet(name):
    print(f"Hello, {name.capitalize()}!")

def main():
    name = input("Enter your name: ")
    greet(name)
    print("Thank you for using this script!")

if __name__ == "__main__":
    main()

Now, let's use diff to compare these scripts:

diff script_v1.py script_v2.py

You should see output similar to this:

2c2
<     print("Hello, " + name + "!")
---
>     print(f"Hello, {name.capitalize()}!")
6a7
>     print("Thank you for using this script!")

This output tells us:

  1. Line 2 has been changed (2c2). The greeting now uses an f-string and capitalizes the name.
  2. A new line has been added after line 6 of the first version (6a7). It is line 7 of the new version and prints a thank-you message.

Using the Unified Format

The unified format (-u option) produces more readable output, especially for larger files or when the surrounding context matters.

Compare the Python scripts using the unified format:

diff -u script_v1.py script_v2.py

You should see output similar to this:

--- script_v1.py	2023-12-28 10:00:00.000000000 +0000
+++ script_v2.py	2023-12-28 10:05:00.000000000 +0000
@@ -1,9 +1,10 @@
 def greet(name):
-    print("Hello, " + name + "!")
+    print(f"Hello, {name.capitalize()}!")

 def main():
     name = input("Enter your name: ")
     greet(name)
+    print("Thank you for using this script!")

 if __name__ == "__main__":
     main()

Let's break down this output:

  • The first two lines show the files being compared and their modification timestamps.
  • Lines starting with - appear only in the first file (script_v1.py).
  • Lines starting with + appear only in the second file (script_v2.py).
  • Lines without - or + provide context and are unchanged between the files.
  • The @@ -1,9 +1,10 @@ line is the hunk header. Each pair is a starting line and a line count: this hunk covers 9 lines starting at line 1 of the first file and 10 lines starting at line 1 of the second file.

This format is often preferred because it provides more context around the changes.
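
To make the hunk header concrete, here is a small sketch using two hypothetical ten-line files (numbers.txt and edited.txt are assumed names, not part of the lab). It changes one line and compares the default three lines of context with a single line of context requested through the -U option:

```shell
#!/bin/sh
# Hypothetical demo files; the names and contents are assumptions for illustration.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/numbers.txt"
# Replace only line 5 so that exactly one change is present.
sed 's/^5$/five/' "$workdir/numbers.txt" > "$workdir/edited.txt"

# Default unified output keeps 3 context lines on each side of the change.
# grep keeps only the hunk header; `|| true` keeps the script safe under `set -e`.
default_hunk=$(diff -u "$workdir/numbers.txt" "$workdir/edited.txt" | grep '^@@' || true)

# -U 1 keeps a single context line on each side instead.
narrow_hunk=$(diff -U 1 "$workdir/numbers.txt" "$workdir/edited.txt" | grep '^@@' || true)

echo "default: $default_hunk"
echo "narrow:  $narrow_hunk"

rm -rf "$workdir"
```

The default run should report @@ -2,7 +2,7 @@ (lines 2 through 8, seven lines in each file), while -U 1 should report @@ -4,3 +4,3 @@ (lines 4 through 6).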

Ignoring Whitespace Changes

Sometimes, differences in whitespace (spaces, tabs) are not significant. The -w option tells diff to ignore all whitespace when comparing lines.

Let's create a new version of our script with some whitespace changes:

Note: Pasting the block below as-is produces a file with the same content as script_v2.py. To see the effect of -w, manually add some extra spaces or tabs to script_v3.py, for example at the end of a line.

cat > script_v3.py << 'EOF'
def greet(name):
    print(f"Hello, {name.capitalize()}!")

def main():
    name = input("Enter your name: ")
    greet(name)
    print("Thank you for using this script!")

if __name__ == "__main__":
    main()
EOF

Now, let's compare script_v2.py and script_v3.py, first without and then with the -w option:

diff script_v2.py script_v3.py

If you added extra whitespace to script_v3.py, diff reports the affected lines as changed; otherwise it prints nothing. Now try:

diff -w script_v2.py script_v3.py

You should see no output, indicating no differences when ignoring whitespace.

This is useful when you want to focus on content changes rather than formatting differences.
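
As a self-contained sketch of what -w ignores, the following uses three hypothetical one-line files (assumed names and contents, not part of the lab). It contrasts -w with the stricter -b option, which only ignores changes in the amount of whitespace that is already present:

```shell
#!/bin/sh
# Hypothetical demo files; the names and contents are assumptions for illustration.
workdir=$(mktemp -d)
printf 'x = 1\n'     > "$workdir/spaced.txt"
printf 'x   =   1\n' > "$workdir/wide.txt"
printf 'x=1\n'       > "$workdir/tight.txt"

# Echo "same" if the given diff invocation exits with 0, "differ" otherwise.
status_of() {
    if diff "$@" > /dev/null; then echo "same"; else echo "differ"; fi
}

plain=$(status_of "$workdir/spaced.txt" "$workdir/wide.txt")
b_wide=$(status_of -b "$workdir/spaced.txt" "$workdir/wide.txt")
b_tight=$(status_of -b "$workdir/spaced.txt" "$workdir/tight.txt")
w_tight=$(status_of -w "$workdir/spaced.txt" "$workdir/tight.txt")

echo "no option, spaced vs wide:  $plain"
echo "-b,        spaced vs wide:  $b_wide"
echo "-b,        spaced vs tight: $b_tight"
echo "-w,        spaced vs tight: $w_tight"

rm -rf "$workdir"
```

The last two lines show the distinction: -b still treats "x = 1" and "x=1" as different because whitespace was removed entirely, whereas -w treats them as the same.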

Comparing Directories

The diff command can also compare entire directories. Let's create two directories with some files and compare them.

Create the directories and files:

mkdir -p dir1 dir2
echo "This is a file in dir1" > dir1/file.txt
echo "This is a file in dir2" > dir2/file.txt
echo "This file is unique to dir1" > dir1/unique1.txt
echo "This file is unique to dir2" > dir2/unique2.txt

Now, compare the directories:

diff -r dir1 dir2

You should see output similar to this:

diff -r dir1/file.txt dir2/file.txt
1c1
< This is a file in dir1
---
> This is a file in dir2
Only in dir1: unique1.txt
Only in dir2: unique2.txt

This output tells us:

  1. dir1 has a file called unique1.txt that doesn't exist in dir2.
  2. dir2 has a file called unique2.txt that doesn't exist in dir1.
  3. The file.txt exists in both directories but has different content.

The -r option makes diff recursively compare subdirectories as well, which is useful for comparing complex directory structures.
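
For large directory trees, the full recursive output can be long. A minimal sketch, assuming the same layout as above recreated in a temporary directory and adding the -q option, prints one line per differing or unmatched file:

```shell
#!/bin/sh
# Recreate the lab's layout in a temporary directory (the temporary
# location is an assumption so the sketch does not touch real files).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p dir1 dir2
echo "This is a file in dir1" > dir1/file.txt
echo "This is a file in dir2" > dir2/file.txt
echo "This file is unique to dir1" > dir1/unique1.txt
echo "This file is unique to dir2" > dir2/unique2.txt

# LC_ALL=C keeps diff's messages in English regardless of the user's locale.
# -r recurses into subdirectories; -q suppresses the line-by-line details.
status=0
summary=$(LC_ALL=C diff -rq dir1 dir2) || status=$?

printf '%s\n' "$summary"
echo "exit status: $status"

cd / && rm -rf "$workdir"
```

The summary should contain one "Files ... differ" line for file.txt and one "Only in" line for each unique file, with an exit status of 1 because differences were found.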

Summary

In this lab, we explored the Linux diff command in a software development context. We learned how to:

  1. Compare two text files and interpret the basic diff output
  2. Compare different versions of Python scripts
  3. Use the unified format for more readable output
  4. Ignore whitespace changes in comparisons
  5. Compare entire directories recursively

Additional diff options not covered in this lab include:

  • -y: Side-by-side comparison
  • -i: Ignore case differences
  • -b: Ignore changes in the amount of whitespace
  • -B: Ignore changes whose lines are all blank
  • -q: Report only when files differ, without showing the differences

These options can be combined for more specific comparisons.
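
As a sketch of combining options in a script, the following uses two hypothetical files that differ only in letter case (assumed names and contents, not part of the lab). It relies on diff's exit status, with -q limiting any output to a one-line report:

```shell
#!/bin/sh
# Hypothetical demo files; the names and contents are assumptions for illustration.
workdir=$(mktemp -d)
printf 'Host=Server\nPort=80\n' > "$workdir/a.conf"
printf 'host=server\nport=80\n' > "$workdir/b.conf"

# -q alone: the files differ in letter case, so diff exits with 1.
if diff -q "$workdir/a.conf" "$workdir/b.conf" > /dev/null; then
    strict="same"
else
    strict="differ"
fi

# -q combined with -i: case differences are ignored, so diff exits with 0.
if diff -q -i "$workdir/a.conf" "$workdir/b.conf" > /dev/null; then
    relaxed="same"
else
    relaxed="differ"
fi

echo "strict comparison:           $strict"
echo "case-insensitive comparison: $relaxed"

rm -rf "$workdir"
```

Testing the exit status in an if statement like this avoids parsing diff's text output, which is usually the more robust choice in scripts.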
