Linux gawk Command with Practical Examples


Introduction

In this lab, you will learn how to use gawk, the GNU implementation of the awk programming language, to process text in Linux. An awk program reads its input line by line and lets you extract, filter, and transform the data it finds. You will start with the basics of the gawk command, including how to check which version is installed on your system. Then you will extract specific fields from a text file and perform calculations and transformations on the data. The practical examples throughout will help you become comfortable using gawk for everyday text processing tasks.




Introduction to gawk Command

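In this step, you will learn about the gawk command. gawk is GNU's implementation of awk, a small programming language built for text processing. An awk program is a list of pattern-action rules that gawk applies to each line of its input, which makes it well suited to extracting and reshaping data in text files.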

First, let's check the version of gawk installed on your system:

gawk --version

Example output:

GNU Awk 5.1.0, API: 2.0 (GNU MPFR 4.1.0, GNU MP 6.2.0)
Copyright (C) 1989, 1991-2021, the Free Software Foundation.

The gawk command is used to search and process text files. It can perform a wide range of operations, such as:

  • Extracting specific fields or columns from a text file
  • Performing calculations and transformations on data
  • Generating reports and summaries
  • Automating text-based tasks

To get started, let's create a sample text file that we'll use throughout this lab:

mkdir -p ~/project
cat > ~/project/data.txt << EOF
Name,Age,City
John,25,New York
Jane,30,London
Bob,35,Paris
EOF

This file contains a header line (Name,Age,City) followed by three records of names, ages, and cities, with the fields separated by commas.

Now, let's try a simple gawk command to print the entire file:

gawk '{print}' ~/project/data.txt

Example output:

Name,Age,City
John,25,New York
Jane,30,London
Bob,35,Paris

In this command, the '{print}' part tells gawk to print each line of the file.

Let's break down the basic structure of a gawk command:

  • gawk: The gawk command
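  • '{print}': The program, written in single quotes so the shell passes it to gawk unchanged. It consists of an action in braces with no pattern in front. When the pattern is omitted, the action runs for every line, and print with no arguments prints the whole line.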
  • ~/project/data.txt: The input file.

In the next step, you'll learn how to extract specific data from the text file using gawk.

Extracting Data from Text Files using gawk

In this step, you will learn how to use gawk to extract specific data from the text file you created in the previous step.

Let's start by printing the second column (Age) from the data.txt file. Because the fields are separated by commas, pass -F, to set the field separator:

gawk -F, '{print $2}' ~/project/data.txt

Example output:

Age
25
30
35

In this command, $2 refers to the second field (column) of each line, and -F, tells gawk to split each line into fields at every comma. gawk does not detect the delimiter on its own: without -F, it splits lines on runs of spaces and tabs.

To print the first and third columns (Name and City), again using a comma as the field separator:

gawk -F, '{print $1, $3}' ~/project/data.txt

Example output:

Name City
John New York
Jane London
Bob Paris

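The -F option matters for every command on this file. Without it, gawk '{print $1, $3}' ~/project/data.txt would treat the header line Name,Age,City as a single field, because that line contains no spaces. With a comma as the field separator, the fields split as expected: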

gawk -F, '{print $1, $3}' ~/project/data.txt

Example output:

Name City
John New York
Jane London
Bob Paris

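Additionally, gawk supports conditional processing. For example, to print only the names of people older than 30:

gawk -F, 'NR > 1 && $2 > 30 {print $1}' ~/project/data.txt

Example output:

Bob

In this command, NR > 1 && $2 > 30 is the pattern (condition), and {print $1} is the action performed for the lines that match it. NR holds the current line (record) number, so NR > 1 skips the header line. Without that check, the header's Age field would be compared with 30 as a string. Because "Age" sorts after "30", Name would be printed too.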

Try experimenting with different gawk commands to extract and manipulate the data in the data.txt file. The more you practice, the more comfortable you'll become with using gawk for text processing tasks.

Performing Calculations and Transformations with gawk

In this step, you will learn how to use gawk to perform calculations and transformations on the data in the data.txt file.

Let's start by calculating the average age of the people in the file:

gawk -F, 'NR > 1 {sum += $2; count++} END {print "Average age:", sum/count}' ~/project/data.txt

Example output:

Average age: 30

In this command:

  • NR > 1 {sum += $2; count++} runs for every line except the header. It adds the age in the second column to sum and counts the data rows in count.
  • END {print "Average age:", sum/count} runs once after all input has been read and divides the total by the number of people. Dividing by NR instead would also count the header line and give 22.5.

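Next, let's transform the data by splitting each age into whole years and remaining months:

gawk -F, 'NR > 1 {years = int($2); months = ($2 - years) * 12; print $1, years "y", months "m"}' ~/project/data.txt

Example output:

John 25y 0m
Jane 30y 0m
Bob 35y 0m

In this command:

  • NR > 1 skips the header line.
  • years = int($2) keeps the whole-number part of the age, and months = ($2 - years) * 12 converts the remaining fraction of a year into months. Every age in data.txt is a whole number, so months is 0 for each person.
  • print $1, years "y", months "m" prints the name followed by each number joined to its unit.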

You can also use gawk to generate a report with additional calculations or transformations. For example, let's create a report that includes the name, age, and city, along with a "tax bracket" based on the age:

gawk -F, 'NR > 1 {
  if ($2 < 30)
    tax_bracket = "Low"
  else if ($2 >= 30 && $2 < 50)
    tax_bracket = "Medium"
  else
    tax_bracket = "High"
  print $1, $2, $3, tax_bracket
}' ~/project/data.txt

Example output:

John 25 New York Low
Jane 30 London Medium
Bob 35 Paris Medium

In this command:

  • NR > 1 skips the header line, so only data rows are classified.
  • The if-else statement assigns the tax bracket from the age: below 30 is Low, 30 up to (but not including) 50 is Medium, and 50 or older is High.
  • The print statement outputs the name, age, city, and tax bracket for each record.

Feel free to experiment with more advanced gawk commands and transformations to further explore the capabilities of this powerful text processing tool.

Summary

In this lab, you learned about the gawk command, a powerful text processing tool in Linux. You started by exploring the basics of the gawk command, including how to check the version and use it to print the contents of a sample text file. You then learned how to extract specific data from the text file, such as printing the second column (Age) with the -F, option and the $2 field reference. Finally, you performed calculations and transformations on the data using gawk, such as calculating the average age.

Throughout the lab, you gained a solid understanding of the versatility of the gawk command and its ability to manipulate and extract data from text files. These skills can be applied to a wide range of text-based tasks, from data analysis to report generation and automation.
