Introduction to Awk
awk is a powerful text-processing tool that's particularly good at handling structured data. It treats each line of input as a record and each word on that line as a field. Let's start with some basic awk operations.
First, create a new file with some structured data:
echo -e "Name Age Country\nAlice 25 USA\nBob 30 Canada\nCharlie 35 UK\nDavid 28 Australia" > awk_test.txt
This creates a file named awk_test.txt with a header row and four data rows.
Now, let's use awk to print specific fields:
awk '{print $1}' awk_test.txt
This prints the first field (column) of each line. In awk, $1 refers to the first field, $2 to the second, and so on. $0 refers to the entire line.
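To see how $0 and the numbered fields relate, here is a minimal sketch that pipes a single line into awk instead of reading the file. The sample line is copied from awk_test.txt; the variable name out is only for illustration.

```shell
# Feed one sample record to awk on standard input.
# $0 is the whole line; $1 and $3 are its first and third fields.
out=$(printf 'Alice 25 USA\n' | awk '{print $0; print $1; print $3}')
printf '%s\n' "$out"
```

The first line of output is the full record, followed by the name and the country on their own lines.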
To print multiple fields:
awk '{print $1, $2}' awk_test.txt
This prints the first and second fields of each line.
We can also use conditions:
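awk 'NR > 1 && $2 > 28 {print $1 " is over 28"}' awk_test.txt
This prints the names of people over 28 years old. The NR > 1 test skips the header row; without it, the non-numeric header field Age is compared with 28 as a string and also matches, so the output would include "Name is over 28".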
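Conditions can also be combined with && and ||. The following is a sketch only, assuming the same sample data; the age range 26 to 30 and the variable name matches are chosen purely for illustration.

```shell
# Rebuild the sample data so the example is self-contained.
printf '%s\n' 'Name Age Country' 'Alice 25 USA' 'Bob 30 Canada' \
  'Charlie 35 UK' 'David 28 Australia' > awk_test.txt

# Skip the header (NR > 1) and keep rows whose age is above 25 and at most 30.
# For each matching row, print the name and the country.
matches=$(awk 'NR > 1 && $2 > 25 && $2 <= 30 {print $1, $3}' awk_test.txt)
printf '%s\n' "$matches"
```

With the sample data, only Bob and David fall inside that range.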
Let's try something more complex:
awk 'NR > 1 {sum += $2} END {print "Average age:", sum/(NR-1)}' awk_test.txt
This calculates and prints the average age, skipping the header row. Inside the END block, NR holds the total number of lines read, so NR-1 is the number of data rows.
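One possible variation, shown here as a sketch rather than the required approach, counts the data rows in a separate variable and formats the result with printf. The variable names n and avg are illustrative, and the n > 0 check is an added guard against dividing by zero when the file has no data rows.

```shell
# Rebuild the sample data so the example is self-contained.
printf '%s\n' 'Name Age Country' 'Alice 25 USA' 'Bob 30 Canada' \
  'Charlie 35 UK' 'David 28 Australia' > awk_test.txt

# Count data rows explicitly instead of relying on NR - 1,
# and only divide when at least one data row was seen.
avg=$(awk 'NR > 1 {sum += $2; n++} END {if (n > 0) printf "%.1f\n", sum / n}' awk_test.txt)
printf 'Average age: %s\n' "$avg"
```

Counting rows in n keeps the calculation correct even if the condition that selects rows changes later.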
Explanation:
- In awk, each line is automatically split into fields, typically by whitespace.
- $1, $2, etc., refer to the first, second, etc., fields in each line.
- NR is a built-in variable that represents the current record (line) number.
- The END block is executed after all lines have been processed.
- sum += $2 adds the value of the second field (age) to a running total.
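The behavior of NR and END described above can be observed directly. This is a small sketch that uses three lines of the sample data piped in on standard input; the variable name numbered is only for illustration.

```shell
# Prefix each record's first field with its record number (NR).
# The END block runs once after the last line, where NR is the total count.
numbered=$(printf '%s\n' 'Name Age Country' 'Alice 25 USA' 'Bob 30 Canada' |
  awk '{print NR ": " $1} END {print "records read: " NR}')
printf '%s\n' "$numbered"
```

Each input line appears with its number, and the final line reports how many records were read in total.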
Try these commands and observe the results. awk is incredibly powerful for data processing tasks.