Counting and Summarizing Data
AWK is excellent for counting occurrences and summarizing data. Let's use it to count the number of requests for each HTTP status code.
Run this command:
awk '{count[$6]++} END {for (code in count) print code, count[code]}' server_logs.txt | sort -n
This command is more complex, so let's break it down step by step:
- {count[$6]++} is the main action, performed for each line:
  - count is an associative array (a dictionary) that we create on the fly
  - [$6] uses the value of the 6th field (the status code) as the array key
  - ++ is the increment operator, adding 1 to the current value
  - So for each line, we increment the counter for the status code found on that line
- END {for (code in count) print code, count[code]} is executed after all lines have been processed:
  - END is a special pattern that matches once, after the last input line has been read
  - {...} contains the action to perform at that point
  - for (code in count) loops over every key in the count array
  - print code, count[code] prints each status code and its count
- | sort -n pipes the output to the sort command, which sorts it numerically. This step is needed because AWK's for (key in array) loop visits keys in no guaranteed order.
When AWK evaluates an expression like count[$6]++, it automatically:
- Creates the array if it doesn't exist
- Creates a new element if the key doesn't exist; its initial value is empty, which AWK treats as 0 in arithmetic
- Increments the value by 1
You should see output similar to this:
200 3562
301 45
302 78
304 112
400 23
403 8
404 89
500 15
This summary quickly shows you the distribution of status codes in your log file.
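The automatic element creation described above can be observed directly with a BEGIN-only program, which runs without reading any input. This is a small sketch: the array name count and the keys "404" and "500" are arbitrary. It also shows a common gotcha: merely reading a missing element creates it, so use (key in array) when you only want to test whether a key exists.

```shell
awk 'BEGIN {
    # Testing with "in" does not create the element
    if (!("404" in count)) print "404 not present yet"

    # The increment creates the element (initially empty, i.e. 0) and adds 1
    count["404"]++
    print "404 ->", count["404"]

    # Gotcha: merely reading a missing element also creates it
    x = count["500"]
    if ("500" in count) print "500 now present with empty value: [" count["500"] "]"
}'
```

This should print "404 not present yet", then "404 -> 1", then "500 now present with empty value: []".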
Now, let's find the top 5 most frequently accessed resources:
awk '{count[$5]++} END {for (resource in count) print count[resource], resource}' server_logs.txt | sort -rn | head -n 5
This command follows a similar pattern with a few changes:
- {count[$5]++} counts occurrences of the 5th field (the requested resource)
- END {for (resource in count) print count[resource], resource} runs after all lines have been processed and prints the count first, followed by the resource. Putting the count first lets sort order the lines by count, because sort -n compares the number at the start of each line.
- | sort -rn sorts numerically in reverse order (highest counts first)
- | head -n 5 limits the output to the first 5 lines (the top 5 results)
Output:
1823 /index.html
956 /about.html
743 /products.html
512 /services.html
298 /contact.html
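When two resources have the same count, sort -rn orders them by its fallback whole-line comparison rather than by a rule you chose. The sketch below swaps in a slightly different sort invocation, sort -k1,1nr -k2,2, which sorts by count (highest first) and then alphabetically by resource name. It uses a small hypothetical file, sample_logs.txt, whose field layout (date, time, IP, method, resource, status) is an assumption inferred from the field numbers used in this section.

```shell
# Hypothetical sample file; assumed fields: $1 date, $2 time, $3 IP, $4 method, $5 resource, $6 status
cat > sample_logs.txt <<'EOF'
2024-01-15 10:00:01 192.168.1.10 GET /index.html 200
2024-01-15 10:00:02 192.168.1.11 GET /index.html 200
2024-01-15 10:00:03 192.168.1.12 GET /index.html 304
2024-01-15 10:00:04 192.168.1.10 GET /about.html 200
2024-01-15 10:00:05 192.168.1.11 GET /about.html 200
2024-01-15 10:00:06 192.168.1.13 GET /services.html 200
2024-01-15 10:00:07 192.168.1.12 GET /contact.html 200
EOF

# Count per resource, sort by count (numeric, descending), break ties by resource name
awk '{count[$5]++} END {for (resource in count) print count[resource], resource}' sample_logs.txt |
    sort -k1,1nr -k2,2 |
    head -n 3
```

Here /contact.html and /services.html are tied at 1, and the secondary key makes /contact.html the third line of output after "3 /index.html" and "2 /about.html".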
These AWK commands demonstrate the power of using arrays for counting and summarizing. You can adapt this pattern to count any field or combination of fields in your data.
For example, to count the number of requests per IP address:
awk '{count[$3]++} END {for (ip in count) print ip, count[ip]}' server_logs.txt
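Per-IP counts can get long. One way to focus on heavy clients is to pass a threshold into AWK with the -v option and print only the addresses at or above it. In this sketch, min is a hypothetical variable name, the value 2 is arbitrary, and sample_logs.txt is a made-up file with the IP address in the 3rd field, as in the command above.

```shell
# Hypothetical sample file; the IP address is the 3rd field
cat > sample_logs.txt <<'EOF'
2024-01-15 10:00:01 192.168.1.10 GET /index.html 200
2024-01-15 10:00:02 192.168.1.11 GET /about.html 200
2024-01-15 10:00:03 192.168.1.10 GET /missing.html 404
2024-01-15 10:00:04 192.168.1.12 POST /login 302
2024-01-15 10:00:05 192.168.1.11 GET /index.html 200
2024-01-15 10:00:06 192.168.1.10 GET /admin 403
EOF

# Print only IPs with at least "min" requests, busiest first (sorted on the count column)
awk -v min=2 '{count[$3]++}
END {for (ip in count) if (count[ip] >= min) print ip, count[ip]}' sample_logs.txt | sort -k2,2nr
```

With this sample, 192.168.1.10 (3 requests) and 192.168.1.11 (2 requests) pass the threshold, while 192.168.1.12 is filtered out.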
To count requests by both method and status:
awk '{key=$4"-"$6; count[key]++} END {for (k in count) print k, count[k]}' server_logs.txt
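Joining fields with "-" works as long as neither field contains a hyphen itself. AWK also supports comma-separated subscripts such as count[$4, $6], which joins the values with the built-in separator SUBSEP (by default the control character \034, which rarely appears in log data). The sketch below uses that form and splits the key back apart with split(); sample_logs.txt is a made-up file with the method in $4 and the status in $6.

```shell
# Hypothetical sample file; method in $4, status in $6
cat > sample_logs.txt <<'EOF'
2024-01-15 10:00:01 192.168.1.10 GET /index.html 200
2024-01-15 10:00:02 192.168.1.11 GET /about.html 200
2024-01-15 10:00:03 192.168.1.10 GET /missing.html 404
2024-01-15 10:00:04 192.168.1.12 POST /login 302
EOF

awk '{count[$4, $6]++}      # the two values are joined with SUBSEP
END {
    for (key in count) {
        split(key, parts, SUBSEP)       # parts[1] = method, parts[2] = status
        print parts[1], parts[2], count[key]
    }
}' sample_logs.txt | sort
```

This prints "GET 200 2", "GET 404 1" and "POST 302 1", one per line.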
These summaries can help you understand traffic patterns and identify popular (or problematic) resources on your server.
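Because only the field number changes between these counting commands, the pattern can be wrapped in a small shell function. count_field is a hypothetical name, not a standard command: it passes the field number into AWK with -v, and AWK's $f syntax then reads the field whose number is stored in f. The sample file is made up, with the status code in $6.

```shell
# count_field FIELD FILE: hypothetical helper that prints "count value" lines, highest count first
count_field() {
    awk -v f="$1" '{count[$f]++} END {for (k in count) print count[k], k}' "$2" | sort -rn
}

# Hypothetical sample file; status code in the 6th field
cat > sample_logs.txt <<'EOF'
2024-01-15 10:00:01 192.168.1.10 GET /index.html 200
2024-01-15 10:00:02 192.168.1.11 GET /about.html 200
2024-01-15 10:00:03 192.168.1.10 GET /missing.html 404
2024-01-15 10:00:04 192.168.1.12 GET /index.html 200
2024-01-15 10:00:05 192.168.1.11 GET /old-page.html 301
2024-01-15 10:00:06 192.168.1.13 GET /missing.html 404
EOF

count_field 6 sample_logs.txt     # count by status code
```

Calling count_field 3 sample_logs.txt or count_field 5 sample_logs.txt would summarize by IP address or by resource instead.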