Creating a Simple Report
For our final task, let's create a simple HTML report summarizing some key information from our log file. We'll use an AWK script stored in a separate file for this more complex operation.
This step combines several AWK ideas from earlier sections:
- counters such as total++
- arrays such as ip_count[$3]++
- an END block that prints the final summary
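As a quick warm-up, the three ideas can be seen together in a few lines. This is a minimal sketch: the three sample log lines are made up, and their layout (date, time, IP, method, resource, status) is only assumed from the field numbers used in this lab.

```shell
# Made-up sample lines; the field layout is an assumption for illustration.
sample='2023-01-01 00:00:01 192.168.1.10 GET /index.html 200
2023-01-01 00:00:02 192.168.1.11 GET /missing.html 404
2023-01-01 00:00:03 192.168.1.10 GET /about.html 200'

summary=$(printf '%s\n' "$sample" | awk '
{
    total++              # counter: one increment per input line
    ip_count[$3]++       # array: one counter per distinct IP address
}
END {
    # END runs once, after the last line has been read
    print "total=" total
    print "192.168.1.10=" ip_count["192.168.1.10"]
}')
printf '%s\n' "$summary"
```

The full report script below follows the same shape, only with more counters and HTML around the output.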
If the script feels long at first glance, focus on one block at a time. You do not need to memorize the whole file before running it.
First, create a file named log_report.awk with the following content:
Tip: Copy the whole block below, from the cat line through the closing EOF, and paste it into your terminal to create the file.
cat << 'EOF' > log_report.awk
BEGIN {
    print "<html><body>"
    print "<h1>Server Log Summary</h1>"
    total = 0
    errors = 0
}
{
    total++
    if ($6 >= 400) errors++
    ip_count[$3]++
    resource_count[$5]++
}
END {
    print "<p>Total requests: " total "</p>"
    print "<p>Error rate: " (errors/total) * 100 "%</p>"
    print "<h2>Top 5 IP Addresses</h2>"
    print "<ul>"
    for (ip in ip_count) {
        top_ips[ip] = ip_count[ip]
    }
    n = asort(top_ips, sorted_ips, "@val_num_desc")
    for (i = 1; i <= 5 && i <= n; i++) {
        for (ip in ip_count) {
            if (ip_count[ip] == sorted_ips[i]) {
                print "<li>" ip ": " ip_count[ip] " requests</li>"
                ## Remove the entry so a tied count cannot match this IP again
                delete ip_count[ip]
                break
            }
        }
    }
    print "</ul>"
    print "<h2>Top 5 Requested Resources</h2>"
    print "<ul>"
    for (resource in resource_count) {
        top_resources[resource] = resource_count[resource]
    }
    n = asort(top_resources, sorted_resources, "@val_num_desc")
    for (i = 1; i <= 5 && i <= n; i++) {
        for (resource in resource_count) {
            if (resource_count[resource] == sorted_resources[i]) {
                print "<li>" resource ": " resource_count[resource] " requests</li>"
                ## Remove the entry so a tied count cannot match this resource again
                delete resource_count[resource]
                break
            }
        }
    }
    print "</ul>"
    print "</body></html>"
}
EOF
Let's understand this AWK script section by section:
- BEGIN Block: Executes before processing any input lines
BEGIN {
    print "<html><body>"                 ## Start HTML structure
    print "<h1>Server Log Summary</h1>"
    total = 0                            ## Initialize counter for total requests
    errors = 0                           ## Initialize counter for error requests
}
- Main Processing Block: Executes for each line of the input file
{
    total++                    ## Increment total request counter
    if ($6 >= 400) errors++    ## Count error responses (status codes >= 400)
    ip_count[$3]++             ## Count requests by IP address (field 3)
    resource_count[$5]++       ## Count requests by resource (field 5)
}
- END Block: Executes after processing all input lines
END {
    ## Print summary statistics
    print "<p>Total requests: " total "</p>"
    print "<p>Error rate: " (errors/total) * 100 "%</p>"
    ## Process and print top 5 IP addresses
    ## ...
    ## Process and print top 5 requested resources
    ## ...
    print "</body></html>"    ## End HTML structure
}
Before moving on, notice the overall flow:
- BEGIN prints the opening HTML tags and initializes counters.
- The middle block processes each log line and updates totals.
- END prints the final report after every line has been analyzed.
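The error-rate line in the END block converts the number with AWK's default precision (a value such as 33.3333) and divides by total, which is zero when the input file is empty. A hedged variation, not part of the lab script, guards the division and formats the result with printf. The sample input lines are made up, and only their sixth field matters.

```shell
# Made-up input: only the sixth field (status code) matters here.
rate=$(printf '%s\n' \
  'd t 10.0.0.1 GET /a 200' \
  'd t 10.0.0.2 GET /b 404' \
  'd t 10.0.0.1 GET /c 500' \
  'd t 10.0.0.3 GET /d 200' |
awk '
{ total++; if ($6 >= 400) errors++ }
END {
    # Guard against an empty file before dividing
    pct = (total > 0) ? (errors / total) * 100 : 0
    printf "<p>Error rate: %.1f%%</p>\n", pct
}')
printf '%s\n' "$rate"

# With empty input the guard returns 0 instead of dividing by zero
empty=$(awk '{ total++ } END { pct = (total > 0) ? errors / total * 100 : 0; printf "%.1f%%\n", pct }' /dev/null)
printf '%s\n' "$empty"
```

In a printf format string, %% prints a literal percent sign and %.1f rounds to one decimal place.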
Let's examine the sorting logic for the top IPs (the resources section works the same way):
## Copy the counts to a new array for sorting
for (ip in ip_count) {
    top_ips[ip] = ip_count[ip]
}
## Sort the array by value in descending order
n = asort(top_ips, sorted_ips, "@val_num_desc")
## Print the top 5 entries
for (i = 1; i <= 5 && i <= n; i++) {
    ## Find the original IP that matches this count
    for (ip in ip_count) {
        if (ip_count[ip] == sorted_ips[i]) {
            print "<li>" ip ": " ip_count[ip] " requests</li>"
            ## Remove the entry so a tied count cannot match this IP again
            delete ip_count[ip]
            break
        }
    }
}
In this script:
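- The asort() function, a GNU AWK (gawk) extension, copies the values of top_ips into sorted_ips in sorted order and returns the number of elements
- "@val_num_desc" is a special argument that tells it to sort numerically by value in descending order
- Because asort() replaces the original keys with the indexes 1 through n, the nested loops are needed to find which key produced each count before printing the top 5 entries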
You can think of the nested loops like this:
- the first loop decides which counts belong in the top 5
- the second loop finds which IP address or resource produced each count
That lookup pattern is more advanced than the previous steps, so it is normal if this is the first part of the lab that feels like real scripting instead of a one-line command.
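A common alternative that avoids the reverse lookup entirely is to let AWK print count and key pairs and hand the ranking to the sort and head commands. This is a sketch of a different technique, not what log_report.awk does, and the sample lines are made up. Because each output line carries its own key, IPs that share the same count cannot be confused with one another, and the approach does not depend on gawk-only functions.

```shell
# Made-up sample lines; field 3 is assumed to be the client IP.
# sort ranks by count (descending), then by IP as a tie-breaker; head keeps the top 2.
top=$(printf '%s\n' \
  'd t 10.0.0.1 GET /a 200' \
  'd t 10.0.0.2 GET /b 200' \
  'd t 10.0.0.1 GET /c 200' \
  'd t 10.0.0.3 GET /d 200' \
  'd t 10.0.0.1 GET /e 200' \
  'd t 10.0.0.2 GET /f 200' |
awk '
{ ip_count[$3]++ }
END {
    # Print "count ip" so the external sort can rank the lines
    for (ip in ip_count) print ip_count[ip], ip
}' |
sort -k1,1nr -k2,2 |
head -n 2)
printf '%s\n' "$top"
```

The trade-off is that the ranking happens outside AWK, so wrapping each line in HTML tags would need a second AWK or sed step after head.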
Now, let's run our AWK script to generate the report:
awk -f log_report.awk server_logs.txt > log_report.html
The -f option tells AWK to read the script from a file instead of from the command line. Each part of the command has a role:
- -f log_report.awk - Reads the AWK script from the file log_report.awk
- server_logs.txt - Processes this file using the script
- > log_report.html - Redirects the output to the file log_report.html
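The same three-part command shape can be tried on a throwaway script before running the full report. In this sketch the file names count.awk, sample.txt, and count.html are hypothetical and live in a temporary directory, so nothing in the lab directory is touched.

```shell
# Work in a temporary directory; all file names here are hypothetical.
workdir=$(mktemp -d)

# A two-line AWK script stored in its own file
cat << 'EOF' > "$workdir/count.awk"
{ total++ }
END { print "<p>Total requests: " total "</p>" }
EOF

# Made-up input with three lines
printf '%s\n' 'line one' 'line two' 'line three' > "$workdir/sample.txt"

# -f reads the script, the next argument is the input, > saves the output
awk -f "$workdir/count.awk" "$workdir/sample.txt" > "$workdir/count.html"

result=$(cat "$workdir/count.html")
printf '%s\n' "$result"
rm -r "$workdir"
```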
You can view the contents of the report using the cat command:
cat log_report.html
If the HTML output feels hard to scan in the terminal, preview only the first 15 lines:
head -n 15 log_report.html
This report provides a summary of total requests, error rate, top 5 IP addresses, and top 5 requested resources. In a real-world scenario, you could open this HTML file in a web browser for a formatted view.
The approach we've used in this script demonstrates how AWK can be used for more complex data analysis tasks. You can extend this script to include additional statistics or different visualizations based on your specific needs.
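As one possible extension, the sketch below adds a per-status-code breakdown using the same counting pattern. It is an illustration rather than part of log_report.awk: the status_count array name and the sample lines are made up, and field 6 is assumed to hold the status code as elsewhere in this lab.

```shell
# Made-up sample lines; field 6 is assumed to be the HTTP status code.
status_html=$(printf '%s\n' \
  'd t 10.0.0.1 GET /a 200' \
  'd t 10.0.0.2 GET /b 404' \
  'd t 10.0.0.1 GET /c 200' \
  'd t 10.0.0.3 GET /d 500' |
awk '
{ status_count[$6]++ }     # hypothetical array: one counter per status code
END {
    print "<h2>Requests by Status Code</h2>"
    print "<ul>"
    # for-in order is unspecified, so try each code in numeric order instead
    for (code = 100; code <= 599; code++)
        if (code in status_count)
            print "<li>" code ": " status_count[code] " requests</li>"
    print "</ul>"
}')
printf '%s\n' "$status_html"
```

Looping over the known range of status codes keeps the output order stable without any sorting function, which works here only because the set of possible keys is small and numeric.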