Nginx Log Analysis and Optimization

Introduction

In this project, you will learn how to analyze the access log of an Nginx server. You will extract useful information from the log, such as the IP addresses with the most accesses, the IP addresses that accessed the server a minimum of 10 times, the most accessed requests, and the request addresses that returned a 404 status.

🎯 Tasks

In this project, you will learn:

  • How to retrieve the 5 IP addresses with the highest number of accesses from a specific date
  • How to find all IP addresses that accessed the server a minimum of 10 times within a given date range
  • How to retrieve the ten most accessed requests from the log file, excluding static files and resources
  • How to write all request addresses that returned a 404 status to an output file

๐Ÿ† Achievements

After completing this project, you will be able to:

  • Analyze and extract meaningful information from Nginx access logs
  • Automate log analysis tasks using shell scripting
  • Understand and apply common log analysis techniques, such as filtering, sorting, and counting
  • Manage and organize the analysis results in a structured way

Retrieve the 5 IP Addresses With the Highest Number of Accesses From April 10, 2015

In this step, you will retrieve the 5 IP addresses with the highest number of accesses on April 10, 2015 from the access.log file:

  1. Open the terminal and navigate to the /home/labex/project directory.
  2. Use the following command to retrieve the 5 IP addresses with the highest number of accesses from April 10, 2015:
grep '10/Apr/2015' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -5 | awk '{print $2}' > output1

This command does the following:

  • grep '10/Apr/2015' access.log: Filters the log file to only include lines from April 10, 2015.
  • awk '{print $1}': Extracts the IP address (the first field) from each log line.
  • sort: Sorts the IP addresses so that identical addresses end up on adjacent lines, which uniq requires in order to count them.
  • uniq -c: Counts the number of occurrences of each IP address.
  • sort -rn: Sorts the counted lines numerically by the count, in descending order.
  • head -5: Selects the top 5 IP addresses.
  • awk '{print $2}': Extracts the IP address (the second field) from the sorted and counted output.
  • > output1: Redirects the output to the output1 file.
  3. Verify the contents of the output1 file to ensure that it contains the 5 IP addresses with the highest number of accesses from April 10, 2015, with one IP address per line and no empty lines.
216.244.66.249
216.244.66.231
140.205.225.185
140.205.201.39
140.205.201.32
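
As a variant of the pipeline above, the date filter and the counting can be combined into a single awk pass. The sketch below assumes the log uses the combined log format, where the timestamp is the fourth whitespace-separated field. The sample.log file and its contents are made up for illustration and are not taken from access.log.

```shell
# Hypothetical sample data in combined log format (not from access.log).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
10.0.0.1 - - [10/Apr/2015:00:00:01 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [10/Apr/2015:00:00:02 +0800] "GET /a HTTP/1.1" 200 100 "-" "curl"
10.0.0.1 - - [10/Apr/2015:00:00:03 +0800] "GET /b HTTP/1.1" 404 169 "-" "curl"
10.0.0.3 - - [11/Apr/2015:00:00:04 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.1 - - [10/Apr/2015:00:00:05 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [10/Apr/2015:00:00:06 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.4 - - [10/Apr/2015:00:00:07 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
EOF

# Count hits per IP, matching the date only against the timestamp field ($4)
# instead of anywhere in the line.
top_ips=$(awk 'index($4, "[10/Apr/2015:") == 1 { hits[$1]++ }
               END { for (ip in hits) print hits[ip], ip }' "$tmpdir/sample.log" |
          sort -rn | head -2 | awk '{print $2}')

printf '%s\n' "$top_ips"
rm -rf "$tmpdir"
```

On this sample, 10.0.0.1 has 3 hits and 10.0.0.2 has 2 hits on April 10, so those two addresses are printed in that order. Matching on the timestamp field avoids counting lines where the date string happens to appear elsewhere, for example in a request path or referrer.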

Find All IP Addresses That Accessed the Server a Minimum of 10 Times on April 11, 2015

In this step, you will find all IP addresses that accessed the server a minimum of 10 times on April 11, 2015:

  1. Use the following command to find all IP addresses that accessed the server a minimum of 10 times on April 11, 2015:
grep '11/Apr/2015' access.log | awk '{print $1}' | sort | uniq -c | awk '$1 >= 10 {print $2}' > output2

This command does the following:

  • grep '11/Apr/2015' access.log: Filters the log file to only include lines from April 11, 2015.
  • awk '{print $1}': Extracts the IP address (the first field) from each log line.
  • sort: Sorts the IP addresses.
  • uniq -c: Counts the number of occurrences of each IP address.
  • awk '$1 >= 10 {print $2}': Filters the IP addresses that have a count of 10 or more, and prints the IP address (the second field).
  • > output2: Redirects the output to the output2 file.
  2. Verify the contents of the output2 file to ensure that it contains all IP addresses that accessed the server a minimum of 10 times on April 11, 2015, with one IP address per line and no empty lines.
108.245.182.93
123.127.3.30
140.205.201.39
216.244.66.231
216.244.66.249
218.75.230.17
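
To reuse the same filter for other dates or thresholds, the pipeline above can be wrapped in a small shell function. This is a minimal sketch: the function name ips_with_min_hits, its argument order, and the sample data are hypothetical and not part of the lab.

```shell
# ips_with_min_hits DATE MIN FILE
# Hypothetical helper: prints every IP with at least MIN requests on DATE.
ips_with_min_hits() {
    grep -F -- "$1" "$3" | awk '{print $1}' | sort | uniq -c |
        awk -v min="$2" '$1 >= min {print $2}'
}

# Hypothetical sample data (not from access.log).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
10.0.0.1 - - [11/Apr/2015:00:00:01 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [11/Apr/2015:00:00:02 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.1 - - [11/Apr/2015:00:00:03 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.3 - - [11/Apr/2015:00:00:04 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.1 - - [11/Apr/2015:00:00:05 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [11/Apr/2015:00:00:06 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [12/Apr/2015:00:00:07 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
EOF

# Threshold of 2 instead of 10 so the tiny sample produces a result.
frequent_ips=$(ips_with_min_hits '11/Apr/2015' 2 "$tmpdir/sample.log")
printf '%s\n' "$frequent_ips"
rm -rf "$tmpdir"
```

Passing the threshold with awk -v keeps the awk program itself in single quotes, so no shell variable is expanded inside it. On this sample, 10.0.0.1 (3 hits) and 10.0.0.2 (2 hits on April 11) meet the threshold, while 10.0.0.3 (1 hit) does not.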

Retrieve the Ten Most Accessed Requests From the Log File

In this step, you will retrieve the ten most accessed requests from the access.log file, excluding static files, images, and similar resources:

  1. Use the following command to retrieve the ten most accessed requests from the log file:
grep -vE '(/robots.txt|\.js|\.css|\.png)' access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -10 | awk '{print $2}' > output3

This command does the following:

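  • grep -vE '(/robots.txt|\.js|\.css|\.png)' access.log: Excludes every line that contains /robots.txt, .js, .css, or .png anywhere in the line. The patterns are not anchored, so .js also matches names such as .jsp or .json, and the unescaped dot in /robots.txt matches any character. Other static files, such as /favicon.ico, are not excluded by this pattern.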
  • awk '{print $7}': Extracts the request address (the seventh field) from each log line.
  • sort: Sorts the request addresses.
  • uniq -c: Counts the number of occurrences of each request address.
  • sort -rn: Sorts the request addresses by the count in descending order.
  • head -10: Selects the top 10 request addresses.
  • awk '{print $2}': Extracts the request address (the second field) from the sorted and counted output.
  • > output3: Redirects the output to the output3 file.
  2. Verify the contents of the output3 file to ensure that it contains the ten most accessed requests from the log file, with one request address per line and no empty lines.
/
/j_acegi_security_check
/favicon.ico
400
/xref/linux-3.18.6/
/pmd/index.php
/pma/index.php
/phpMyAdmin/index.php
/phpmyadmin/index.php
check.best-proxies.ru:80
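
The 400 entry in the output above is not a request address. It most likely comes from malformed requests that Nginx logs with "-" in place of the request line, which shifts the status code into the seventh field. The sketch below shows one possible stricter filter, assuming the combined log format: it keeps only lines whose sixth field looks like a quoted HTTP method and excludes static files by matching the extension at the end of the path. The sample data is made up, and this filter is an illustration rather than the lab's expected solution, so its results on access.log would differ from the output above.

```shell
# Hypothetical sample data (not from access.log), including two malformed requests.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
10.0.0.1 - - [10/Apr/2015:00:00:01 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.1 - - [10/Apr/2015:00:00:02 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [10/Apr/2015:00:00:03 +0800] "GET / HTTP/1.1" 200 612 "-" "curl"
10.0.0.2 - - [10/Apr/2015:00:00:04 +0800] "GET /index.php HTTP/1.1" 200 90 "-" "curl"
10.0.0.3 - - [10/Apr/2015:00:00:05 +0800] "POST /index.php HTTP/1.1" 200 90 "-" "curl"
10.0.0.3 - - [10/Apr/2015:00:00:06 +0800] "GET /app.js HTTP/1.1" 200 50 "-" "curl"
10.0.0.3 - - [10/Apr/2015:00:00:07 +0800] "GET /app.js?v=2 HTTP/1.1" 200 50 "-" "curl"
10.0.0.4 - - [10/Apr/2015:00:00:08 +0800] "GET /data.json HTTP/1.1" 200 20 "-" "curl"
10.0.0.5 - - [10/Apr/2015:00:00:09 +0800] "-" 400 0 "-" "-"
10.0.0.5 - - [10/Apr/2015:00:00:10 +0800] "-" 400 0 "-" "-"
10.0.0.6 - - [10/Apr/2015:00:00:11 +0800] "GET /robots.txt HTTP/1.1" 200 30 "-" "curl"
EOF

# Keep lines with a quoted method in $6, drop /robots.txt and paths ending in
# .js, .css, or .png (optionally followed by a query string).
top_requests=$(awk '$6 ~ /^"[A-Z]+$/ &&
                    $7 != "/robots.txt" &&
                    $7 !~ /\.(js|css|png)(\?.*)?$/ { print $7 }' "$tmpdir/sample.log" |
               sort | uniq -c | sort -rn | head -3 | awk '{print $2}')

printf '%s\n' "$top_requests"
rm -rf "$tmpdir"
```

On this sample, the malformed lines and the .js requests are dropped, while /data.json is kept because its extension is not .js. The remaining counts are 3 for /, 2 for /index.php, and 1 for /data.json.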

Write All Request Addresses With a 404 Status From the Log File

In this step, you will write all request addresses with a 404 status from the access.log file to the output4 file:

  1. Use the following command to write all request addresses with a 404 status from the log file to the output4 file:
grep ' 404 ' access.log | awk '{print $7}' | sort | uniq > output4

This command does the following:

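  • grep ' 404 ' access.log: Keeps only the lines that contain 404 surrounded by spaces, which is how the status code appears in each log line. A line whose response size is exactly 404 bytes would also match.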
  • awk '{print $7}': Extracts the request address (the seventh field) from each log line.
  • sort: Sorts the request addresses.
  • uniq: Removes any duplicate request addresses.
  • > output4: Redirects the output to the output4 file.
  2. Verify the contents of the output4 file to ensure that it contains all request addresses with a 404 status from the log file, with one request address per line and no duplicates.
/about/
/cgi?2
/cgi-bin/cgiSrv.cgi
/clusters.jsf
/dfshealth.jsp
/dnieyraqcvtu
/favicon.ico
/ganglia/index.php
/hadoop/dfshealth.jsp
/history/linux-3.18.6/arch/ia64/include/asm/processor.h
/history/linux-3.18.6/arch/m68k/amiga/
/history/linux-3.18.6/arch/m68k/kernel/
/history/linux-3.18.6/arch/s390/include/asm/lowcore.h
/history/linux-3.18.6/arch/s390/kernel/entry64.S
/history/linux-3.18.6/arch/tile/kernel/intvec_64.S
/history/linux-3.18.6/arch/unicore32/include/asm/thread_info.h
/history/linux-3.18.6/arch/unicore32/include/asm/unistd.h
/history/linux-3.18.6/arch/x86/include/asm/processor.h
/history/linux-3.18.6/arch/x86/include/asm/unistd.h
/history/linux-3.18.6/arch/x86/kernel/entry_64.S
    ...
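
The grep in this step matches ' 404 ' anywhere in the line. A stricter alternative is to compare the status field directly. The sketch below assumes the combined log format, where the status code is the ninth whitespace-separated field for well-formed requests. The sample data is made up for illustration.

```shell
# Hypothetical sample data (not from access.log). The last line has status 200
# but a response size of 404 bytes.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
10.0.0.1 - - [10/Apr/2015:00:00:01 +0800] "GET /missing HTTP/1.1" 404 169 "-" "curl"
10.0.0.2 - - [10/Apr/2015:00:00:02 +0800] "GET /missing HTTP/1.1" 404 169 "-" "curl"
10.0.0.3 - - [10/Apr/2015:00:00:03 +0800] "GET /gone HTTP/1.1" 404 169 "-" "curl"
10.0.0.4 - - [10/Apr/2015:00:00:04 +0800] "GET /ok HTTP/1.1" 200 404 "-" "curl"
EOF

# Pattern match anywhere in the line, as in the step above.
loose=$(grep ' 404 ' "$tmpdir/sample.log" | awk '{print $7}' | sort | uniq)

# Compare the status field itself; sort -u sorts and removes duplicates in one step.
strict=$(awk '$9 == 404 {print $7}' "$tmpdir/sample.log" | sort -u)

printf 'grep match:\n%s\n' "$loose"
printf 'status field match:\n%s\n' "$strict"
rm -rf "$tmpdir"
```

On this sample, the grep version also reports /ok because of its 404-byte response size, while the field comparison reports only /gone and /missing.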

Summary

Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.
