DAY 03: The Log Investigator


Introduction

It's Day 3 at LabEx Corporation, and disaster has struck Project Phoenix! You arrive at the office to find Sarah Chen and the development team in crisis mode. The application you helped organize yesterday has encountered critical errors during its first major testing phase.

Emergency alerts are flooding the monitoring systems, users are reporting application failures, and the deployment pipeline has ground to a halt. Sarah turns to you with a look of desperation - the senior DevOps engineer is out sick, and the project deadline is fast approaching.

"We need our best investigator on this," Sarah says, handing you the incident report. "Your systematic approach to organizing our files was exactly what we needed. Now we need that same methodical thinking to solve this mystery."

Your mission is to dive deep into the Project Phoenix server, analyze logs and configuration files, and uncover the root cause of these failures. You'll use advanced Linux command-line tools to piece together the clues and restore stability to the application your team has worked so hard to build. The future of Project Phoenix—and possibly your career at LabEx—depends on your detective skills!

Reviewing Application Log File Contents

Your first step as an investigator is to check Project Phoenix's application log file. The application writes its logs to ~/project/logs/app.log. A flood of messages can be overwhelming, so you need to find the critical error messages quickly to understand what's going wrong with the system you helped organize yesterday.

Tasks

  • Filter the ~/project/logs/app.log file to find all lines containing the word ERROR.
  • Save the filtered lines to a new file named ~/project/error_report.txt.

Requirements

  • You must use a command-line tool to search the file.
  • The input file for your search is ~/project/logs/app.log.
  • The output must be saved in a file named ~/project/error_report.txt in the ~/project directory.
  • The output file should only contain the lines with the word ERROR.

Hints

  • The grep command is perfect for searching for patterns in text files.
  • To save the output of a command to a file, you can use the > redirection operator. This will create the file if it doesn't exist or overwrite it if it does.
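Putting those two hints together, the filter-and-redirect step might look like the sketch below. It runs on throwaway files under /tmp so you can experiment safely; the sample log lines are made up for illustration, and the lab itself uses the ~/project paths from the task.

```shell
# Build a tiny sample log to practice on (the real lab file is ~/project/logs/app.log)
mkdir -p /tmp/demo/logs
cat > /tmp/demo/logs/app.log <<'EOF'
[2023-10-26 10:00:01] INFO: Application started.
[2023-10-26 10:00:03] ERROR: Failed to process payment transaction #12345.
EOF

# grep prints only the lines matching the pattern; > writes them to a new file
grep "ERROR" /tmp/demo/logs/app.log > /tmp/demo/error_report.txt
cat /tmp/demo/error_report.txt
```

Swap in the real paths from the task, and the same two-part pattern (grep, then >) produces ~/project/error_report.txt.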

Examples

After successfully filtering the log file, your ~/project/error_report.txt file should contain only the error lines:

$ cat ~/project/error_report.txt
[2023-10-26 10:00:03] ERROR: Failed to process payment transaction #12345.
[2023-10-26 10:00:05] ERROR: NullPointerException at com.innovatech.Billing.process(Billing.java:101).

The file should contain exactly 2 lines, both starting with timestamps and containing the word "ERROR".

✨ Check Solution and Practice

Investigating System Boot Messages

The application errors might be a symptom of a deeper hardware or kernel-level issue. A good place to look for such problems is the kernel ring buffer, which contains messages from the system's boot process and driver operations.

Tasks

  • Examine the system's kernel messages for any lines related to fail or error.
  • Save these findings into a file named ~/project/boot_issues.txt.

Requirements

  • You must use the dmesg command to view kernel messages.
  • Your search for fail or error should be case-insensitive.
  • The results must be saved to a file named ~/project/boot_issues.txt.
  • Note: You may need administrative privileges (sudo) to access kernel messages.

Hints

  • The dmesg command displays kernel messages. You can "pipe" its output to another command for filtering.
  • The pipe operator | sends the output of one command to the input of another.
  • The grep command's -i option makes the search case-insensitive.
  • To search for multiple patterns at once (like fail OR error), you can use grep -E 'pattern1|pattern2'.
  • Note: If you encounter an "Operation not permitted" error, try running the command with sudo to gain the necessary privileges.
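The pipe-plus-grep pattern from the hints can be demonstrated on sample text before you point it at the real kernel buffer. The three printf lines below are invented for illustration; the commented command at the top is the shape the actual task takes.

```shell
# The real task pipes kernel messages through a case-insensitive grep:
#   sudo dmesg | grep -iE 'fail|error' > ~/project/boot_issues.txt
# The same filter, demonstrated on made-up sample lines:
printf '%s\n' \
  "[    0.330755] acpi: FAIL to add MMCONFIG information" \
  "[    1.026520] RAS: Correctable Errors collector initialized." \
  "[    2.000000] usb 1-1: new high-speed USB device" \
  | grep -iE 'fail|error' > /tmp/demo_boot_issues.txt
cat /tmp/demo_boot_issues.txt
```

Note that -i matches FAIL and Errors despite their capitalization, and the third line is dropped because it matches neither pattern.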

Examples

After successfully filtering the kernel messages, your ~/project/boot_issues.txt file should contain relevant system messages:

$ cat ~/project/boot_issues.txt
[    0.330755] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    1.026520] RAS: Correctable Errors collector initialized.
[   28.260800] my-driver: probe of 0000:00:1f.0 failed with error -2

The file should contain kernel messages that include words like "fail" or "error" (case-insensitive), showing potential hardware or driver issues during system boot.

✨ Check Solution and Practice

Examining the Web Server Configuration File

No critical hardware issues found. The problem might be in the web server configuration. Let's examine the Nginx configuration file to see how it's set up. Sometimes, misconfigurations like having too few worker processes can cause performance bottlenecks and lead to application failures under load.

Tasks

  • Search the web server configuration file at ~/project/config/nginx.conf.
  • Find the line containing the worker_processes directive.
  • Append this line to the ~/project/error_report.txt file you created in the first step.

Requirements

  • The input file is ~/project/config/nginx.conf.
  • You must append the result to ~/project/error_report.txt, not overwrite it.

Hints

  • You can use grep again for this task.
  • To append output to a file instead of overwriting it, use the >> operator.
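The difference between > and >> is easiest to see side by side. This sketch uses throwaway files under /tmp with an invented one-line nginx.conf; in the lab, the same grep-and->> pattern targets ~/project/config/nginx.conf and ~/project/error_report.txt.

```shell
mkdir -p /tmp/demo2
echo "first line" > /tmp/demo2/report.txt            # > creates (or overwrites) the file
echo "worker_processes 4;" > /tmp/demo2/nginx.conf   # made-up config for the demo

# >> appends to the existing file instead of replacing its contents
grep "worker_processes" /tmp/demo2/nginx.conf >> /tmp/demo2/report.txt
cat /tmp/demo2/report.txt
```

If you had used > on the last grep, "first line" would have been wiped out; >> preserves it and adds the matched line below.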

Examples

After appending the worker_processes line to your existing error report, the ~/project/error_report.txt file should now contain both the original error lines and the new configuration line:

$ cat ~/project/error_report.txt
[2023-10-26 10:00:03] ERROR: Failed to process payment transaction #12345.
[2023-10-26 10:00:05] ERROR: NullPointerException at com.innovatech.Billing.process(Billing.java:101).
worker_processes 4;

The file should contain 3 lines total: 2 original error lines plus 1 new line with "worker_processes 4;".

✨ Check Solution and Practice

Comparing Staging and Production Configuration Files

A common source of production issues is a discrepancy between the staging and production environments. A feature might work perfectly in staging but fail in production due to a small configuration difference. Let's compare the application's configuration files from both environments to spot any differences.

Tasks

  • Compare the staging configuration file ~/project/config/staging/app.conf with the production configuration file ~/project/config/production/app.conf.
  • Save the differences to a new file named ~/project/config_diff.txt.

Requirements

  • You must use the diff command.
  • The output showing the differences must be saved to ~/project/config_diff.txt.

Hints

  • The diff command is designed specifically for comparing two files line by line.
  • The basic syntax is diff file1 file2, where it shows what changes need to be made to file1 to make it identical to file2.
  • The order of files matters! diff A B and diff B A will show different outputs.
  • You can redirect the output of diff to a file just like you did with grep.
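Here is the hint sequence as a small demo on two throwaway one-line config files (the file contents are invented; the lab compares the full staging and production app.conf files). Note that diff exits with status 1 when the files differ, which is normal and not a failure.

```shell
mkdir -p /tmp/demo3
printf 'timeout.ms=3000\n' > /tmp/demo3/staging.conf
printf 'timeout.ms=5000\n' > /tmp/demo3/production.conf

# '|| true' keeps the script going, since diff exits 1 when the files differ
diff /tmp/demo3/staging.conf /tmp/demo3/production.conf > /tmp/demo3/diff.txt || true
cat /tmp/demo3/diff.txt
# 1c1
# < timeout.ms=3000
# ---
# > timeout.ms=5000
```

Because staging.conf was given first, its content appears after < and production's after >, mirroring the ordering convention the lab expects.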

Examples

After comparing the staging and production configuration files, your ~/project/config_diff.txt file should show the differences between the two environments:

$ cat ~/project/config_diff.txt
1,5c1,5
< ## Staging Configuration
< database.url=jdbc:mysql://staging-db:3306/nexus
< api.key=staging_key_abc123
< feature.flag.new_dashboard=true
< timeout.ms=3000
---
> ## Production Configuration
> database.url=jdbc:mysql://prod-db:3306/nexus
> api.key=prod_key_xyz789
> feature.flag.new_dashboard=false
> timeout.ms=5000

The diff output shows what changes would need to be made to the staging configuration file to make it match the production configuration file. Lines starting with < show content from the staging file, while lines starting with > show content from the production file. This reveals that the production environment uses different database URLs, API keys, feature flags, and timeout values compared to staging.

✨ Check Solution and Practice

Verifying Directory Consistency Between Servers

The configuration difference is a strong lead! It seems the production server might also be missing some critical files that exist on the staging server. This could be due to a failed deployment. Let's simulate this by comparing two directories that represent the file structures from two different servers.

Tasks

  • You have two directories: /home/labex/project/server1_files (representing the staging server) and /home/labex/project/server2_files (representing the production server).
  • Compare these two directories to find out which files are unique to server1_files.
  • Save the complete comparison output to a file named /home/labex/project/missing_files.txt.

Requirements

  • You must use the diff command to compare the two directories.
  • The output must be saved to /home/labex/project/missing_files.txt.

Hints

  • The diff command can also compare directories if you provide directory paths instead of file paths.
  • Using the -r or --recursive option with diff is a good practice for comparing directories, as it will compare all files within them.
  • The output format of diff on directories will explicitly state which files are "Only in" a specific directory.
  • Just like with files, the argument order controls the < and > markers in the output. Helpfully, though, a single run of diff dir1 dir2 reports files unique to either side, using "Only in" lines that name the directory each unique file lives in.
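A quick demonstration of directory comparison, using throwaway directories and empty placeholder files (the names mimic the lab's server1_files/server2_files setup but are created here just for the demo):

```shell
mkdir -p /tmp/demo4/server1 /tmp/demo4/server2
touch /tmp/demo4/server1/asset1.js /tmp/demo4/server1/asset2.js  # staging has both
touch /tmp/demo4/server2/asset1.js                               # production is missing asset2.js

# -r recurses into subdirectories; '|| true' absorbs diff's nonzero exit on differences
diff -r /tmp/demo4/server1 /tmp/demo4/server2 > /tmp/demo4/missing.txt || true
cat /tmp/demo4/missing.txt
# Only in /tmp/demo4/server1: asset2.js
```

Files present in both directories with identical contents (asset1.js here) produce no output at all, so anything diff prints is a real discrepancy worth investigating.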

Examples

After comparing the two server directories, your /home/labex/project/missing_files.txt file should show which files are missing from the production server:

$ cat /home/labex/project/missing_files.txt
Only in /home/labex/project/server1_files: asset2.js

This output indicates that asset2.js exists in the first directory (server1_files, representing the staging server) but is missing from the second directory (server2_files, representing the production server). By comparing staging first and production second, we can easily identify files that are missing from production, which could explain some of the application failures.

✨ Check Solution and Practice

Summary

Exceptional detective work! You have successfully identified the root causes of Project Phoenix's critical failures and provided Sarah Chen and the development team with actionable intelligence to resolve the issues.

Through your systematic investigation, you've mastered essential troubleshooting commands:

  • grep: To filter log files and extract critical error information.
  • dmesg: To investigate system-level hardware and kernel issues.
  • diff: To compare configuration files and identify discrepancies between environments.
  • Command pipelines and redirection: To efficiently process and document your findings.

Your methodical approach to log analysis has saved Project Phoenix from a potentially catastrophic failure. The development team now has clear direction on fixing the configuration mismatches and missing deployment files you discovered.

Sarah Chen was so impressed with your investigation skills that she's recommending you for a security role. Tomorrow, you'll step into the shoes of the Fortress Guardian to secure Project Phoenix's infrastructure and protect it from future threats!