Monitor Processes in Red Hat Enterprise Linux

Introduction

In this lab, you will gain hands-on experience in monitoring and managing Linux processes, a fundamental skill for any system administrator or developer. You will learn to understand process states and lifecycles using ps and top, control background and foreground jobs, and terminate processes effectively with kill, killall, and pkill. Furthermore, you will explore how to monitor system load and CPU usage using uptime and lscpu, and analyze process activity in detail with top. This lab will equip you with the essential tools and knowledge to efficiently manage processes and maintain system health on RHEL.

Understand Process States and Lifecycle with ps and top

In this step, you will learn about Linux process states and their lifecycle. Understanding process states is crucial for monitoring and managing system resources effectively. You will use the ps and top commands to observe processes and their states.

Every process in Linux has a state that describes its current activity. These states are defined by the kernel and indicate whether a process is running, sleeping, stopped, or in other conditions.

Let's start by examining process states using the ps command. The ps command reports a snapshot of the current processes.

First, open your terminal. You can do this by clicking on the terminal icon on the desktop or by pressing Ctrl+Alt+T. Your default working directory is ~/project.

To see all processes running on your system, including those without a controlling terminal, use the ps aux command. The aux options display processes owned by all users (a), processes without a controlling terminal (x), and show a user-oriented format (u).

ps aux

You will see a long list of processes. Pay attention to the STAT column, which shows the state of each process. Common states you might observe include:

  • R: Running or Runnable (on CPU or waiting to run)
  • S: Interruptible Sleep (waiting for an event to complete)
  • D: Uninterruptible Sleep (waiting for I/O, cannot be interrupted)
  • T: Stopped (suspended by a signal)
  • Z: Zombie (process terminated, but parent hasn't reaped its exit status)

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.2 171820 16140 ?        Ss   HH:MM   0:01 /usr/lib/systemd/systemd ...
root           2  0.0  0.0      0     0 ?        S    HH:MM   0:00 [kthreadd]
labex       3448  0.0  0.2 266904  3836 pts/0    R+   HH:MM   0:00 ps aux
...output omitted...
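
To get a quick sense of how these states are distributed across the system, you can tally the STAT column. The following is a minimal sketch, assuming the procps version of ps that RHEL ships; the variable name state_counts is only illustrative.

```shell
# Print only the STAT column (the trailing "=" suppresses the header),
# keep its first character (the main state), and count each state.
state_counts=$(ps -eo stat= | cut -c1 | sort | uniq -c | sort -rn)
echo "$state_counts"
```

On a mostly idle system you will typically see S dominate, often alongside I, which marks idle kernel threads and is not in the list above.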

Next, let's use the ps -ef command. This command provides a full listing (f) of all processes (e), showing more details like the parent process ID (PPID), CPU utilization (C), start time (STIME), and the command (CMD).

ps -ef

This output is often used to see the parent-child relationships between processes, although it doesn't explicitly show a tree structure.

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 HH:MM ?        00:00:01 /usr/lib/systemd/systemd ...
root           2       0  0 HH:MM ?        00:00:00 [kthreadd]
root           3       2  0 HH:MM ?        00:00:00 [rcu_gp]
...output omitted...
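
The PPID column can be followed upward to see how a process was started. As a rough sketch (the variable names pid, line, and chain are illustrative), this loop walks from the current shell toward the top of the process tree:

```shell
pid=$$        # start at the current shell
chain=""
while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
  # A separate -o per column, each ending in "=", suppresses the header line.
  line=$(ps -o pid= -o ppid= -o comm= -p "$pid") || break
  chain="${chain}${line}
"
  # Move up to the parent; a PPID of 0 means there is no parent to follow.
  pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
done
printf '%s' "$chain"
```

Each output line shows a PID, its PPID, and the command name, so the PPID on one line matches the PID on the next.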

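To visualize the process hierarchy, use the --forest option of ps. It displays processes as a tree, making it easier to see which processes spawned others. Without other options, ps lists only the processes attached to your current terminal; add -e to include every process on the system.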

ps --forest

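This command is particularly useful for debugging and understanding how different services and applications are structured on your system. The sample output includes a sleep 10000 background job; your own list will show whatever is currently running in your terminal.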

  PID TTY          TIME CMD
 2768 pts/0    00:00:00 bash
 5947 pts/0    00:00:00  \_ sleep 10000
 6377 pts/0    00:00:00  \_ ps --forest
...output omitted...

Now, let's explore the top command, which provides a dynamic real-time view of a running system. It displays a summary of system information and a list of processes or threads currently being managed by the Linux kernel.

Run the top command:

top

You will see an interactive display. The top section provides system summary information, including uptime, load average, tasks summary, CPU statistics, and memory usage. The lower section lists individual processes, sorted by CPU usage by default.

In the top output, observe the S column for process states, similar to ps. You can also see %CPU (CPU usage percentage) and %MEM (memory usage percentage) for each process.

top - HH:MM:SS up DD min,  X users,  load average: X.XX, X.XX, X.XX
Tasks: XXX total,   X running, XXX sleeping,   X stopped,   X zombie
%Cpu(s):  X.X us,  X.X sy,  X.X ni, XX.X id,  X.X wa,  X.X hi,  X.X si,  X.X st
MiB Mem :   XXXX.X total,   XXXX.X free,    XXX.X used,    XXX.X buff/cache
MiB Swap:   XXXX.X total,   XXXX.X free,      X.X used.   XXXX.X avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
XXXX labex     20   0  XXXXXX   XXXX   XXXX R   X.X   X.X   0:00.0X top
...output omitted...
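
top can also run non-interactively, which is handy when you want to capture a snapshot in a script or a log file. A minimal sketch using batch mode (-b) with a single iteration (-n 1); the variable name summary is illustrative:

```shell
# -b: batch mode (plain text, no interactive screen)
# -n 1: produce one refresh, then exit
summary=$(top -b -n 1 | head -n 5)
echo "$summary"
```

The five lines kept by head are the summary area: uptime and load average, tasks, CPU, memory, and swap.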

While top is running, you can press q to quit and return to your terminal prompt.

Understanding these commands and the information they provide is fundamental for monitoring and troubleshooting processes on a Linux system.

Control Background and Foreground Jobs

In this step, you will learn how to manage jobs in the background and foreground within your shell session. This is a fundamental skill for efficient command-line usage, allowing you to run long-running tasks without tying up your terminal.

A "job" in the context of a shell refers to a command or a pipeline of commands that the shell is managing. You can run jobs in the background, bring them to the foreground, or suspend them.

Let's start by running a simple command in the background. We'll use the sleep command, which simply waits for a specified amount of time.

To run sleep 10000 (which waits for 10000 seconds) in the background, append an ampersand (&) to the command:

sleep 10000 &

When you press Enter, the shell will immediately return to the prompt, and the sleep command will be running in the background. You will see output similar to this, indicating the job number and its Process ID (PID):

[1] 5947

The [1] indicates that this is job number 1 in your current shell session, and 5947 is the PID of the sleep process.

To view a list of all jobs currently managed by your shell, use the jobs command:

jobs

You should see the sleep command listed as a running background job:

[1]+ Running    sleep 10000 &

The + next to [1] indicates that this is the current job (the one that would be acted upon by default if you didn't specify a job number).

Now, let's bring this background job to the foreground. This means the job will take control of your terminal again. Use the fg command followed by the job number (prefixed with %):

fg %1

The sleep 10000 command will now be in the foreground. Your terminal will be occupied by this command, and you won't get a prompt until it finishes or is suspended.

sleep 10000

While a command is running in the foreground, you can suspend it by pressing Ctrl+Z. This sends a SIGTSTP signal to the foreground job, which stops it and returns control of the terminal to the shell.

Press Ctrl+Z now:

^Z

You will see output indicating that the job has been stopped. The shell prompt returns, but the job does not make progress until you resume it:

[1]+  Stopped                 sleep 10000

Now, if you run jobs again, you will see that the sleep command is in a Stopped state:

jobs
[1]+ Stopped                 sleep 10000

To resume a stopped job in the background, use the bg command followed by the job number. The shell sends SIGCONT to the job, which continues from where it stopped rather than starting over.

bg %1

The job will now be running in the background again:

[1]+ sleep 10000 &

Finally, let's clean up the background job. You can terminate a background job using the kill command with its PID, or by bringing it to the foreground and then terminating it (e.g., with Ctrl+C). For now, let's bring it to the foreground and terminate it.

fg %1

Now that sleep 10000 is in the foreground, press Ctrl+C to terminate it. This sends a SIGINT signal to the process.

^C

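The shell returns to the prompt. Because the job ran in the foreground and was ended by SIGINT, bash typically prints no job-status line. A message such as [1]+ Terminated sleep 10000 appears when a background job is ended by SIGTERM, for example with kill.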

If you run jobs again, you should see that there are no more jobs listed:

jobs
(no output)

This demonstrates the basic workflow of managing jobs in the background and foreground, which is essential for multitasking in the terminal.
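
Behind the scenes, job control is built on signals: suspending a job stops it, and bg or fg continues it with SIGCONT. The sketch below reproduces that cycle from a script and checks the process state with ps. It uses SIGSTOP instead of the SIGTSTP that Ctrl+Z sends, an assumption made here so that the example also works without an interactive terminal; the variable names are illustrative.

```shell
sleep 1000 &                 # start a background job
pid=$!                       # $! holds the PID of the most recent background job
sleep 0.5                    # give the job a moment to start

kill -STOP "$pid"            # stop it (Ctrl+Z sends the similar SIGTSTP)
sleep 0.5
stopped_state=$(ps -o stat= -p "$pid")
echo "after SIGSTOP: $stopped_state"     # state code begins with T

kill -CONT "$pid"            # continue it, as bg or fg would
sleep 0.5
resumed_state=$(ps -o stat= -p "$pid")
echo "after SIGCONT: $resumed_state"     # state code begins with S

kill "$pid"                  # clean up with the default SIGTERM
wait "$pid" 2>/dev/null || true
```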

Terminate Processes with kill, killall, and pkill

In this step, you will learn how to terminate processes using the kill, killall, and pkill commands. These commands are essential for managing system resources and stopping misbehaving applications.

Processes in Linux respond to signals. A signal is a software interrupt delivered to a process. Different signals have different meanings, such as terminating a process, suspending it, or making it reload its configuration.

First, let's understand some fundamental process management signals:

  • SIGTERM (15): The default signal sent by kill. It's a "polite" request to terminate. The process can catch this signal, clean up, and then exit.
  • SIGKILL (9): An "unblockable" signal that forces immediate termination. The process cannot ignore or handle this signal. Use it as a last resort.
  • SIGHUP (1): Often used to tell a process to reload its configuration files without restarting.
  • SIGINT (2): Sent by pressing Ctrl+C, typically used to interrupt a foreground process.
  • SIGSTOP (19): Suspends a process. It cannot be blocked or handled.
  • SIGCONT (18): Resumes a stopped process.

You can list all available signals and their numbers using kill -l:

kill -l

You will see a list of signals like this:

 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
...output omitted...
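
kill -l also accepts a signal number and prints the matching name, which helps when you need to decode a number found in a script or log. A short sketch:

```shell
# Translate a few signal numbers into their names.
for number in 1 2 9 15; do
  printf '%2d -> SIG%s\n' "$number" "$(kill -l "$number")"
done
```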

Using kill

The kill command sends a specified signal to a process identified by its Process ID (PID).

Let's create a few background processes to practice terminating them. We'll use sleep commands again.

sleep 300 &
sleep 301 &
sleep 302 &

Now, use jobs -l to see their job numbers and PIDs (plain jobs omits the PIDs):

jobs -l
[1]   1234 Running                 sleep 300 &
[2]-  1235 Running                 sleep 301 &
[3]+  1236 Running                 sleep 302 &

(Note: Your PIDs will be different.)

Let's find the PID of the first sleep process. You can use ps aux | grep sleep and look for the PID associated with sleep 300.

ps aux | grep sleep

You will see output similar to this. Identify the PID for sleep 300. For example, if the PID is 1234:

labex       1234  0.0  0.0   2200   680 pts/0    S    HH:MM   0:00 sleep 300
labex       1235  0.0  0.0   2200   680 pts/0    S    HH:MM   0:00 sleep 301
labex       1236  0.0  0.0   2200   680 pts/0    S    HH:MM   0:00 sleep 302
labex       1237  0.0  0.0   6000  1000 pts/0    S+   HH:MM   0:00 grep sleep

To terminate sleep 300 using the default SIGTERM signal, use kill followed by its PID. Replace 1234 with the actual PID you found:

kill 1234

The shell reports the change the next time it prints a prompt, with a message like [1]   Terminated   sleep 300 (press Enter if you don't see it). Verify it's gone using jobs or ps aux | grep sleep:

jobs
[2]- Running    sleep 301 &
[3]+ Running    sleep 302 &

Now, let's forcefully terminate sleep 301 using SIGKILL. Find its PID (e.g., 1235) and use kill -9 or kill -SIGKILL:

kill -9 1235

You will likely see [2]- Killed sleep 301. Verify again:

jobs
[3]+ Running    sleep 302 &

Using killall

The killall command terminates processes by their name, rather than their PID. It sends a signal to all processes that match the specified command name.

Let's create a few more sleep processes:

sleep 303 &
sleep 304 &
sleep 305 &

Verify they are running:

jobs
[3]   Running                 sleep 302 &
[4]   Running                 sleep 303 &
[5]-  Running                 sleep 304 &
[6]+  Running                 sleep 305 &

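Now, use killall to terminate all sleep processes. By default, killall sends SIGTERM to every process with that exact name that you have permission to signal, so make sure the name does not also match processes you want to keep.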

killall sleep

You will see messages for each terminated sleep process. Verify that all sleep processes are gone:

jobs
(no output)

Using pkill

The pkill command is similar to killall but offers more advanced selection criteria, including pattern matching for command names, user IDs, group IDs, and controlling terminals. It's very powerful for targeting specific sets of processes.

Let's create some new sleep processes for pkill:

sleep 306 &
sleep 307 &
sleep 308 &

Verify they are running:

jobs
[1]   Running                 sleep 306 &
[2]-  Running                 sleep 307 &
[3]+  Running                 sleep 308 &

To terminate all sleep processes owned by the current user (labex), you can use pkill -u labex sleep:

pkill -u labex sleep

This command will terminate all sleep processes that belong to the labex user.

Verify that all sleep processes are gone:

jobs
(no output)

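You can also use pkill with a pattern. The pattern is an extended regular expression matched against the process name, so if you had processes named my_app_v1 and my_app_v2, pkill my_app would terminate both. Add -x to require an exact name match, and run pgrep with the same arguments first to preview which processes would be selected.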

These commands provide flexible ways to manage and terminate processes, from targeting a single process by its PID to terminating multiple processes based on their name or other attributes. Always be cautious when using kill -9 or SIGKILL, as it can lead to data loss if the process doesn't have a chance to clean up.
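
Because name-based matching can hit more processes than intended, it can help to preview the selection with pgrep, which accepts the same matching options as pkill. In this sketch, -P $$ restricts matching to children of the current shell and -x requires an exact name; both are choices made here to keep the example from touching unrelated processes.

```shell
sleep 306 &
sleep 307 &
sleep 0.5                                    # give both jobs a moment to start

# Preview: list the PID and command line of each process that would be signaled.
pgrep -a -P $$ -x sleep
before=$(pgrep -P $$ -x sleep | wc -l)

# Same selection, but send SIGTERM (the default) to each match.
pkill -P $$ -x sleep
wait 2>/dev/null                             # let the shell collect the ended jobs

after=$(pgrep -P $$ -x sleep | wc -l)
echo "matched before: $before, after: $after"
```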

Monitor System Load and CPU Usage with uptime and lscpu

In this step, you will learn how to monitor your system's load average and CPU usage using the uptime and lscpu commands. Understanding these metrics is crucial for assessing system performance and identifying potential bottlenecks.

Understanding Load Average with uptime

The uptime command provides a quick overview of how long your system has been running, how many users are logged in, and most importantly, the system's load average. The load average indicates the average number of processes that are either in a runnable or uninterruptible state over a period of time.

Execute the uptime command:

uptime

You will see output similar to this:

 HH:MM:SS up DD min,  X users,  load average: X.XX, X.XX, X.XX

Let's break down the output:

  • HH:MM:SS: The current time.
  • up DD min: How long the system has been running (uptime).
  • X users: The number of users currently logged in.
  • load average: X.XX, X.XX, X.XX: These three numbers represent the system load average over the last 1, 5, and 15 minutes, respectively.

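On a single-core system, a load average of 1.00 means that, on average, one process was running or waiting at any moment, which roughly corresponds to full utilization. On a multi-core system, the equivalent point is a load average equal to the number of logical CPUs; for example, 4.00 on a 4-CPU system. Because Linux also counts processes in uninterruptible sleep (usually waiting for I/O), a high load average does not always mean the CPUs themselves are busy. If the load average consistently exceeds the number of logical CPUs, processes are likely waiting for CPU time or I/O, and the system may be overloaded.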

Determining CPU Cores with lscpu

To properly interpret the load average, you need to know how many logical CPU cores your system has. The lscpu command provides detailed information about the CPU architecture.

Execute the lscpu command:

lscpu

You will see extensive output. Look for the CPU(s): line, which tells you the total number of logical CPUs available. Also, Core(s) per socket: and Socket(s): can help you understand the physical layout.

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
...output omitted...

In the example above, CPU(s): 4 indicates that this system has 4 logical CPUs: 1 socket × 2 cores per socket × 2 threads per core.

Interpreting Load Average and CPU Cores

Let's combine the information from uptime and lscpu. Suppose your uptime output shows a load average of 2.92, 4.48, 5.20 and lscpu shows CPU(s): 4.

To get the per-CPU load average, you divide each load average number by the total number of logical CPUs:

  • Last 1 minute: 2.92 / 4 = 0.73
  • Last 5 minutes: 4.48 / 4 = 1.12
  • Last 15 minutes: 5.20 / 4 = 1.30

Based on these calculations:

  • In the last 1 minute, the load was about 73% of what the CPUs can handle, so there was spare capacity.
  • In the last 5 minutes, the system was overloaded by about 12% (1.12 - 1.00 = 0.12). This means processes were waiting for CPU time.
  • In the last 15 minutes, the system was overloaded by about 30% (1.30 - 1.00 = 0.30).

This analysis suggests that the system was under significant load over the last 5 and 15 minutes, but the load has decreased in the last minute. This kind of trend analysis is crucial for understanding system health.
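
The same arithmetic can be scripted. The sketch below first reproduces the worked example and then applies it to the live system, assuming /proc/loadavg and nproc are available (both are standard on RHEL); per_cpu_load is a hypothetical helper name. Note that nproc reports the CPUs available to the current process, which can be lower than the CPU(s) value from lscpu in restricted environments.

```shell
# Divide a load average by the number of logical CPUs, to two decimals.
per_cpu_load() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Worked example from the text: 4 logical CPUs.
per_cpu_load 2.92 4    # 0.73
per_cpu_load 4.48 4    # 1.12
per_cpu_load 5.20 4    # 1.30

# Live values: the first three fields of /proc/loadavg are the
# 1-, 5-, and 15-minute load averages.
cpus=$(nproc)
read -r load1 load5 load15 _ < /proc/loadavg
echo "1 min:  $(per_cpu_load "$load1" "$cpus") per CPU"
echo "5 min:  $(per_cpu_load "$load5" "$cpus") per CPU"
echo "15 min: $(per_cpu_load "$load15" "$cpus") per CPU"
```

A result above 1.00 means that demand exceeded the number of CPUs over that interval.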

These two commands, uptime and lscpu, are simple yet powerful tools for quickly assessing the overall health and performance of your Linux system.

Analyze Process Activity with top

In this step, you will delve deeper into using the top command to analyze process activity. While top provides a real-time overview, it also offers powerful interactive features to sort, filter, and manage processes.

Recall from a previous step that top provides a dynamic view of your system. Let's start top again:

top

You will see the familiar top interface. The top section provides system-wide statistics, and the lower section lists processes.

Understanding top Columns

Let's review the default columns in the process list:

  • PID: Process ID.
  • USER: The owner of the process.
  • PR: Priority of the process.
  • NI: Nice value of the process (lower nice value means higher priority).
  • VIRT: Virtual memory used by the process.
  • RES: Resident memory (physical RAM) used by the process.
  • SHR: Shared memory used by the process.
  • S: Process state (R=Running, S=Sleeping, D=Uninterruptible Sleep, T=Stopped, Z=Zombie).
  • %CPU: CPU usage percentage since the last update.
  • %MEM: Memory usage percentage (RES / total physical memory).
  • TIME+: Total CPU time used by the process since it started.
  • COMMAND: The command name that started the process.
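
Several of these columns have counterparts in ps, which is useful when a one-off, scriptable listing is enough. The following minimal sketch lists the five largest processes by resident memory. Keep in mind that ps reports %CPU averaged over the lifetime of the process, whereas top measures it since the last refresh. The variable name top_by_memory is illustrative.

```shell
# pid/user/ni/vsz/rss/stat/pcpu/pmem/time/comm roughly mirror top's
# PID/USER/NI/VIRT/RES/S/%CPU/%MEM/TIME+/COMMAND columns.
# --sort=-rss orders by resident memory, largest first.
top_by_memory=$(ps -eo pid,user,ni,vsz,rss,stat,pcpu,pmem,time,comm --sort=-rss | head -n 6)
echo "$top_by_memory"
```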

Interactive Keystrokes in top

top is highly interactive. You can press various keys to change its display and interact with processes.

  1. Sorting Processes:

    • Press Shift+P (capital P) to sort processes by CPU usage (%CPU), which is often the default.
    • Press Shift+M (capital M) to sort processes by memory usage (%MEM).
    • Press Shift+T (capital T) to sort processes by TIME+.

    Try pressing Shift+M now to sort by memory usage. Observe how the process list reorders. Then press Shift+P to return to sorting by CPU.

  2. Filtering by User:

    • Press u (lowercase u). top will prompt you for a username. Type labex and press Enter.
    • Now, top will only display processes owned by the labex user. This is very useful for focusing on your own processes.
    • To clear the filter and show all users again, press u and then press Enter without typing a username.

  3. Changing Update Interval:

    • By default, top updates every 3 seconds. You can change this interval.
    • Press s (lowercase s). top will prompt you for a delay time. Enter 1 (for 1 second) and press Enter.
    • Observe how the display updates more frequently.
    • You can change it back to 3 seconds by pressing s again and entering 3.

  4. Killing a Process:

    • You can terminate a process directly from top.
    • First, create a sleep process to practice on. One option is to press Ctrl+Z to suspend top, run sleep 600 & at the shell prompt, and then run fg to bring top back to the foreground.
    • Alternatively, open a new terminal tab (e.g., Ctrl+Shift+T in many terminals) and run sleep 600 & there.
    • Once you have a sleep process running, go back to your top terminal.
    • Press k (lowercase k). top will prompt you for the PID of the process to kill.
    • Find the PID of your sleep 600 process in the top list. Enter that PID and press Enter.
    • top will then ask for the signal to send. The default is 15 (SIGTERM). Press Enter to send SIGTERM.
    • The sleep process should disappear from the list. If it doesn't, you can try k again and send signal 9 (SIGKILL).

  5. Renicing a Process:

    • Renicing changes the priority of a process. A lower nice value means higher priority.
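    • Press r (lowercase r). top will prompt for a PID and then a nice value (e.g., 10 for lower priority, or -10 for higher priority). By default, only root can lower a nice value; a regular user can only raise the nice value of their own processes.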
    • This is an advanced feature for managing system resources. For this lab, simply press r, then Enter twice to cancel the operation without changing anything.

  6. Quitting top:

    • When you are finished, press q (lowercase q) to quit top and return to your terminal prompt.

top is an indispensable tool for system administrators and users alike. Mastering its interactive features allows for quick and effective diagnosis of system performance issues and process management.
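
Most of these keystrokes have command-line counterparts, which makes it possible to capture the same views from a script. The following is a hedged sketch, assuming the procps-ng top and util-linux renice shipped with RHEL; the variable names are illustrative.

```shell
# Shift+M counterpart: sort by memory (-o %MEM) in one batch-mode snapshot.
by_memory=$(top -b -n 1 -o %MEM | head -n 12)
echo "$by_memory"

# u counterpart: show only one user's processes (a name or numeric UID).
top -b -n 1 -u "$(id -u)" | head -n 12

# s counterpart: set the refresh delay in seconds (-d); two refreshes here.
top -b -d 1 -n 2 | grep '^top -'

# r counterpart: raise the nice value of a process with renice.
sleep 600 &
sleep_pid=$!
renice -n 10 -p "$sleep_pid"
nice_value=$(ps -o ni= -p "$sleep_pid" | tr -d ' ')
echo "nice value is now $nice_value"

# k counterpart: send the default SIGTERM to the process.
kill "$sleep_pid"
wait "$sleep_pid" 2>/dev/null || true
```

Raising a nice value, as done here, works for your own processes; lowering it again normally requires root.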

Summary

In this lab, you learned fundamental concepts of Linux process management. You began by understanding process states and their lifecycle using ps and top commands, observing how processes transition between states like Running (R), Interruptible Sleep (S), and Stopped (T). You practiced identifying common process states and interpreting the output of ps aux and ps -ef to gain insights into system processes.

Furthermore, you explored methods for controlling background and foreground jobs, which is crucial for efficient terminal usage. You also mastered terminating processes using various commands such as kill, killall, and pkill, understanding their different applications for graceful or forceful termination. Finally, you learned to monitor system load and CPU usage with uptime and lscpu, and to analyze detailed process activity using top, providing a comprehensive overview of system performance and resource utilization.