Troubleshoot the RHEL Boot Process

Introduction

In this lab, you will learn essential techniques for troubleshooting and repairing the Red Hat Enterprise Linux (RHEL) boot process. You will explore how to interact with the system at different stages of the boot sequence to diagnose and resolve common problems that can prevent a system from starting correctly. This includes working with systemd boot targets and utilizing specialized boot modes designed for system recovery.

Throughout the exercises, you will gain hands-on experience managing systemd targets with systemctl, booting into rescue mode from the GRUB menu for system maintenance, and resetting the root password using the rd.break kernel parameter. Additionally, you will learn how to use emergency mode to repair critical configuration file errors, such as a corrupted /etc/fstab, ensuring you can restore a non-bootable system to an operational state.

Manage Systemd Boot Targets with systemctl

In this step, you will learn how to manage systemd boot targets. In systemd, a "target" is a synchronization point that groups together various services and other units required to bring the system to a certain state. This is the modern equivalent of "runlevels" in older SysV init systems. We will explore how to view the current default target, change the default target for future boots, and temporarily switch to a different target.
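
To make the runlevel comparison concrete, the sketch below maps the legacy SysV runlevels to the systemd targets that stand in for them. The function name runlevel_to_target is a hypothetical helper invented for this illustration; the mapping follows the runlevelN.target aliases described in systemd.special(7).

```shell
#!/usr/bin/env bash
# Map a legacy SysV runlevel to its closest systemd target.
# runlevel_to_target is a hypothetical helper written for this illustration.
runlevel_to_target() {
    case "$1" in
        0)     echo "poweroff.target" ;;
        1)     echo "rescue.target" ;;
        2|3|4) echo "multi-user.target" ;;
        5)     echo "graphical.target" ;;
        6)     echo "reboot.target" ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Print the mapping for the runlevels most relevant to this lab.
for level in 1 3 5; do
    printf 'runlevel %s -> %s\n' "$level" "$(runlevel_to_target "$level")"
done
```

The two targets used in this step, multi-user.target and graphical.target, correspond to the old runlevels 3 and 5.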

First, move to the practice directory for this lab.

cd /home/labex/project

Next, check which target your system boots into by default. The graphical.target is used for systems with a desktop environment and provides a graphical user interface (GUI). The multi-user.target provides a command-line-only, multi-user environment.

To see the current default target and save the result for later review, run the following command:

systemctl get-default | tee step1-initial-target.txt

You should see that the default target is the graphical target.

graphical.target

Now, let's change the default boot target to multi-user.target. This is useful for server environments or for troubleshooting situations where the graphical interface is not needed or is causing issues. The systemctl set-default command achieves this by changing the /etc/systemd/system/default.target symbolic link.

Use sudo to execute this command with administrative privileges.

sudo systemctl set-default multi-user.target

The output confirms that the symbolic link has been updated.

Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target -> /usr/lib/systemd/system/multi-user.target.

You can verify that the default has been changed by running the get-default command again and saving the result.

systemctl get-default | tee step1-multi-user-target.txt

The output now shows the new default target.

multi-user.target
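
Because set-default only repoints a symbolic link, the mechanism can be reproduced safely in a scratch directory. The following is a minimal sketch, not what systemctl runs internally: the helpers set_default and get_default and the scratch paths are assumptions made for illustration, and nothing under the real /etc/systemd/system is touched.

```shell
#!/usr/bin/env bash
# Emulate what `systemctl set-default` does, using a scratch directory
# instead of the real /etc/systemd/system (all paths below are illustrative).
set -euo pipefail

scratch=$(mktemp -d)
mkdir -p "$scratch/usr/lib/systemd/system" "$scratch/etc/systemd/system"
touch "$scratch/usr/lib/systemd/system/graphical.target" \
      "$scratch/usr/lib/systemd/system/multi-user.target"

# Hypothetical helper: repoint default.target at the requested target.
set_default() {
    ln -sfn "$scratch/usr/lib/systemd/system/$1" \
            "$scratch/etc/systemd/system/default.target"
}

# Hypothetical helper: print the target the symlink currently points to.
get_default() {
    basename "$(readlink "$scratch/etc/systemd/system/default.target")"
}

set_default multi-user.target
first=$(get_default)
set_default graphical.target
second=$(get_default)
echo "$first -> $second"

rm -rf "$scratch"
```

On a real system, readlink /etc/systemd/system/default.target shows the same kind of link that get-default reports.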

With this setting, the system would boot into a text-based console after a reboot. For this lab, we want to maintain a consistent graphical environment. Let's set the default target back to graphical.target.

sudo systemctl set-default graphical.target

You will see output similar to the previous change, indicating that the symlink has been changed back.

Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target -> /usr/lib/systemd/system/graphical.target.

Run a final check to confirm the default target is restored to graphical.target, and save the result.

systemctl get-default | tee step1-final-target.txt

graphical.target

In addition to changing the default target for reboots, you can also switch targets in the current session using systemctl isolate. This command stops services not associated with the new target and starts the ones that are. For example, running sudo systemctl isolate multi-user.target would terminate your graphical session and switch to a text-only console. This is a powerful but potentially disruptive command, so we will not execute it here.

You have now successfully used systemctl to manage systemd targets.

Explore Rescue Mode with rescue.target

In this step, you will learn about rescue.target, a special systemd target designed for system recovery. On a standard RHEL system, you would access this mode by rebooting, interrupting the boot loader (GRUB), and appending systemd.unit=rescue.target to the kernel's boot options. This provides a single-user root shell with the root filesystem mounted and most services stopped, which is ideal for troubleshooting.

While we cannot perform a real reboot or access the GRUB menu in this containerized lab environment, we can still explore the configuration of rescue mode to understand how it works.

First, let's locate the systemd unit file for rescue.target. These files are typically stored in the /usr/lib/systemd/system/ directory.

ls -l /usr/lib/systemd/system/rescue.target

You will see the file listed with its permissions and ownership.

-rw-r--r--. 1 root root 500 Nov  1  2022 /usr/lib/systemd/system/rescue.target

Now, let's examine the contents of this file to understand its configuration. The cat command will display the file's content in the terminal, and tee will save a copy in your practice directory.

cat /usr/lib/systemd/system/rescue.target | tee /home/labex/project/rescue-target.txt

The output shows the definition of the target.

#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Rescue Mode
Documentation=man:systemd.special(7)
Requires=sysinit.target rescue.service
After=sysinit.target rescue.service
AllowIsolate=yes

Key directives in this file include:

Description=Rescue Mode: A human-readable name for the target.
Requires=sysinit.target rescue.service: This ensures that both sysinit.target (basic system initialization) and rescue.service are started when this target is activated. The rescue service provides the root maintenance shell.
After=sysinit.target rescue.service: This specifies the order of activation, ensuring rescue mode starts after system initialization and the rescue service.
AllowIsolate=yes: This allows you to switch to this target from another target using the systemctl isolate rescue.target command in a running system.
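
The directives above can also be read programmatically. The sketch below copies the [Unit] section shown earlier into a temporary file so that it runs anywhere; unit_value is a hypothetical helper, and the simple sed pattern assumes each directive appears once on its own line, which holds for this file but not for every unit.

```shell
#!/usr/bin/env bash
# Read directives from a unit file. The sample below copies the [Unit]
# section shown above into a temporary file so the sketch runs anywhere.
unit_file=$(mktemp)
cat > "$unit_file" <<'EOF'
[Unit]
Description=Rescue Mode
Documentation=man:systemd.special(7)
Requires=sysinit.target rescue.service
After=sysinit.target rescue.service
AllowIsolate=yes
EOF

# unit_value is a hypothetical helper: print the value of one directive.
unit_value() {
    sed -n "s/^$2=//p" "$1"
}

requires=$(unit_value "$unit_file" Requires)
allow_isolate=$(unit_value "$unit_file" AllowIsolate)
echo "Requires:     $requires"
echo "AllowIsolate: $allow_isolate"

rm -f "$unit_file"
```

On a live system, systemctl show -p AllowIsolate rescue.target reports the same property as systemd has loaded it, which is more reliable than parsing the file by hand.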

To get a better idea of the minimal environment that rescue mode provides, you can view its dependencies. The systemctl list-dependencies command shows all the units that are started as part of a target. Save that output as well.

systemctl list-dependencies rescue.target | tee /home/labex/project/rescue-dependencies.txt

The output lists the units required for rescue mode. You'll see a minimal set of services, confirming that it's a streamlined environment designed for repair tasks.

rescue.target
○ ├─rescue.service
○ ├─systemd-update-utmp-runlevel.service
● └─sysinit.target
●   ├─dev-hugepages.mount
●   ├─dev-mqueue.mount
●   ├─dracut-shutdown.service
○   ├─iscsi-onboot.service
○   ├─iscsi-starter.service
●   ├─kmod-static-nodes.service
●   ├─ldconfig.service
●   ├─lvm2-lvmpolld.socket
... (output may vary) ...

The key takeaway is that rescue.target provides a root shell with the filesystem mounted read-write, enabling you to fix system issues. In the following steps, we will simulate recovery scenarios that rely on similar principles.

Reset the Root Password using rd.break and chroot

In this step, you will learn the procedure for resetting a lost root password on a RHEL system. This is a critical recovery skill. The standard method involves interrupting the boot process with the rd.break kernel parameter, which gives you access to a shell before the system fully starts.

On a physical or virtual machine, you would reboot, interrupt the GRUB boot loader, and add rd.break to the end of the line that begins with linux. This stops the boot process just before the initramfs hands control to the real root filesystem, placing you in an initramfs shell. From there, the general steps are:

Remount the system's root filesystem (which is mounted read-only at /sysroot) in read-write mode with the command mount -o remount,rw /sysroot.
Enter a chroot jail at /sysroot with chroot /sysroot. This makes the system's actual root filesystem your current environment, allowing you to run commands that affect the system.
Change the password using the passwd command.
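Create the /.autorelabel file so that SELinux relabels the filesystem on the next boot, which corrects the context of the modified password file.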
Exit the chroot and the initramfs shell to continue booting.
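
The steps above can be sketched as a dry run that only prints the commands. This is an illustration under stated assumptions rather than the exact interactive session: run and reset_root_password are hypothetical helpers, and the sketch uses the one-shot form chroot /sysroot <command> instead of an interactive chroot, so /.autorelabel is written as /sysroot/.autorelabel from outside the chroot.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the rd.break recovery sequence. With DRY_RUN=1 (the
# default here) each command is only printed, so nothing is modified.
DRY_RUN=${DRY_RUN:-1}

# run is a hypothetical helper: print the command, or execute it for real.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# reset_root_password is a hypothetical wrapper around the three key steps.
reset_root_password() {
    run mount -o remount,rw /sysroot    # make the real root filesystem writable
    run chroot /sysroot passwd root     # set the password inside the chroot
    run touch /sysroot/.autorelabel     # request an SELinux relabel on next boot
}

plan=$(reset_root_password)
printf '%s\n' "$plan"
```

Printing the plan first is a deliberate choice: these commands only make sense inside the initramfs shell, so the sketch should not be run with DRY_RUN=0 on a normally booted system.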

While we cannot perform a real reboot and use rd.break in this lab environment, we will simulate the most important commands you would execute after entering the chroot environment.

First, let's simulate changing the root password. Imagine you have successfully entered the chroot jail, where you have root access and can change any user's password. In this lab, the sudo passwd root command stands in for that step.

sudo passwd root

You will be prompted to enter and re-enter the new password. Set it to redhat.

Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

After changing the password in this recovery environment, the SELinux security context on the password file (/etc/shadow) can become incorrect. To fix this, you must force a full system SELinux relabel on the next boot. This is done by creating an empty file named .autorelabel in the root (/) directory.

sudo touch /.autorelabel

Let's verify that the file was created.

ls -l /.autorelabel

The output should show the newly created file.

-rw-r--r--. 1 root root 0 <date> <time> /.autorelabel

On a real system, you would now type exit twice, once to leave the chroot and once to leave the initramfs shell, and let the system continue booting. It would perform the lengthy relabeling process and then boot normally with the new password. Leave the /.autorelabel file in place for verification in this lab; the verification step will remove it automatically after checking.

This concludes the simulation of resetting the root password. You have practiced the key commands (passwd and touch /.autorelabel) that are central to the recovery process.

Repair /etc/fstab Errors using Emergency Mode

In this step, you will learn how to diagnose and repair errors in the /etc/fstab file. This file is critical for the boot process as it tells the system which filesystems to mount and where. An incorrect entry in /etc/fstab can prevent the system from booting, forcing it into emergency mode.

Emergency mode provides the most minimal environment possible for system repair. Unlike rescue mode, it does not attempt to mount most filesystems or start many services. Crucially, the root filesystem (/) is mounted in read-only (ro) mode to prevent further damage.
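
As a quick reference for the three recovery entry points covered in this lab, the sketch below prints the parameter you would append to the linux line in the GRUB editor for each one. The helper name recovery_kernel_arg and the mode labels are hypothetical; the parameters themselves are the standard systemd.unit= and rd.break kernel command-line options.

```shell
#!/usr/bin/env bash
# recovery_kernel_arg is a hypothetical helper that prints the parameter
# to append to the "linux" line in the GRUB editor for each recovery mode.
recovery_kernel_arg() {
    case "$1" in
        rescue)    echo "systemd.unit=rescue.target" ;;
        emergency) echo "systemd.unit=emergency.target" ;;
        initramfs) echo "rd.break" ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print a small reference table of mode names and kernel parameters.
for mode in rescue emergency initramfs; do
    printf '%-10s %s\n' "$mode" "$(recovery_kernel_arg "$mode")"
done
```

The ordering reflects how much of the system is brought up: rescue mode starts the most, emergency mode less, and rd.break stops before the real root filesystem is even entered.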

While we cannot trigger a real boot failure in this lab, we can simulate the process of finding and fixing an /etc/fstab error.

First, let's intentionally add a faulty entry to /etc/fstab. We will use the echo command with sudo to append a line that references a non-existent device.

echo '/dev/nonexistent /data xfs defaults 0 0' | sudo tee -a /etc/fstab

Now, let's view the contents of /etc/fstab to confirm our bad line was added, and save a copy before you repair it.

cat /etc/fstab | tee /home/labex/project/step4-fstab-before.txt

You should see the incorrect line at the end of the file.

#
# /etc/fstab
# Created by anaconda on <date>
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/vda4 / xfs defaults 0 0
/dev/vda2 /boot xfs defaults 0 0
/dev/vda1 /boot/efi vfat umask=0077,shortname=winnt 0 0
/dev/vda3 swap swap defaults 0 0
/dev/nonexistent /data xfs defaults 0 0

Next, we will simulate the diagnostic step. The mount -a command attempts to mount all filesystems listed in /etc/fstab that are not already mounted. Since our entry is invalid, this command will fail. Save the error output so you can compare it with the repaired state later.

sudo mount -a 2>&1 | tee /home/labex/project/step4-mount-error.txt

The command will produce an error, clearly indicating that the bad /dev/nonexistent entry cannot be mounted. This is similar to the type of error you would see during a failed boot.

mount: /data: fsconfig system call failed: /dev/nonexistent: Can't lookup blockdev.
       dmesg(1) may have more information after failed mount system call.
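
The same kind of problem can be spotted before running mount -a by checking whether each device named in the file exists. The sketch below is a rough pre-check under simplifying assumptions: missing_fstab_devices is a hypothetical helper, it only inspects entries whose first field starts with /dev/ (UUID= and LABEL= entries are skipped), and it runs against a scratch file rather than the real /etc/fstab.

```shell
#!/usr/bin/env bash
# missing_fstab_devices is a hypothetical helper: print every /dev/ path
# named in the first field of an fstab-style file that does not exist.
missing_fstab_devices() {
    local device _rest
    while read -r device _rest; do
        case "$device" in
            ''|'#'*) continue ;;                         # skip blanks and comments
            /dev/*)  [ -e "$device" ] || echo "$device" ;;
        esac
    done < "$1"
}

# Demonstrate on a scratch copy rather than the real /etc/fstab.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample entries for illustration only
/dev/null        /mnt/ok  xfs defaults 0 0
/dev/nonexistent /data    xfs defaults 0 0
EOF

missing=$(missing_fstab_devices "$sample")
echo "missing devices: $missing"

rm -f "$sample"
```

On a real system, util-linux also ships findmnt --verify, which performs a more thorough check of /etc/fstab than this sketch.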

Now, let's simulate the repair process. In a real emergency shell, the first step is to remount the root filesystem in read-write mode to allow for changes.

sudo mount -o remount,rw /

With the filesystem now writable, you can edit /etc/fstab to fix the error. Use nano, vi, or another terminal editor you are comfortable with to open the file.

sudo vi /etc/fstab

Inside the editor, navigate to the faulty line (/dev/nonexistent /data xfs defaults 0 0) and delete it, then save the file.
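
If you prefer not to use an interactive editor, the same deletion can be done with sed. The sketch below works on a scratch file so that it is safe to run anywhere; the temporary path and the two sample entries are illustrative, and the .bak suffix keeps a backup of the file as it was before the edit.

```shell
#!/usr/bin/env bash
# Remove the faulty entry non-interactively, keeping a .bak backup.
# This runs against a scratch file; the path is illustrative only.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
/dev/vda4 / xfs defaults 0 0
/dev/nonexistent /data xfs defaults 0 0
EOF

# Delete any line whose first field is /dev/nonexistent.
sed -i.bak '\|^/dev/nonexistent[[:space:]]|d' "$fstab"

remaining=$(cat "$fstab")
echo "$remaining"

rm -f "$fstab" "$fstab.bak"
```

Against the real file, the equivalent would be the same sed expression run with sudo on /etc/fstab. As the header comment in /etc/fstab notes, running systemctl daemon-reload afterwards updates the systemd units generated from the file.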

To confirm the fix, run sudo mount -a again.

sudo mount -a

This time, the command should execute silently with no output, which indicates that all valid entries in /etc/fstab are correctly mounted. You have successfully repaired the file.

Summary

In this lab, you learned essential techniques for troubleshooting the Red Hat Enterprise Linux boot process. You practiced managing systemd boot targets, such as viewing the current default and changing it between graphical.target and multi-user.target using systemctl. You also learned how the boot sequence can be interrupted from the GRUB menu to reach specialized recovery environments, and you examined the rescue.target unit and its dependencies to see what rescue mode provides.

Furthermore, you practiced critical recovery procedures for common system failures. You walked through resetting a forgotten root password: using the rd.break kernel parameter, remounting the root filesystem with write permissions, and using a chroot environment to set a new password, then creating a /.autorelabel file so that SELinux contexts are restored on the next boot. Lastly, you learned to resolve boot failures caused by /etc/fstab errors by identifying the problematic entry with mount -a, removing it, and confirming the fix, which is the same repair you would perform from emergency mode.