Troubleshoot the RHEL Boot Process

Introduction

In this lab, you will learn essential techniques for troubleshooting and repairing the Red Hat Enterprise Linux (RHEL) boot process. You will explore how to interact with the system at different stages of the boot sequence to diagnose and resolve common problems that can prevent a system from starting correctly. This includes working with systemd boot targets and utilizing specialized boot modes designed for system recovery.

Throughout the exercises, you will gain hands-on experience managing systemd targets with systemctl, booting into rescue mode from the GRUB menu for system maintenance, and resetting the root password using the rd.break kernel parameter. Additionally, you will learn how to use emergency mode to repair critical configuration file errors, such as a corrupted /etc/fstab, ensuring you can restore a non-bootable system to an operational state.

Manage Systemd Boot Targets with systemctl

In this step, you will learn how to manage systemd boot targets. In systemd, a "target" is a synchronization point that groups together various services and other units required to bring the system to a certain state. This is the modern equivalent of "runlevels" in older SysV init systems. We will explore how to view the current default target, change the default target for future boots, and temporarily switch to a different target.
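For orientation, the correspondence between the old SysV runlevels and systemd targets can be sketched as a small lookup. The function name runlevel_to_target is hypothetical and exists only for this illustration; the mapping itself follows the runlevelN.target aliases described in systemd.special(7).

```shell
# Hypothetical helper: map a SysV runlevel number to its systemd target.
runlevel_to_target() {
    case "$1" in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

runlevel_to_target 3   # prints multi-user.target
runlevel_to_target 5   # prints graphical.target
```

Runlevels 3 and 5 are the two you will meet most often in this lab, as multi-user.target and graphical.target.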

First, let's check which target your system boots into by default. The graphical.target is used for systems with a desktop environment, providing a graphical user interface (GUI). The multi-user.target provides a command-line-only, multi-user environment without a GUI.

To see the current default target, run the following command:

systemctl get-default

You should see that the default target is the graphical target.

graphical.target

Now, let's change the default boot target to multi-user.target. This is useful for server environments or for troubleshooting situations where the graphical interface is not needed or is causing issues. The systemctl set-default command achieves this by changing the /etc/systemd/system/default.target symbolic link.

Use sudo to execute this command with administrative privileges.

sudo systemctl set-default multi-user.target

The output confirms that the symbolic link has been updated.

Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target -> /usr/lib/systemd/system/multi-user.target.

You can verify that the default has been changed by running the get-default command again.

systemctl get-default

The output now shows the new default target.

multi-user.target

With this setting, the system would boot into a text-based console after a reboot. For this lab, we want to maintain a consistent graphical environment. Let's set the default target back to graphical.target.

sudo systemctl set-default graphical.target

The output is similar to before, indicating the symlink has been changed back.

Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target -> /usr/lib/systemd/system/graphical.target.

Run a final check to confirm the default target is restored to graphical.target.

systemctl get-default
graphical.target
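To make the symlink mechanism concrete, here is a minimal sketch that reproduces the swap in a scratch directory instead of the real /etc. The helpers set_default and get_default are hypothetical stand-ins written for this illustration; they imitate only the symlink effect of systemctl set-default and get-default, not the rest of what systemctl does.

```shell
# Hypothetical helpers that mimic the symlink swap under a scratch root ($1).
set_default() {
    # Replace the default.target link so it points at the requested target ($2).
    ln -sfn "/usr/lib/systemd/system/$2" "$1/etc/systemd/system/default.target"
}

get_default() {
    # Report the name of the target the link currently points to.
    basename "$(readlink "$1/etc/systemd/system/default.target")"
}

root=$(mktemp -d)                      # scratch root, not the real filesystem
mkdir -p "$root/etc/systemd/system"

set_default "$root" multi-user.target
get_default "$root"                    # prints multi-user.target
set_default "$root" graphical.target
get_default "$root"                    # prints graphical.target

rm -rf "$root"
```

On a real system, readlink /etc/systemd/system/default.target shows the same link that the sketch manipulates.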

In addition to changing the default target for reboots, you can also switch targets in the current session using systemctl isolate. This command stops services not associated with the new target and starts the ones that are. For example, running sudo systemctl isolate multi-user.target would terminate your graphical session and switch to a text-only console. This is a powerful but potentially disruptive command, so we will not execute it here.

You have now successfully used systemctl to manage systemd targets.

Boot into Rescue Mode from the GRUB Menu

In this step, you will learn about rescue.target, a special systemd target designed for system recovery. On a standard RHEL system, you would access this mode by rebooting, interrupting the boot loader (GRUB), and adding a parameter to the kernel's boot options. This provides a single-user shell with the root filesystem mounted and most services disabled, which is ideal for troubleshooting.

While we cannot perform a real reboot or access the GRUB menu in this containerized lab environment, we can still explore the configuration of rescue mode to understand how it works.
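The parameter normally added to the kernel line for rescue mode is systemd.unit=rescue.target. As a rough sketch, the function below classifies a kernel command line by the recovery parameter it carries. The name boot_mode_from_cmdline and the sample command line are hypothetical; only the three parameters it recognizes are real.

```shell
# Hypothetical helper: report which recovery mode a kernel command line requests.
boot_mode_from_cmdline() {
    mode=normal
    # $1 is deliberately unquoted so the command line splits into arguments.
    for arg in $1; do
        case "$arg" in
            systemd.unit=rescue.target)    mode=rescue ;;
            systemd.unit=emergency.target) mode=emergency ;;
            rd.break|rd.break=*)           mode=rd.break ;;
        esac
    done
    echo "$mode"
}

# Sample line invented for illustration; it prints: rescue
boot_mode_from_cmdline "BOOT_IMAGE=/vmlinuz root=/dev/vda4 ro systemd.unit=rescue.target"
```

On a running machine you could pass "$(cat /proc/cmdline)" to see how the current boot was started.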

First, let's locate the systemd unit file for rescue.target. These files are typically stored in the /usr/lib/systemd/system/ directory.

ls -l /usr/lib/systemd/system/rescue.target

You will see the file listed with its permissions and ownership.

-rw-r--r--. 1 root root 500 Nov  1  2022 /usr/lib/systemd/system/rescue.target

Now, let's examine the contents of this file to understand its configuration. The cat command will display the file's content in the terminal.

cat /usr/lib/systemd/system/rescue.target

The output shows the definition of the target.

#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Rescue Mode
Documentation=man:systemd.special(7)
Requires=sysinit.target rescue.service
After=sysinit.target rescue.service
AllowIsolate=yes

Key directives in this file include:

  • Description=Rescue Mode: A human-readable name for the target.
  • Requires=sysinit.target rescue.service: This ensures that both sysinit.target (basic system initialization) and rescue.service are started when this target is activated. The rescue service provides the root maintenance shell.
  • After=sysinit.target rescue.service: This specifies the order of activation, ensuring rescue mode starts after system initialization and the rescue service.
  • AllowIsolate=yes: This allows you to switch to this target from another target using the systemctl isolate rescue.target command in a running system.
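As a small illustration of the AllowIsolate directive, the sketch below checks a unit file for it before you would attempt systemctl isolate. The function allows_isolate is hypothetical and deliberately simple: it matches only the literal line AllowIsolate=yes and ignores drop-in overrides and other boolean spellings that systemd accepts.

```shell
# Hypothetical helper: succeed only if the unit file sets AllowIsolate=yes.
allows_isolate() {
    grep -qx 'AllowIsolate=yes' "$1"
}

# Scratch copy of the [Unit] section shown above, so the sketch runs anywhere.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Unit]
Description=Rescue Mode
Requires=sysinit.target rescue.service
After=sysinit.target rescue.service
AllowIsolate=yes
EOF

if allows_isolate "$unit"; then
    echo "isolate permitted"
else
    echo "isolate refused"
fi

rm -f "$unit"
```

On a live system, systemctl show -p AllowIsolate rescue.target reports the effective value directly, which is the more reliable check.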

To get a better idea of the minimal environment that rescue mode provides, you can view its dependencies. The systemctl list-dependencies command shows all the units that are started as part of a target.

systemctl list-dependencies rescue.target

The output lists the units required for rescue mode. You'll see a minimal set of services, confirming that it's a streamlined environment designed for repair tasks.

rescue.target
○ ├─rescue.service
○ ├─systemd-update-utmp-runlevel.service
● └─sysinit.target
●   ├─dev-hugepages.mount
●   ├─dev-mqueue.mount
●   ├─dracut-shutdown.service
○   ├─iscsi-onboot.service
○   ├─iscsi-starter.service
●   ├─kmod-static-nodes.service
●   ├─ldconfig.service
●   ├─lvm2-lvmpolld.socket
... (output may vary) ...

The key takeaway is that rescue.target provides a root shell with the filesystem mounted read-write, enabling you to fix system issues. In the following steps, we will simulate recovery scenarios that rely on similar principles.

Reset the Root Password using rd.break and chroot

In this step, you will learn the procedure for resetting a lost root password on a RHEL system. This is a critical recovery skill. The standard method involves interrupting the boot process with the rd.break kernel parameter, which gives you access to a shell before the system fully starts.

On a physical or virtual machine, you would reboot, interrupt the GRUB boot loader, and add rd.break to the end of the linux kernel line. This stops the boot process inside the initramfs, just before control is handed over to the real root filesystem, and places you in an initramfs shell. From there, the general steps are:

  1. Remount the system's root filesystem (which is mounted read-only at /sysroot) in read-write mode with the command mount -o remount,rw /sysroot.
  2. Enter a chroot jail at /sysroot with chroot /sysroot. This makes the system's actual root filesystem your current environment, allowing you to run commands that affect the system.
  3. Change the password using the passwd command.
  4. Address potential SELinux context issues by scheduling a relabel with touch /.autorelabel.
  5. Exit the chroot and the initramfs shell to continue booting.
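The steps above can be sketched as a short script. This is an illustration rather than a procedure to paste blindly: the run wrapper and the DRY_RUN variable are assumptions of this sketch, and with DRY_RUN left at its default of 1 the script only prints the commands, so it is safe to run outside an initramfs shell.

```shell
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_root_password() {
    run mount -o remount,rw /sysroot         # step 1: make the real root writable
    run chroot /sysroot passwd root          # steps 2-3: set the password inside the chroot
    run chroot /sysroot touch /.autorelabel  # step 4: request an SELinux relabel on next boot
}

reset_root_password
```

Step 5, leaving the initramfs shell with exit, is left as a manual action because it ends the shell the script would be running in.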

While we cannot perform a real reboot and use rd.break in this lab environment, we will simulate the most important commands you would execute after entering the chroot environment.

First, let's simulate changing the root password. Imagine you have successfully entered the chroot jail. You would now have root access to change any user's password. We will use the sudo passwd root command to change the root user's password. When prompted, set the new password to redhat.

sudo passwd root

You will be prompted to enter and re-enter the new password. The characters are not displayed as you type.

Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

After changing the password in this recovery environment, the SELinux security context on the password file (/etc/shadow) can become incorrect. To fix this, you must force a full system SELinux relabel on the next boot. This is done by creating an empty file named .autorelabel in the root (/) directory.

sudo touch /.autorelabel

Let's verify that the file was created.

ls -l /.autorelabel

The output should show the newly created file.

-rw-r--r--. 1 root root 0 <date> <time> /.autorelabel

On a real system, you would now type exit twice and let the system reboot. It would perform the lengthy relabeling process and then boot normally with the new password. Since we don't want to trigger this in our lab, we will clean up by removing the file we just created.

sudo rm /.autorelabel

This concludes the simulation of resetting the root password. You have practiced the key commands (passwd and touch /.autorelabel) that are central to the recovery process.

Repair /etc/fstab Errors using Emergency Mode

In this step, you will learn how to diagnose and repair errors in the /etc/fstab file. This file is critical for the boot process as it tells the system which filesystems to mount and where. An incorrect entry in /etc/fstab can prevent the system from booting, forcing it into emergency mode.

Emergency mode provides the most minimal environment possible for system repair. Unlike rescue mode, it does not attempt to mount most filesystems or start many services. Crucially, the root filesystem (/) is mounted in read-only (ro) mode to prevent further damage.
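One way to confirm whether / is currently read-only is to inspect the mount table. The sketch below parses a file in /proc/mounts format and prints ro or rw for the root entry. The function name root_mount_mode is hypothetical, and the sample entry is invented so the example runs anywhere.

```shell
# Hypothetical helper: print "ro" or "rw" for the root filesystem entry.
# Reads /proc/mounts by default, or the file given as $1.
root_mount_mode() {
    awk '$2 == "/" {
             n = split($4, opt, ",")
             for (i = 1; i <= n; i++)
                 if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
         }
         END { print mode }' "${1:-/proc/mounts}"
}

# Sample mount table entry resembling an emergency shell.
sample=$(mktemp)
printf '%s\n' '/dev/vda4 / xfs ro,relatime 0 0' > "$sample"
root_mount_mode "$sample"   # prints ro
rm -f "$sample"
```

On a live system, findmnt -no OPTIONS / gives the same information without any parsing.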

While we cannot trigger a real boot failure in this lab, we can simulate the process of finding and fixing an /etc/fstab error.

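First, let's intentionally add a faulty entry to /etc/fstab. We will pipe the output of echo into sudo tee -a to append a line that references a non-existent device. The pipe is needed because a plain redirection (sudo echo ... >> /etc/fstab) would be performed by your unprivileged shell and fail.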

echo '/dev/nonexistent /data xfs defaults 0 0' | sudo tee -a /etc/fstab

Now, let's view the contents of /etc/fstab to confirm our bad line was added.

cat /etc/fstab

You should see the incorrect line at the end of the file.

#
# /etc/fstab
# Created by anaconda on <date>
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/vda4 / xfs defaults 0 0
/dev/vda2 /boot xfs defaults 0 0
/dev/vda1 /boot/efi vfat umask=0077,shortname=winnt 0 0
/dev/vda3 swap swap defaults 0 0
/dev/nonexistent /data xfs defaults 0 0

Next, we will simulate the diagnostic step. The mount -a command attempts to mount all filesystems listed in /etc/fstab that are not already mounted. Since our entry is invalid, this command will fail.

sudo mount -a

The command will produce an error, clearly indicating that the mount point /data does not exist. This is similar to the error you would see during a failed boot.

mount: /data: mount point does not exist.
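A similar check can be sketched without mounting anything. The function below, whose name check_fstab is hypothetical, reads an fstab-style file and reports entries whose /dev/ device or mount point directory is missing. It is a rough approximation: it does not understand UUID= or LABEL= specifiers, and util-linux ships findmnt --verify for a more thorough check.

```shell
# Hypothetical helper: flag fstab entries with a missing device or mount point.
# Reads /etc/fstab by default, or the file given as $1. Returns 1 if anything is flagged.
check_fstab() {
    status=0
    while read -r spec mountpoint _rest; do
        case "$spec" in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        case "$spec" in
            /dev/*) [ -e "$spec" ] || { echo "missing device: $spec"; status=1; } ;;
        esac
        case "$mountpoint" in
            /*) [ -d "$mountpoint" ] || { echo "missing mount point: $mountpoint"; status=1; } ;;
        esac
    done < "${1:-/etc/fstab}"
    return "$status"
}

# Try it on a scratch file containing the faulty line from this step.
sample=$(mktemp)
printf '%s\n' '# sample' '/dev/nonexistent /data xfs defaults 0 0' > "$sample"
check_fstab "$sample" || echo "fstab needs attention"
rm -f "$sample"
```

Because the function only reads the file, it also works while the root filesystem is still mounted read-only.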

Now, let's simulate the repair process. In a real emergency shell, the first step is to remount the root filesystem in read-write mode to allow for changes.

sudo mount -o remount,rw /

With the filesystem now writable, you can edit /etc/fstab to fix the error. Use the nano editor to open the file.

sudo nano /etc/fstab

Inside the nano editor, use the arrow keys to navigate to the faulty line (/dev/nonexistent /data xfs defaults 0 0) and delete it. You can delete the entire line by pressing Ctrl+k. Once the line is removed, save the file by pressing Ctrl+x, then y, and finally Enter.
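If no interactive editor is available in a recovery shell, the same fix can be made non-interactively. The sketch below uses a hypothetical function, remove_fstab_entry, that keeps a .bak copy and drops every line whose first field matches the given device. It is demonstrated on a scratch file rather than the real /etc/fstab.

```shell
# Hypothetical helper: remove entries for device $2 from fstab file $1,
# keeping the previous version as "$1.bak".
remove_fstab_entry() {
    cp -p "$1" "$1.bak"
    # Writing back into the existing file keeps its ownership and SELinux context.
    awk -v spec="$2" '$1 != spec' "$1.bak" > "$1"
}

# Demonstration on a scratch copy of the lab's entries.
sample=$(mktemp)
printf '%s\n' '/dev/vda4 / xfs defaults 0 0' \
              '/dev/nonexistent /data xfs defaults 0 0' > "$sample"
remove_fstab_entry "$sample" /dev/nonexistent
cat "$sample"               # prints only the /dev/vda4 line
rm -f "$sample" "$sample.bak"
```

On the real file you would run it with root privileges after remounting / read-write, then confirm the result with mount -a.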

To confirm the fix, run sudo mount -a again.

sudo mount -a

This time, the command should execute silently with no output, which indicates that all valid entries in /etc/fstab are correctly mounted. You have successfully repaired the file.

Summary

In this lab, you learned essential techniques for troubleshooting the Red Hat Enterprise Linux boot process. You practiced managing systemd boot targets by viewing the current default and switching it between graphical.target and multi-user.target with systemctl. You also examined rescue.target, the recovery environment reached by interrupting the boot sequence at the GRUB menu, and saw how it provides a single-user root shell for system maintenance.

Furthermore, you walked through critical recovery procedures for common system failures. You reviewed how to reset a forgotten root password with the rd.break kernel parameter: remounting /sysroot read-write, entering a chroot environment, setting a new password, and creating the .autorelabel file so that SELinux contexts are restored on the next boot. Lastly, you simulated the repair of a boot failure caused by an /etc/fstab error: identifying the problematic entry with mount -a, remounting the root filesystem read-write as you would in emergency mode, and removing the entry so the system can boot successfully.