Troubleshoot Ansible Playbooks and Hosts on RHEL

Introduction

In this lab, you will learn how to troubleshoot common issues encountered when working with Ansible on Red Hat Enterprise Linux. You will gain practical experience in identifying and resolving a variety of problems, from initial environment setup to common playbook errors and managed host connectivity issues. The exercises cover fixing YAML syntax, correcting Jinja2 templating mistakes, and diagnosing problems on remote systems.

You will begin by preparing a RHEL environment and configuring Ansible logging. Then you will work through hands-on troubleshooting scenarios: using Ansible's check mode to diagnose a service-related problem, correcting a firewall task that names a nonexistent service, and fixing an inventory typo that makes a host unreachable. By the end of this lab, you will have a practical workflow for finding and fixing the most common Ansible failures.

Prepare the RHEL Environment and Configure Ansible Logging

In this step, you will prepare your Red Hat Enterprise Linux environment for Ansible automation. This involves installing the necessary software, creating a dedicated project directory, and setting up a basic configuration file to control Ansible's behavior and enable logging. Proper setup is the first step towards effective automation and troubleshooting.

  1. Install Ansible

    First, you need to install Ansible. The core automation engine is provided by the ansible-core package. Use the dnf package manager with sudo to install it. The -y flag automatically answers "yes" to any confirmation prompts.

    sudo dnf install -y ansible-core

    You should see output indicating that the package is being installed along with its dependencies.

    Last metadata expiration check: ...
    Dependencies resolved.
    ================================================================================
     Package             Architecture   Version                Repository      Size
    ================================================================================
    Installing:
     ansible-core        x86_64         <version>              <repo>          2.8 M
    ...
    Transaction Summary
    ================================================================================
    Install  XX Packages
    
    Total download size: XX M
    Installed size: XX M
    ...
    Complete!

  2. Create a Project Directory

    It's a best practice to organize your Ansible projects in dedicated directories. This keeps your playbooks, inventory, and configuration files neatly separated. Let's create a directory named ansible_troubleshooting inside your home project folder and navigate into it.

    mkdir -p ~/project/ansible_troubleshooting
    cd ~/project/ansible_troubleshooting

    From now on, all commands in this lab will be executed from within the ~/project/ansible_troubleshooting directory.

  3. Create an Ansible Inventory File

    An inventory is a file that lists the hosts (or nodes) that Ansible will manage. Since you are working on a single LabEx VM, you will configure Ansible to manage the local machine itself.

    Create a file named inventory and add localhost to it. The ansible_connection=local part tells Ansible to execute commands directly on the control node (your VM) without using SSH.

    echo "localhost ansible_connection=local" > inventory

    You can verify the content of the file using the cat command:

    cat inventory

    Expected Output:

    localhost ansible_connection=local

  4. Configure Ansible Logging

    An ansible.cfg file lets you customize Ansible's behavior for a specific project. Ansible uses the first configuration file it finds: a file in the current directory takes precedence over ~/.ansible.cfg and /etc/ansible/ansible.cfg, and only the ANSIBLE_CONFIG environment variable ranks higher. Here, you will create this file to specify the location of your inventory and to enable logging. Logging is crucial for troubleshooting, as it records detailed information about every playbook run.

    Use the nano editor to create the ansible.cfg file.

    nano ansible.cfg

    Now, copy and paste the following content into the nano editor. This configuration points Ansible at the inventory file in your project directory and appends the output of every run to ansible.log in the same directory.

    [defaults]
    inventory = /home/labex/project/ansible_troubleshooting/inventory
    log_path = /home/labex/project/ansible_troubleshooting/ansible.log

    To save the file in nano, press Ctrl+X, then Y to confirm, and finally Enter to write the file.

    Your environment is now fully prepared. You have Ansible installed and a project directory configured with a local inventory and logging enabled, ready for the next steps.
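
Once playbooks start failing, the log configured in step 4 becomes the first place to look. The sketch below shows one way to pull only the failure lines out of a log file. The helper name log_failures and the sample log lines are assumptions made for illustration; they are not real Ansible output.

```shell
# Hypothetical helper: print the lines of a log that record a failure.
# Usage: log_failures [path-to-log]   (defaults to ./ansible.log)
log_failures() {
  log_file="${1:-ansible.log}"
  grep -E 'FAILED!|UNREACHABLE!|ERROR!' "$log_file"
}

# Demonstration on a throwaway file; these lines are illustrative only.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
ok: [localhost]
fatal: [localhost]: FAILED! => {"changed": false, "msg": "example failure"}
changed: [localhost]
fatal: [web1]: UNREACHABLE! => {"changed": false, "msg": "example unreachable"}
EOF

# Count how many lines the helper reports.
failure_count="$(log_failures "$sample_log" | wc -l | tr -d ' ')"
echo "failures found: $failure_count"
rm -f "$sample_log"
```

Run from the project directory with no argument, log_failures would search ./ansible.log, assuming at least one playbook run has been logged there. To confirm which configuration file Ansible actually loaded, ansible --version prints a "config file" line.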

Fix YAML Syntax and Indentation Errors in a Playbook

In this step, you will learn how to diagnose and fix two of the most common types of errors in Ansible playbooks: YAML syntax errors and incorrect indentation. YAML, the language used for writing playbooks, is very strict about its structure. A single misplaced space or an unquoted special character can prevent a playbook from running. You will use the ansible-playbook --syntax-check command, an essential tool for validating your playbooks before execution.

  1. Create a Playbook with Intentional Errors

    First, you will create a new playbook file named webserver.yml in your project directory (~/project/ansible_troubleshooting). This file contains intentional errors that you will fix.

    Use nano to create the file:

    nano webserver.yml

    Copy and paste the following content into the editor. Notice the two deliberate errors: an unquoted string containing a colon and incorrect indentation for the second task.

    ---
    - name: Configure Web Server
      hosts: localhost
      vars:
        ## ERROR 1: Unquoted colon in string
        package_comment: This is a package: httpd
      tasks:
        - name: Install httpd package
          ansible.builtin.dnf:
            name: httpd
            state: present
    
        ## ERROR 2: Incorrect indentation
          - name: Create a test index page
            ansible.builtin.copy:
              content: "<h1>Welcome to Ansible</h1>"
              dest: /var/www/html/index.html

    Save the file and exit nano by pressing Ctrl+X, then Y, and Enter.

  2. Identify and Fix the YAML Syntax Error (Unquoted Colon)

    Now, run a syntax check on the playbook you just created. This command will parse the file and report any syntax issues without actually running the tasks.

    ansible-playbook --syntax-check webserver.yml

    Expected Output (Error): You will see an error because the value for package_comment contains a colon followed by a space but is not enclosed in quotes. YAML reads a colon followed by a space as a key-value separator, so it tries to parse a second mapping inside the value and fails.

    ERROR! We were unable to read either as JSON nor YAML, these are the errors we found:
    - Syntax Error while loading YAML.
      did not find expected ':'
    
    The error appears to be in '/home/labex/project/ansible_troubleshooting/webserver.yml': line 6, column 41, but may be elsewhere in the file depending on the exact syntax problem.
    
    The offending line appears to be:
    
      vars:
        package_comment: This is a package: httpd
                                            ^ here

    Solution: Enclose the string in quotes (single or double) so that YAML treats the whole value as one string. Open the file again with nano:

    nano webserver.yml

    Modify the line under vars to add quotes:

    ## ... (rest of the file)
    vars:
      ## FIX: Add quotes around the string with a colon
      package_comment: "This is a package: httpd"
    ## ... (rest of the file)

    Save and exit the editor.

  3. Identify and Fix the YAML Indentation Error

    With the first error fixed, run the syntax check again.

    ansible-playbook --syntax-check webserver.yml

    Expected Output (Error): This time, Ansible will report a different error related to the structure of the playbook.

    ERROR! A malformed block was encountered.
    
    The error appears to be in '/home/labex/project/ansible_troubleshooting/webserver.yml': line 13, column 11, but may be elsewhere in the file depending on the exact syntax problem.
    
    The offending line appears to be:
    
    
          ## ERROR 2: Incorrect indentation
          - name: Create a test index page
            ^ here

    This error occurs because YAML uses indentation to define structure. All items in a list (in this case, the tasks, which are list items starting with -) must have the same level of indentation. The second task, Create a test index page, is indented too far.

    Solution: Open the file one more time to correct the indentation.

    nano webserver.yml

    Remove the extra spaces before the second task so that its hyphen (-) aligns perfectly with the hyphen of the first task.

    ## ... (rest of the file)
    tasks:
      - name: Install httpd package
        ansible.builtin.dnf:
          name: httpd
          state: present
    
      ## FIX: Correct the indentation to align with the previous task
      - name: Create a test index page
        ansible.builtin.copy:
          content: "<h1>Welcome to Ansible</h1>"
          dest: /var/www/html/index.html

    Save and exit the editor.

  4. Verify the Corrected Playbook

    Finally, run the syntax check one last time.

    ansible-playbook --syntax-check webserver.yml

    This time, the command should complete without any errors, and you will see the playbook's name printed, confirming that the syntax is now correct.

    Expected Output (Success):

    playbook: webserver.yml
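
Indentation mistakes like the one in step 3 are easier to spot when the indentation is shown as a number. The sketch below prints the line number and leading-space count of every indented "- name:" line, using a throwaway copy of the broken tasks. The variable names are illustrative, and the pattern assumes tasks are indented with spaces and start with a name key.

```shell
# Throwaway copy of the playbook with the over-indented second task.
playbook="$(mktemp)"
cat > "$playbook" <<'EOF'
---
- name: Configure Web Server
  hosts: localhost
  tasks:
    - name: Install httpd package
      ansible.builtin.dnf:
        name: httpd
        state: present

      - name: Create a test index page
        ansible.builtin.copy:
          content: "<h1>Welcome to Ansible</h1>"
          dest: /var/www/html/index.html
EOF

# Print "<line number>:<leading spaces>" for each indented "- name:" line.
# Tasks in the same list should all report the same number of spaces.
task_indents="$(awk '/^ +- name:/ { match($0, /^ +/); print NR ":" RLENGTH }' "$playbook")"
echo "$task_indents"
rm -f "$playbook"
```

Here the two tasks report 4 and 6 leading spaces, which is the mismatch the syntax check complains about. The same awk command can be pointed at webserver.yml directly.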

Resolve Jinja2 Quoting and Template Path Errors

In this step, you will tackle errors related to Jinja2, Ansible's templating engine. You'll learn why a Jinja2 expression at the start of a YAML value must be quoted and how to debug a playbook that cannot find its template file. The first problem is caught at parse time by the syntax check; the second is a runtime error that appears only after the playbook has passed a syntax check.

  1. Create a Jinja2 Template File

    First, you need a template file. Unlike a static file, a template can contain variables that Ansible will replace with actual values during playbook execution. You will create a simple HTML template.

    Use nano to create a file named index.html.j2 in your project directory (~/project/ansible_troubleshooting). The .j2 extension is a common convention for Jinja2 templates.

    nano index.html.j2

    Copy and paste the following HTML content into the editor. Note the {{ welcome_message }} placeholder, which is a Jinja2 variable.

    <h1>{{ welcome_message }}</h1>
    <p>This page was deployed by Ansible.</p>

    Save the file and exit nano (Ctrl+X, Y, Enter).

  2. Modify the Playbook to Use the Template and Introduce Errors

    Now, modify your webserver.yml playbook to use the ansible.builtin.template module. You will also introduce two new errors: an unquoted Jinja2 variable and an incorrect template path.

    Open webserver.yml with nano:

    nano webserver.yml

    Replace the entire content of the file with the following. The become: true directive tells Ansible to execute tasks with administrative privileges (using sudo), which is necessary to install software and write files to system directories like /var/www/html.

    ---
    - name: Configure Web Server
      hosts: localhost
      become: true
      vars:
        package_name: httpd
        welcome_message: "Welcome to Ansible with Jinja2"
      tasks:
        - name: Install httpd package
          ansible.builtin.dnf:
            ## ERROR 1: Unquoted Jinja2 variable
            name: {{ package_name }}
            state: present
    
        - name: Create a test index page from template
          ansible.builtin.template:
            ## ERROR 2: Incorrect template source path
            src: index.j2
            dest: /var/www/html/index.html

    Save and exit the editor.

  3. Identify and Fix the Jinja2 Quoting Error

    Even though this is a Jinja2 issue, it can manifest as a YAML syntax error. Run the syntax checker to see how Ansible interprets it.

    ansible-playbook --syntax-check webserver.yml

    Expected Output (Error): You will get a syntax error because YAML treats a value that starts with { as the beginning of an inline (flow) mapping. A value that begins with {{ must be quoted so that YAML reads it as a string and passes it to Jinja2 unchanged.

    ERROR! A malformed block was encountered.
    
    The error appears to be in '/home/labex/project/ansible_troubleshooting/webserver.yml': line 11, column 19, but may be elsewhere in the file depending on the exact syntax problem.
    
    The offending line appears to be:
    
              ## ERROR 1: Unquoted Jinja2 variable
              name: {{ package_name }}
                      ^ here

    Solution: Open webserver.yml and enclose the Jinja2 variable in double quotes.

    nano webserver.yml

    Modify the Install httpd package task:

    ## ... (rest of the file)
    tasks:
      - name: Install httpd package
        ansible.builtin.dnf:
          ## FIX: Quote the Jinja2 expression
          name: "{{ package_name }}"
          state: present
    ## ... (rest of the file)

    Save and exit. The syntax check should now pass.

  4. Identify and Fix the Template Path Error

    Now that the syntax is correct, try to run the playbook.

    ansible-playbook webserver.yml

    Expected Output (Error): The playbook will fail, but this time it's a runtime error, not a syntax error. The error message clearly states that the source file index.j2 could not be found.

    TASK [Create a test index page from template] **********************************
    fatal: [localhost]: FAILED! => {"changed": false, "msg": "Could not find or access '/home/labex/project/ansible_troubleshooting/index.j2' on the Ansible Controller."}

    This happens because the src parameter in your playbook points to index.j2, but the file you created is named index.html.j2.

    Solution: Open webserver.yml one last time and correct the filename.

    nano webserver.yml

    Modify the src parameter in the Create a test index page from template task:

    ## ... (rest of the file)
    - name: Create a test index page from template
      ansible.builtin.template:
        ## FIX: Correct template source filename
        src: index.html.j2
        dest: /var/www/html/index.html
    ## ... (rest of the file)

    Save and exit the editor.

  5. Run the Playbook Successfully

    Run the playbook again. It should now complete all tasks successfully.

    ansible-playbook webserver.yml

    Expected Output (Success):

    PLAY [Configure Web Server] ****************************************************
    
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]
    
    TASK [Install httpd package] ***************************************************
    changed: [localhost]
    
    TASK [Create a test index page from template] **********************************
    changed: [localhost]
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
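
The template path error in step 4 can be caught before a run by checking that each src value points at a real file. For a task in a play (outside a role), Ansible looks for the template in a templates/ subdirectory next to the playbook and then in the playbook's own directory. The sketch below checks both locations. The helper name check_template_sources is hypothetical, and the parsing assumes simple, unquoted, relative src values; it would also report src values of non-template tasks.

```shell
# Hypothetical helper: report whether each "src:" value in a playbook exists
# in <playbook dir>/templates/ or in <playbook dir>/ itself.
check_template_sources() {
  playbook="$1"
  playbook_dir="$(dirname "$playbook")"
  # Strip everything up to and including "src:" and keep the value.
  sed -n 's/^ *src: *//p' "$playbook" | while read -r src; do
    if [ -f "$playbook_dir/templates/$src" ] || [ -f "$playbook_dir/$src" ]; then
      echo "found: $src"
    else
      echo "missing: $src"
    fi
  done
}

# Demonstration in a throwaway directory that mirrors the lab's mistake:
# the file on disk is index.html.j2, but the playbook asks for index.j2.
workdir="$(mktemp -d)"
touch "$workdir/index.html.j2"
cat > "$workdir/webserver.yml" <<'EOF'
---
- name: Configure Web Server
  hosts: localhost
  tasks:
    - name: Create a test index page from template
      ansible.builtin.template:
        src: index.j2
        dest: /var/www/html/index.html
EOF

template_report="$(check_template_sources "$workdir/webserver.yml")"
echo "$template_report"
rm -rf "$workdir"
```

After the fix in step 4, the same check against the real playbook would report index.html.j2 as found.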

Use Check Mode to Troubleshoot Managed Host Service Errors

In this step, you will learn to use one of Ansible's most powerful troubleshooting features: Check Mode. Check mode (activated with the --check flag) allows you to run a playbook to see what changes would be made, without actually modifying anything on the system. This is incredibly useful for safely testing playbooks and diagnosing issues, such as incorrect service names, before they cause real problems.

  1. Create a Playbook to Manage a Service

    You will now create a new playbook, service.yml, designed to ensure the httpd web server service is running. However, you will intentionally use an incorrect service name to simulate a common error.

    Use nano to create the service.yml file in your ~/project/ansible_troubleshooting directory.

    nano service.yml

    Copy and paste the following content. Note that the service name is set to apache2, which is a common name for the Apache web server on other Linux distributions but is incorrect for RHEL.

    ---
    - name: Manage Web Server Service
      hosts: localhost
      become: true
      tasks:
        - name: Ensure web server service is started
          ansible.builtin.service:
            ## ERROR: Incorrect service name for RHEL
            name: apache2
            state: started
            enabled: true

    Save the file and exit nano (Ctrl+X, Y, Enter).

  2. Use Check Mode to Identify the Service Error

    Instead of running the playbook normally, execute it in check mode. This will prevent Ansible from making any changes but will allow it to check the state of the system and report what it would do.

    ansible-playbook --check service.yml

    Expected Output (Error): The playbook will fail. The error message will clearly indicate that it could not find a service named apache2. This immediately tells you that the name parameter in your playbook is wrong.

    TASK [Ensure web server service is started] ************************************
    fatal: [localhost]: FAILED! => {"changed": false, "msg": "Could not find the requested service 'apache2': host"}
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

  3. Find the Correct Service Name

    To fix the playbook, you need to find the correct service name for the httpd package on RHEL. A reliable way to do this is to list the files installed by the package and look for the service unit file, which typically resides in /usr/lib/systemd/system/.

    Use the rpm command to query the httpd package:

    rpm -ql httpd | grep systemd

    Expected Output: This command will list the systemd-related files, including the service file.

    /usr/lib/systemd/system/httpd.service
    /usr/lib/systemd/system/httpd@.service
    ...

    The output httpd.service tells you that the correct service name is httpd.

  4. Correct the Playbook and Re-run in Check Mode

    Now that you know the correct service name, edit the service.yml file.

    nano service.yml

    Change the service name from apache2 to httpd.

    ## ... (rest of the file)
    - name: Ensure web server service is started
      ansible.builtin.service:
        ## FIX: Correct service name for RHEL
        name: httpd
        state: started
        enabled: true

    Save and exit the editor. Now, run the playbook in check mode again.

    ansible-playbook --check service.yml

    Expected Output (Success in Check Mode): This time, the playbook should report a changed status. In check mode, changed means "a change would have been made if this were a real run." It indicates that your playbook logic is now correct and Ansible has identified that the httpd service needs to be started.

    TASK [Ensure web server service is started] ************************************
    changed: [localhost]
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

    Note: In this specific container-based lab environment, a full systemd init system is not running. While check mode works correctly, a normal run of the ansible.builtin.service module might still encounter issues. The key lesson here is using check mode to validate your playbook's logic against the system's configuration.
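
Step 3 reads the service name off the unit file path by eye. As a small sketch of the same idea, the helper below turns unit file paths into the names a playbook would pass to the service module. The helper name unit_to_service_names is hypothetical, and the sample input is copied from the rpm -ql output shown in step 3.

```shell
# Hypothetical helper: read unit file paths on stdin and print the service
# name for each one (the file name without its .service suffix).
unit_to_service_names() {
  grep '\.service$' | while read -r unit_path; do
    basename "$unit_path" .service
  done
}

# Sample input taken from the rpm -ql output in step 3.
sample_units='/usr/lib/systemd/system/httpd.service
/usr/lib/systemd/system/httpd@.service'

service_names="$(printf '%s\n' "$sample_units" | unit_to_service_names)"
echo "$service_names"
```

On a system with the package installed, the helper could be fed directly: rpm -ql httpd | unit_to_service_names. A name ending in @ comes from a systemd template unit, which needs an instance name appended, so httpd is the one to use in the playbook.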

Correct Firewall Configuration and Host Unreachability Issues

In this final step, you will address two critical runtime issues: failures caused by incorrect system configurations, such as the firewall, and connectivity problems resulting from errors in your Ansible inventory file. Mastering these will help you resolve some of the most common roadblocks in automation.

Part 1: Correcting Firewall Configuration

A common task in server configuration is opening ports in the firewall. A playbook can easily fail if it refers to a firewall service that doesn't exist on the target system.

  1. Install and Prepare firewalld

    First, ensure the firewalld package is installed, as it provides the firewall management service on RHEL.

    sudo dnf install -y firewalld

    Start the firewalld service.

    sudo systemctl start firewalld

    You also need to install the ansible.posix collection, which contains the firewalld module used in this exercise.

    ansible-galaxy collection install ansible.posix

    Note: You may see a warning about Ansible version compatibility, but the collection will still function correctly for this exercise.

  2. Create a Playbook with a Firewall Error

    Create a new playbook named firewall.yml that attempts to enable the http service. However, you will intentionally use an incorrect service name, web, to trigger an error.

    nano firewall.yml

    Copy and paste the following content into the editor:

    ---
    - name: Configure System Firewall
      hosts: localhost
      become: true
      tasks:
        - name: Allow web traffic through firewall
          ansible.posix.firewalld:
            ## ERROR: 'web' is not a standard firewalld service
            service: web
            permanent: true
            state: enabled

    Save and exit nano (Ctrl+X, Y, Enter).

  3. Run the Playbook and Diagnose the Failure

    Execute the playbook. It will fail because firewalld does not recognize a service named web.

    ansible-playbook firewall.yml

    Expected Output (Error): The error message clearly states that web is not a supported service, pointing you directly to the problem.

    TASK [Allow web traffic through firewall] **************************************
    fatal: [localhost]: FAILED! => {"changed": false, "msg": "web is not a supported service. This is what I have."}
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

  4. Find the Correct Firewall Service Name

    To find the list of valid, predefined service names, you can use the firewall-cmd command-line tool.

    firewall-cmd --get-services

    Expected Output: You will see a long list of available services. Look through the list to find the correct one for web traffic, which is http.

    RH-Satellite-6 ... ftp http https imaps ipp ipp-client ...

  5. Correct the Playbook and Run Successfully

    Edit firewall.yml and replace the incorrect service name web with the correct one, http.

    nano firewall.yml

    The corrected task should look like this:

    ## ... (rest of the file)
    - name: Allow web traffic through firewall
      ansible.posix.firewalld:
        ## FIX: Use the correct firewalld service name
        service: http
        permanent: true
        state: enabled

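    Save and exit. Note that permanent: true without immediate: true writes the rule to the permanent configuration only; the running firewall picks it up after a reload (sudo firewall-cmd --reload) or a firewalld restart. Now, run the playbook again. It should complete successfully.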

    ansible-playbook firewall.yml
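
Scanning the long service list by eye is error-prone, partly because names such as http and https overlap. The sketch below checks for an exact, whole-word match in a space-separated list. The helper name is_known_service is hypothetical, and the sample list is a shortened stand-in for the real firewall-cmd output.

```shell
# Hypothetical helper: succeed only if the first argument appears as a whole
# word in the space-separated list given as the second argument.
is_known_service() {
  wanted="$1"
  services="$2"
  case " $services " in
    *" $wanted "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Shortened sample list; the real list is much longer.
sample_services="RH-Satellite-6 ftp http https imaps ipp ipp-client"

service_report="$(
  for name in web http; do
    if is_known_service "$name" "$sample_services"; then
      echo "$name: known"
    else
      echo "$name: unknown"
    fi
  done
)"
echo "$service_report"
```

On the lab machine, the live list could be passed in instead, for example: is_known_service http "$(firewall-cmd --get-services)".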

Part 2: Troubleshooting Host Unreachability

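An "unreachable" error means Ansible could not connect to a host listed in your inventory. Common causes are a typo in the hostname, a name that does not resolve, a blocked SSH port, or wrong credentials. This exercise reproduces the first one.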

  1. Simulate an Unreachable Host

    Intentionally introduce a typo into your inventory file and remove the local connection setting. This will force Ansible to attempt an actual network connection to the misspelled hostname.

    nano inventory

    Change localhost to localhossst and remove ansible_connection=local.

    ## ERROR: Intentional typo in hostname, no local connection
    localhossst

    Save and exit the editor.

  2. Modify the Playbook to Use Inventory Hosts

    Next, modify the webserver.yml playbook to target the inventory hosts instead of the hardcoded localhost. With hosts: localhost, the play would not exercise the broken entry: localhossst does not match the pattern, and when localhost is absent from the inventory Ansible falls back to an implicit localhost that uses a local connection.

    nano webserver.yml

    Change the hosts line from localhost to all:

    ---
    - name: Configure Web Server
      hosts: all ## Changed from 'localhost' to use inventory hosts
      become: true
      ## ... rest of the playbook remains the same

    Save and exit the editor.

  3. Run the Playbook to Trigger the Error

    Now try to run the modified playbook. It will fail because the inventory contains the typo localhossst.

    ansible-playbook webserver.yml

    Expected Output (Error): Ansible will fail and report the host as UNREACHABLE. The error message indicates that the hostname could not be resolved.

    PLAY [Configure Web Server] ****************************************************
    
    TASK [Gathering Facts] **********************************************************
    fatal: [localhossst]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname localhossst: Name or service not known", "unreachable": true}
    
    PLAY RECAP *********************************************************************
    localhossst                : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

  4. Correct the Inventory File

    The UNREACHABLE status is your cue to double-check hostnames and network connectivity. In this case, the fix is to correct the typo in the inventory file.

    nano inventory

    Change localhossst back to localhost.

    ## FIX: Corrected hostname
    localhost ansible_connection=local

    Save and exit. Rerunning ansible-playbook webserver.yml will now succeed.

  5. Optional: Restore the Original Playbook

    If you want to restore the playbook to use hosts: localhost for future exercises, you can change it back:

    nano webserver.yml

    Change the hosts line back to localhost:

    ---
    - name: Configure Web Server
      hosts: localhost ## Restored to original
      become: true
      ## ... rest of the playbook

    Save and exit. This step shows the difference between targeting localhost by name, which works even when the inventory has no such entry because Ansible supplies an implicit localhost, and targeting hosts that exist only through the inventory.
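
When a host is reported as UNREACHABLE, a useful first move is to list exactly which names Ansible will try to contact. The sketch below extracts host names from a simple INI inventory. The helper name inventory_hosts is hypothetical, and the parsing assumes an inventory without [group:vars] or [group:children] sections.

```shell
# Hypothetical helper: list host names from a simple INI inventory file.
inventory_hosts() {
  awk '
    /^[ \t]*$/  { next }   # skip blank lines
    /^[ \t]*#/  { next }   # skip comments
    /^[ \t]*\[/ { next }   # skip [group] headers
    { print $1 }           # the first field is the host name
  ' "$1"
}

# Illustrative inventory containing the misspelled entry from this exercise.
sample_inventory="$(mktemp)"
cat > "$sample_inventory" <<'EOF'
# Illustrative inventory with one misspelled entry
[web]
localhossst
localhost ansible_connection=local
EOF

host_list="$(inventory_hosts "$sample_inventory")"
echo "$host_list"
rm -f "$sample_inventory"
```

Each name could then be checked with getent hosts <name>, which exits non-zero when the name does not resolve; that is the condition behind the "Could not resolve hostname" message in step 3. Ansible's own view of the inventory is available through ansible all --list-hosts.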

Summary

In this lab, you prepared a Red Hat Enterprise Linux environment for Ansible by installing ansible-core and configuring logging, then proceeded to troubleshoot a variety of common issues. You learned to diagnose and resolve errors within playbooks, such as fixing incorrect YAML syntax, indentation, Jinja2 quoting, and invalid template paths. These skills are fundamental to writing valid and reliable automation code.

Furthermore, you addressed problems related to the managed host environment. You used Ansible's check mode to perform a safe dry run and catch an incorrect service name on a target node without making changes. The lab concluded with two runtime failures: a firewall task that named a nonexistent firewalld service, and an inventory typo that made a host unreachable. Together, these exercises cover debugging from the control node to the managed host.