Troubleshooting Common Ansible Issues
What are the first steps you take when an Ansible playbook fails?
Answer:
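First, I read the error message in the console output, noting the failing task, host, and module. Next, I check the Ansible log if one is configured (log_path in ansible.cfg or the ANSIBLE_LOG_PATH environment variable), and verify connectivity to the target hosts with ansible all -m ping. Finally, I confirm that the inventory file is correct and accessible, and that the intended inventory is the one being used.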
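A minimal sketch of the connectivity check written as a playbook rather than an ad hoc command. The play name is illustrative, and it targets whatever the active inventory defines as all.

```yaml
---
# Connectivity smoke test: confirms Ansible can log in and find Python on each host.
- name: Verify connectivity to every inventory host
  hosts: all
  gather_facts: false   # skip fact gathering so only the connection is tested
  tasks:
    - name: Confirm login and a usable Python interpreter
      ansible.builtin.ping:
```

Hosts that fail this play have a connection or interpreter problem, which separates those causes from errors in the playbook logic.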
How do you debug a playbook that seems to hang or run indefinitely?
Answer:
I'd first rule out network connectivity issues or firewall blocks. Then I'd rerun with ansible-playbook -vvv to see exactly which task and host it stops on. Common culprits are a task waiting for input, such as a sudo password or an SSH host key confirmation, or a long-running process with no timeout.
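One way to stop a slow task from blocking a run indefinitely is to put an upper bound on it. The sketch below assumes hypothetical script paths and arbitrary time limits. async and poll are standard task keywords, and the timeout keyword needs Ansible 2.10 or later.

```yaml
---
- name: Bound long-running tasks
  hosts: all
  tasks:
    - name: Run a long job asynchronously, failing if it exceeds 10 minutes
      ansible.builtin.command: /usr/local/bin/long_job.sh   # hypothetical script
      async: 600   # maximum runtime in seconds
      poll: 15     # check for completion every 15 seconds

    - name: Abort a task that takes longer than 2 minutes
      ansible.builtin.command: /usr/local/bin/quick_check.sh   # hypothetical script
      timeout: 120   # task keyword, Ansible 2.10+
```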
A task fails with 'unreachable'. What are the common causes and how do you troubleshoot it?
Answer:
Common causes include an incorrect IP address or hostname, a firewall blocking the SSH port (22 by default), the SSH service not running, or incorrect SSH credentials. I'd verify network reachability with ping, check firewall rules, and test SSH manually from the control node using the same user and key that Ansible uses.
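The port check can also be scripted. This is a sketch, assuming SSH listens on port 22 and that a 10 second timeout is acceptable. The task is delegated to localhost, so it runs on the control node even when the target itself cannot be reached.

```yaml
---
- name: Test SSH port reachability from the control node
  hosts: all
  gather_facts: false   # fact gathering would fail on an unreachable host
  tasks:
    - name: Check that the SSH port accepts connections
      ansible.builtin.wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22        # assumed SSH port; adjust if the host uses another
        timeout: 10
      delegate_to: localhost
```

A timeout here points to a network, firewall, or SSH service problem. A success followed by an unreachable error in a normal run points to credentials instead.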
How do you handle 'Permission denied' errors when running Ansible playbooks?
Answer:
This usually indicates incorrect SSH keys, wrong user, or insufficient sudo privileges on the target host. I'd verify the SSH key path and permissions, ensure the ansible_user is correct, and check if become: yes is used where root privileges are needed, along with proper sudoers configuration.
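A quick way to see which identity Ansible actually uses is to compare the login user with the escalated user. The sketch below assumes a hypothetical deploy account. Note that an ansible_user set in the inventory takes precedence over remote_user.

```yaml
---
- name: Confirm login user and privilege escalation
  hosts: all
  gather_facts: false
  remote_user: deploy   # hypothetical account name
  tasks:
    - name: Show which user Ansible logs in as
      ansible.builtin.command: whoami
      register: login_user
      changed_when: false

    - name: Show which user tasks run as after escalation
      ansible.builtin.command: whoami
      become: true
      register: escalated_user
      changed_when: false

    - name: Report both identities
      ansible.builtin.debug:
        msg: "login={{ login_user.stdout }} escalated={{ escalated_user.stdout }}"
```

If the first task fails, the problem is the SSH key or user. If only the second fails, the problem is the sudoers configuration or a missing become password.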
Explain how ansible-playbook --syntax-check and ansible-playbook --check help in troubleshooting.
Answer:
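--syntax-check parses the playbook and reports YAML and playbook syntax errors without executing any tasks. --check (dry run) runs the playbook without making changes on the remote hosts and reports what would have changed, which helps identify logical errors or unexpected state changes. Modules that do not support check mode are skipped, so a clean dry run does not guarantee a clean real run; adding --diff shows the content changes for modules that support it.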
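Dry-run behavior can also be controlled per task. In the sketch below the file path is hypothetical. The command module is normally skipped under --check, so check_mode: false forces a read-only command to run anyway, and the ansible_check_mode variable gates a step that only makes sense in a real run.

```yaml
---
- name: Control task behavior during a dry run
  hosts: all
  tasks:
    - name: Read-only command that should still run during --check
      ansible.builtin.command: cat /etc/app/version   # hypothetical file
      register: app_version
      changed_when: false
      check_mode: false   # always execute, even in a dry run

    - name: Step that is skipped during --check
      ansible.builtin.debug:
        msg: "Running for real against version {{ app_version.stdout }}"
      when: not ansible_check_mode
```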
What is the purpose of ansible-playbook -vvv and when would you use it?
Answer:
ansible-playbook -vvv increases the verbosity level, providing detailed output including module arguments, return values, and SSH connection details. I use it when a playbook fails without a clear error message, or when I need to understand the exact execution flow of a task.
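Verbosity can also be used from inside a playbook: the debug module's verbosity parameter keeps diagnostic messages out of normal runs. A small sketch, with an illustrative message:

```yaml
---
- name: Emit diagnostics only on verbose runs
  hosts: all
  gather_facts: false
  tasks:
    - name: Print connection target only at -vvv or higher
      ansible.builtin.debug:
        msg: "Connecting to {{ ansible_host | default(inventory_hostname) }}"
        verbosity: 3   # task is skipped unless run with at least -vvv
```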
How do you troubleshoot Ansible facts that are not being gathered?
Answer:
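First, I'd check that gather_facts has not been set to false in the play, since it is enabled by default. Then I'd ensure Python is installed on the target host, as fact collection relies on it. Facts are gathered over the same connection as every other task, so there are no separate fact collection ports; a network or firewall problem here would show up as the host being unreachable.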
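The Python check can be done from Ansible itself, because the raw module does not need Python on the target. This is a sketch that assumes the interpreter is named python3 and is on the remote user's PATH.

```yaml
---
- name: Diagnose fact gathering
  hosts: all
  gather_facts: false   # gather manually below, once Python is confirmed
  tasks:
    - name: Look for a Python interpreter without requiring one
      ansible.builtin.raw: command -v python3
      register: python_check
      changed_when: false
      failed_when: false   # report the result instead of stopping the play

    - name: Gather facts explicitly when Python is present
      ansible.builtin.setup:
      when: python_check.rc == 0

    - name: Report hosts that lack Python
      ansible.builtin.debug:
        msg: "python3 not found on {{ inventory_hostname }}"
      when: python_check.rc != 0
```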
A playbook runs successfully but doesn't achieve the desired state. How do you debug this?
Answer:
This suggests a logical error in the playbook. I'd use ansible-playbook -vvv to inspect the parameters each module actually received. I'd also manually verify the state on the target host after execution and use the debug module to print variables at different stages.
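The manual verification can be built into the play so that a wrong end state fails loudly. The sketch below uses a hypothetical file path and setting: it writes a file, reads it back with slurp, and asserts on the content.

```yaml
---
- name: Apply a change and verify the resulting state
  hosts: all
  gather_facts: false
  tasks:
    - name: Write the configuration file
      ansible.builtin.copy:
        content: "max_connections=100\n"   # hypothetical setting
        dest: /tmp/app_demo.conf           # hypothetical path
      register: config_result

    - name: Show what the module reported
      ansible.builtin.debug:
        var: config_result

    - name: Read the file back from the target host
      ansible.builtin.slurp:
        src: /tmp/app_demo.conf
      register: config_file

    - name: Fail if the file does not contain the expected setting
      ansible.builtin.assert:
        that:
          - "'max_connections=100' in (config_file.content | b64decode)"
        fail_msg: "Configuration file does not contain the expected setting"
```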
What if a task fails only on a subset of hosts in your inventory?
Answer:
This points to host-specific issues. I'd isolate one of the failing hosts, for example with --limit, and manually test connectivity and permissions. I'd also check for differences in OS versions, installed packages, or configuration on the failing hosts compared to the successful ones.
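Gathered facts make the host comparison quick. This sketch prints the distribution and Python version for each host. The choice of facts is illustrative, and any other key under ansible_facts can be added.

```yaml
---
- name: Compare basic platform details across hosts
  hosts: all
  gather_facts: true
  tasks:
    - name: Print OS and Python version for each host
      ansible.builtin.debug:
        msg: >-
          {{ inventory_hostname }}:
          {{ ansible_facts['distribution'] }}
          {{ ansible_facts['distribution_version'] }},
          Python {{ ansible_facts['python_version'] }}
```

Running it against both a failing and a working host and comparing the output usually narrows the difference down.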
How can you use the debug module for troubleshooting?
Answer:
The debug module allows printing variables, messages, or the output of previous tasks to the console. I use it to inspect the value of variables, check the return status of commands, or confirm conditional logic during playbook execution, like: - debug: var=my_variable.
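The uses above can be sketched in one short play. The variable name and value are hypothetical, and uptime stands in for any command whose result needs inspecting.

```yaml
---
- name: Common debug module patterns
  hosts: all
  gather_facts: false
  vars:
    my_variable: example   # hypothetical value
  tasks:
    - name: Print a variable
      ansible.builtin.debug:
        var: my_variable

    - name: Run a command and keep its result
      ansible.builtin.command: uptime
      register: uptime_result
      changed_when: false

    - name: Print the return code and output of the previous task
      ansible.builtin.debug:
        msg: "rc={{ uptime_result.rc }} stdout={{ uptime_result.stdout }}"

    - name: Confirm that a condition evaluates as expected
      ansible.builtin.debug:
        msg: "Condition matched"
      when: my_variable == 'example'
```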
You encounter a 'No such file or directory' error for a file that exists on the control node. What could be wrong?
Answer:
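This often happens with the copy or template module. It usually means the source path in the playbook is wrong, or is relative to a different directory on the control node than expected: relative paths are looked up in the role's or playbook's files/ directory for copy and templates/ directory for template, and then relative to the playbook itself. It can also mean the task is looking on the target host rather than the control node, for example when remote_src is enabled on copy. I'd verify the absolute path or the path relative to the playbook's location.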
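A sketch of the two cases, using hypothetical file names. The first task anchors the source to the playbook's directory with the playbook_dir variable, so the lookup does not depend on where the command was run. The second sets remote_src, which makes copy read the source from the target host.

```yaml
---
- name: Make source paths explicit
  hosts: all
  gather_facts: false
  tasks:
    - name: Copy from the control node using a path anchored to the playbook
      ansible.builtin.copy:
        src: "{{ playbook_dir }}/files/app.conf"   # hypothetical file
        dest: /tmp/app.conf

    - name: Copy a file that already exists on the target host
      ansible.builtin.copy:
        src: /tmp/app.conf
        dest: /tmp/app.conf.bak
        remote_src: true   # src is resolved on the target, not the control node
```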