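Capturing Script Output in Ansible: A Complete Guide with Practice

Introduction

Ansible is a widely used IT automation tool that simplifies managing complex infrastructure and application deployments. In this tutorial, we will explore how to capture the output of scripts executed through Ansible playbooks. This capability is essential for monitoring, debugging, and analyzing the results of your automation tasks. By completing this lab, you will understand the various techniques for capturing and using script output in your Ansible workflows.

Setting Up the Ansible Environment

Before we start capturing script output with Ansible, we need to set up a basic Ansible environment. This includes creating the necessary directory structure and configuration files.

Understanding Ansible Basics

Ansible works by connecting to target hosts and pushing out small programs called modules. These modules are executed on the target hosts and removed when they finish. Ansible is agentless, which means you do not need to install any Ansible-specific software on the managed nodes (they only need to be reachable, typically over SSH, and most modules expect a Python interpreter).

Let's start by creating a project directory and the necessary Ansible files: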

mkdir -p ~/project/ansible-output-demo/scripts
cd ~/project/ansible-output-demo

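Now, let's create a simple inventory file. In Ansible, the inventory file defines the hosts and groups of hosts that the commands, modules, and tasks in a playbook run against.

Create an inventory file using the code editor:

Click the "File" menu in the top-left corner of the IDE
Select "New File"
Save it as inventory in the ~/project/ansible-output-demo directory

Add the following content to the inventory file: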

[local]
localhost ansible_connection=local

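This inventory file specifies that we will run Ansible on the local machine.

Next, let's create a simple script that generates some output for us to capture. The script will:

Print some system information
Generate some standard output
Generate some standard error output

Create a new file named info.sh in the scripts directory:

Click the "File" menu
Select "New File"
Save it as scripts/info.sh in the ~/project/ansible-output-demo directory

Add the following content to the info.sh file: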

#!/bin/bash

## Print system information
echo "=== System Information ==="
echo "Hostname: $(hostname)"
echo "Date: $(date)"
echo "Kernel: $(uname -r)"
echo "Memory:"
free -h

## Generate some standard output
echo "=== Standard Output ==="
echo "This is standard output"
echo "Hello from the script!"

## Generate some standard error
echo "=== Standard Error ===" >&2
echo "This is standard error" >&2
echo "An example error message" >&2

## Exit with a specific code
exit 0

Now, let's make the script executable:

chmod +x ~/project/ansible-output-demo/scripts/info.sh

Let's run the script directly to see what output it produces:

~/project/ansible-output-demo/scripts/info.sh

You should see output containing system information, standard output messages, and standard error messages.
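
In a terminal both streams are printed to the same screen, so it is not obvious which line came from which stream. The following is a minimal sketch of how shell redirection separates them. The function emit_both is a hypothetical stand-in for info.sh and is not part of the lab files:

```shell
#!/bin/bash
# Hypothetical stand-in for info.sh: writes one line to each stream.
emit_both() {
  echo "This is standard output"
  echo "This is standard error" >&2
}

# Keep stdout, discard stderr.
stdout_only="$(emit_both 2>/dev/null)"

# Keep stderr, discard stdout: first point stderr at the capture pipe,
# then send stdout to /dev/null (redirections apply left to right).
stderr_only="$(emit_both 2>&1 >/dev/null)"

echo "stdout: $stdout_only"
echo "stderr: $stderr_only"
```

The same redirections should work on the real script, for example by appending 2>/dev/null to the info.sh command to see only its standard output.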

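We now have the basic environment set up. In the next step, we will create an Ansible playbook to execute this script and capture its output.

Basic Output Capture with Ansible

With the environment in place, let's create a simple Ansible playbook that executes our script and captures its output.

Creating a Basic Playbook

In Ansible, playbooks are YAML files that define a set of tasks to execute on managed hosts. Let's create a playbook that runs our info.sh script and captures its output using the register keyword.

Create a new file named capture_output.yml in the ~/project/ansible-output-demo directory:

Click the "File" menu
Select "New File"
Save it as capture_output.yml in the ~/project/ansible-output-demo directory

Add the following content to the capture_output.yml file: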

---
- name: Capture Script Output
  hosts: local
  gather_facts: no

  tasks:
    - name: Execute the info.sh script
      command: "{{ playbook_dir }}/scripts/info.sh"
      register: script_output

    - name: Display the script output
      debug:
        var: script_output.stdout

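Let's examine this playbook:

The playbook targets the local group that we defined in the inventory.
The first task uses the command module to execute our info.sh script.
The register keyword stores the result of the command in a variable named script_output.
The second task uses the debug module to display the script's standard output (stdout).

Running the Playbook

Now let's run the playbook and see how it captures and displays the script output: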

cd ~/project/ansible-output-demo
ansible-playbook -i inventory capture_output.yml

You should see output similar to the following:

PLAY [Capture Script Output] *******************************************

TASK [Execute the info.sh script] **************************************
changed: [localhost]

TASK [Display the script output] ***************************************
ok: [localhost] => {
    "script_output.stdout": "=== System Information ===\nHostname: ubuntu\nDate: Tue Oct 17 12:34:56 UTC 2023\nKernel: 5.15.0-1031-aws\nMemory:\n              total        used        free      shared  buff/cache   available\nMem:          7.7Gi       1.2Gi       5.2Gi        12Mi       1.3Gi       6.3Gi\nSwap:            0B          0B          0B\n=== Standard Output ===\nThis is standard output\nHello from the script!"
}

PLAY RECAP ************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

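Note that only the standard output is displayed. The standard error (stderr) output does not appear because we only asked for script_output.stdout.

Improving Output Readability

The output is a bit hard to read as a single string. Let's modify our playbook to display it in a more readable format using the stdout_lines attribute, which presents the output as a list of lines.

Edit the capture_output.yml file and modify the second task as follows: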

- name: Display the script output
  debug:
    var: script_output.stdout_lines

Run the playbook again:

ansible-playbook -i inventory capture_output.yml

The output should now be more readable, with each line shown separately:

TASK [Display the script output] ***************************************
ok: [localhost] => {
    "script_output.stdout_lines": [
        "=== System Information ===",
        "Hostname: ubuntu",
        "Date: Tue Oct 17 12:34:56 UTC 2023",
        "Kernel: 5.15.0-1031-aws",
        "Memory:",
        "              total        used        free      shared  buff/cache   available",
        "Mem:          7.7Gi       1.2Gi       5.2Gi        12Mi       1.3Gi       6.3Gi",
        "Swap:            0B          0B          0B",
        "=== Standard Output ===",
        "This is standard output",
        "Hello from the script!"
    ]
}

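This format makes the output easier to read and work with. In the next step, we will explore how to capture and display different types of output.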
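
Conceptually, stdout_lines is the stdout string split on newlines. A rough shell analogy is to process a captured multi-line string one line at a time. The sample text below is copied from the script's messages, and the variable names are illustrative only:

```shell
#!/bin/bash
# One multi-line string, comparable to script_output.stdout.
captured="=== Standard Output ===
This is standard output
Hello from the script!"

# Walk the string line by line, comparable to iterating over stdout_lines.
line_count=0
last_line=""
while IFS= read -r line; do
  line_count=$((line_count + 1))
  printf '%d: %s\n' "$line_count" "$line"
  last_line="$line"
done <<EOF
$captured
EOF

echo "total lines: $line_count"
```

Working with individual lines makes it easier to pick out, count, or filter entries than searching inside one long string.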

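Capturing Different Types of Output

In the previous step, we captured and displayed the script's standard output. When executing scripts, however, there are several types of output we may need to capture:

Standard output (stdout): the script's normal output
Standard error (stderr): error messages and warnings
Return code (rc): the script's exit status (0 conventionally indicates success; a non-zero value indicates an error)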
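
To make the three pieces concrete, here is a minimal shell sketch that collects all of them from a single run. Both sample_script and capture_all are hypothetical names introduced for illustration; the sketch assumes mktemp is available:

```shell
#!/bin/bash
# Hypothetical stand-in for a script that writes to both streams and fails.
sample_script() {
  echo "normal output"
  echo "error output" >&2
  return 3
}

# capture_all CMD [ARGS...]: run CMD and store its stdout in $out,
# its stderr in $err, and its exit status in $rc.
capture_all() {
  err_file="$(mktemp)"
  rc=0
  out="$("$@" 2>"$err_file")" || rc=$?
  err="$(cat "$err_file")"
  rm -f "$err_file"
}

capture_all sample_script
echo "stdout: $out"
echo "stderr: $err"
echo "rc: $rc"
```

The registered variable in Ansible plays the role of $out, $err, and $rc here, exposing them as stdout, stderr, and rc.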

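Let's create a new playbook that captures and displays all three types of output.

Create a new file named capture_all_output.yml in the ~/project/ansible-output-demo directory:

Click the "File" menu
Select "New File"
Save it as capture_all_output.yml in the ~/project/ansible-output-demo directory

Add the following content to the capture_all_output.yml file: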

---
- name: Capture All Types of Script Output
  hosts: local
  gather_facts: no

  tasks:
    - name: Execute the info.sh script
      command: "{{ playbook_dir }}/scripts/info.sh"
      register: script_output

    - name: Display standard output
      debug:
        msg: "Standard Output (stdout):"

    - name: Display stdout content
      debug:
        var: script_output.stdout_lines

    - name: Display standard error
      debug:
        msg: "Standard Error (stderr):"

    - name: Display stderr content
      debug:
        var: script_output.stderr_lines

    - name: Display return code
      debug:
        msg: "Return Code: {{ script_output.rc }}"

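This playbook executes our script and then displays:

The standard output, using script_output.stdout_lines
The standard error, using script_output.stderr_lines
The return code, using script_output.rc

Running the Enhanced Playbook

Let's run our new playbook: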

cd ~/project/ansible-output-demo
ansible-playbook -i inventory capture_all_output.yml

You should see a combined display of all three types of output:

PLAY [Capture All Types of Script Output] *****************************

TASK [Execute the info.sh script] *************************************
changed: [localhost]

TASK [Display standard output] ****************************************
ok: [localhost] => {
    "msg": "Standard Output (stdout):"
}

TASK [Display stdout content] *****************************************
ok: [localhost] => {
    "script_output.stdout_lines": [
        "=== System Information ===",
        "Hostname: ubuntu",
        "Date: Tue Oct 17 12:40:22 UTC 2023",
        "Kernel: 5.15.0-1031-aws",
        "Memory:",
        "              total        used        free      shared  buff/cache   available",
        "Mem:          7.7Gi       1.2Gi       5.2Gi        12Mi       1.3Gi       6.3Gi",
        "Swap:            0B          0B          0B",
        "=== Standard Output ===",
        "This is standard output",
        "Hello from the script!"
    ]
}

TASK [Display standard error] *****************************************
ok: [localhost] => {
    "msg": "Standard Error (stderr):"
}

TASK [Display stderr content] *****************************************
ok: [localhost] => {
    "script_output.stderr_lines": [
        "=== Standard Error ===",
        "This is standard error",
        "An example error message"
    ]
}

TASK [Display return code] ********************************************
ok: [localhost] => {
    "msg": "Return Code: 0"
}

PLAY RECAP **********************************************************
localhost                  : ok=6    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

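We can now see every type of output from the script:

Standard output shows the system information and our regular messages
Standard error shows our error messages
The return code is 0, indicating successful execution

Creating a Script with Errors

Let's create a script that produces errors and returns a non-zero exit code, to see how Ansible handles it.

Create a new file named error.sh in the scripts directory:

Click the "File" menu
Select "New File"
Save it as scripts/error.sh in the ~/project/ansible-output-demo directory

Add the following content to the error.sh file: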

#!/bin/bash

## Print some standard output
echo "Starting error demonstration script"
echo "This will appear in stdout"

## Print some standard error
echo "This will appear in stderr" >&2
echo "Error: Something went wrong!" >&2

## Exit with a non-zero code to indicate error
exit 1

Make the script executable:

chmod +x ~/project/ansible-output-demo/scripts/error.sh

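Now let's create a playbook to execute this script and handle the error. Create a new file named handle_errors.yml:

Click the "File" menu
Select "New File"
Save it as handle_errors.yml in the ~/project/ansible-output-demo directory

Add the following content to the handle_errors.yml file: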

---
- name: Handle Script Errors
  hosts: local
  gather_facts: no

  tasks:
    - name: Execute the error script
      command: "{{ playbook_dir }}/scripts/error.sh"
      register: script_output
      ignore_errors: yes

    - name: Display standard output
      debug:
        var: script_output.stdout_lines

    - name: Display standard error
      debug:
        var: script_output.stderr_lines

    - name: Display return code
      debug:
        msg: "Return Code: {{ script_output.rc }}"

    - name: Check if script failed
      debug:
        msg: "The script failed with return code {{ script_output.rc }}"
      when: script_output.rc != 0

Note the addition of ignore_errors: yes, which tells Ansible to continue running the playbook even if the command fails (returns a non-zero exit code).
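
The playbook's pattern is "record the failure, keep going, then branch on the return code". A minimal shell sketch of the same pattern follows; failing_step is a hypothetical stand-in for error.sh:

```shell
#!/bin/bash
# Hypothetical failing step, standing in for error.sh.
failing_step() {
  echo "Error: Something went wrong!" >&2
  return 1
}

# Record the exit status instead of stopping (similar in spirit to
# ignore_errors: yes combined with register).
rc=0
failing_step 2>/dev/null || rc=$?

# Branch on the recorded status (similar to when: script_output.rc != 0).
if [ "$rc" -ne 0 ]; then
  message="The script failed with return code $rc"
else
  message="The script succeeded"
fi
echo "$message"
```

Ignoring a failure is only safe when a later step inspects the return code, as the last task of the playbook does.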

Let's run this playbook:

ansible-playbook -i inventory handle_errors.yml

You should see output similar to the following. Because the script exits with a non-zero code, the first task is reported as failed and then ignored (the failed task's JSON result is abbreviated here):

PLAY [Handle Script Errors] *******************************************

TASK [Execute the error script] ***************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": [...], "msg": "non-zero return code", "rc": 1, ...}
...ignoring

TASK [Display standard output] ****************************************
ok: [localhost] => {
    "script_output.stdout_lines": [
        "Starting error demonstration script",
        "This will appear in stdout"
    ]
}

TASK [Display standard error] *****************************************
ok: [localhost] => {
    "script_output.stderr_lines": [
        "This will appear in stderr",
        "Error: Something went wrong!"
    ]
}

TASK [Display return code] ********************************************
ok: [localhost] => {
    "msg": "Return Code: 1"
}

TASK [Check if script failed] *****************************************
ok: [localhost] => {
    "msg": "The script failed with return code 1"
}

PLAY RECAP **********************************************************
localhost                  : ok=5    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=1

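This example demonstrates how to:

Capture the output of a script that produces errors
Continue playbook execution despite the error
Conditionally execute tasks based on the script's return code

In the next step, we will explore more advanced use cases and best practices for handling script output with Ansible.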

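Advanced Output Processing and Practical Use Cases

Now that we understand how to capture different types of output, let's explore some more advanced techniques for processing and using script output in Ansible.

Parsing Output with Filters

Ansible provides a variety of filters that let you manipulate script output and extract specific information from it. In this section, we will look at some common filtering techniques.

Create a new file named parse_output.yml in the ~/project/ansible-output-demo directory:

Click the "File" menu
Select "New File"
Save it as parse_output.yml in the ~/project/ansible-output-demo directory

Before filling in the playbook, let's create a script that generates some structured output for us to parse. Create a new file named system_stats.sh:

Click the "File" menu
Select "New File"
Save it as scripts/system_stats.sh in the ~/project/ansible-output-demo directory

Add the following content to the system_stats.sh file: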

#!/bin/bash

## Display CPU info
echo "CPU_MODEL: $(grep 'model name' /proc/cpuinfo | head -1 | cut -d ':' -f2 | xargs)"
echo "CPU_CORES: $(grep -c 'processor' /proc/cpuinfo)"

## Display memory info in GB
mem_total=$(free -g | grep Mem | awk '{print $2}')
echo "MEMORY_GB: $mem_total"

## Display disk usage
disk_usage=$(df -h / | tail -1 | awk '{print $5}' | tr -d '%')
echo "DISK_USAGE_PCT: $disk_usage"

## Display load average
load_avg=$(uptime | awk -F'load average: ' '{print $2}' | cut -d, -f1)
echo "LOAD_AVG: $load_avg"

exit 0

Make the script executable:

chmod +x ~/project/ansible-output-demo/scripts/system_stats.sh

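Now let's write a playbook that executes this script, captures its output, and parses it to extract specific information.

Add the following content to the parse_output.yml file: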

---
- name: Parse Script Output
  hosts: local
  gather_facts: no

  tasks:
    - name: Execute the system_stats.sh script
      command: "{{ playbook_dir }}/scripts/system_stats.sh"
      register: stats_output

    - name: Display raw output
      debug:
        var: stats_output.stdout_lines

    - name: Parse CPU model
      set_fact:
        cpu_model: "{{ stats_output.stdout | regex_search('CPU_MODEL: (.+)', '\\1') | first }}"

    - name: Parse CPU cores
      set_fact:
        cpu_cores: "{{ stats_output.stdout | regex_search('CPU_CORES: (\\d+)', '\\1') | first }}"

    - name: Parse memory
      set_fact:
        memory_gb: "{{ stats_output.stdout | regex_search('MEMORY_GB: (\\d+)', '\\1') | first }}"

    - name: Parse disk usage
      set_fact:
        disk_usage: "{{ stats_output.stdout | regex_search('DISK_USAGE_PCT: (\\d+)', '\\1') | first }}"

    - name: Parse load average
      set_fact:
        load_avg: "{{ stats_output.stdout | regex_search('LOAD_AVG: ([0-9.]+)', '\\1') | first }}"

    - name: Display parsed information
      debug:
        msg: |
          Parsed system information:
          - CPU Model: {{ cpu_model }}
          - CPU Cores: {{ cpu_cores }}
          - Memory (GB): {{ memory_gb }}
          - Disk Usage (%): {{ disk_usage }}
          - Load Average: {{ load_avg }}

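This playbook:

Executes our system_stats.sh script
Displays the raw output
Uses the regex_search filter to extract specific information from the output
Stores the extracted information in variables
Displays the parsed information in a structured format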
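
Each regex_search call boils down to "find the line that starts with a key and keep what follows". A minimal shell sketch of the same extraction is shown here, using made-up sample values and a hypothetical helper named get_metric:

```shell
#!/bin/bash
# Sample text in the same KEY: VALUE shape that system_stats.sh prints
# (the values are made up for illustration).
stats_output="CPU_MODEL: Example CPU @ 2.40GHz
CPU_CORES: 2
MEMORY_GB: 7
DISK_USAGE_PCT: 58
LOAD_AVG: 0.08"

# get_metric KEY: print the value that follows "KEY: " on its line.
get_metric() {
  printf '%s\n' "$stats_output" | sed -n "s/^$1: //p"
}

cpu_model="$(get_metric CPU_MODEL)"
cpu_cores="$(get_metric CPU_CORES)"
disk_usage="$(get_metric DISK_USAGE_PCT)"
echo "model=$cpu_model cores=$cpu_cores disk=${disk_usage}%"
```

Printing one KEY: VALUE pair per line is what keeps both this sketch and the playbook's regular expressions simple, so it is worth preserving that shape when extending the script.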

Let's run this playbook:

cd ~/project/ansible-output-demo
ansible-playbook -i inventory parse_output.yml

You should see output similar to the following:

PLAY [Parse Script Output] ********************************************

TASK [Execute the system_stats.sh script] *****************************
changed: [localhost]

TASK [Display raw output] *********************************************
ok: [localhost] => {
    "stats_output.stdout_lines": [
        "CPU_MODEL: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz",
        "CPU_CORES: 2",
        "MEMORY_GB: 7",
        "DISK_USAGE_PCT: 58",
        "LOAD_AVG: 0.08"
    ]
}

TASK [Parse CPU model] ************************************************
ok: [localhost]

TASK [Parse CPU cores] ************************************************
ok: [localhost]

TASK [Parse memory] ***************************************************
ok: [localhost]

TASK [Parse disk usage] ***********************************************
ok: [localhost]

TASK [Parse load average] *********************************************
ok: [localhost]

TASK [Display parsed information] *************************************
ok: [localhost] => {
    "msg": "Parsed system information:\n- CPU Model: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz\n- CPU Cores: 2\n- Memory (GB): 7\n- Disk Usage (%): 58\n- Load Average: 0.08"
}

PLAY RECAP ***********************************************************
localhost                  : ok=8    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

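Making Decisions Based on Output

One of the most powerful aspects of capturing script output is using it to make decisions in your Ansible playbooks. Let's create a playbook that demonstrates conditional execution based on script output.

Create a new file named conditional_actions.yml in the ~/project/ansible-output-demo directory:

Click the "File" menu
Select "New File"
Save it as conditional_actions.yml in the ~/project/ansible-output-demo directory

Add the following content: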

---
- name: Conditional Actions Based on Script Output
  hosts: local
  gather_facts: no

  tasks:
    - name: Execute the system_stats.sh script
      command: "{{ playbook_dir }}/scripts/system_stats.sh"
      register: stats_output

    - name: Parse disk usage
      set_fact:
        disk_usage: "{{ stats_output.stdout | regex_search('DISK_USAGE_PCT: (\\d+)', '\\1') | first | int }}"

    - name: Parse load average
      set_fact:
        load_avg: "{{ stats_output.stdout | regex_search('LOAD_AVG: ([0-9.]+)', '\\1') | first | float }}"

    - name: Display system status
      debug:
        msg: "Current system status: Disk usage: {{ disk_usage }}%, Load average: {{ load_avg }}"

    - name: Warn about high disk usage
      debug:
        msg: "WARNING: Disk usage is high at {{ disk_usage }}%. Consider cleaning up disk space."
      ## Depending on the Ansible version and the jinja2_native setting, set_fact may
      ## store templated values as strings, so cast again before comparing.
      when: disk_usage | int > 50

    - name: Warn about high load average
      debug:
        msg: "WARNING: Load average is high at {{ load_avg }}. Check for resource-intensive processes."
      when: load_avg | float > 1.0

    - name: Report healthy system
      debug:
        msg: "System is healthy. All metrics within normal ranges."
      when: disk_usage | int <= 50 and load_avg | float <= 1.0

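This playbook:

Executes the system_stats.sh script
Parses the disk usage and load average
Displays different messages based on those values:
- A warning if disk usage exceeds 50%
- A warning if the load average exceeds 1.0
- A "system is healthy" message if all metrics are within normal ranges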
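
The same threshold logic can be sketched outside Ansible. This is an illustrative shell version that assumes the 50% and 1.0 thresholds from the playbook; check_health is a hypothetical helper, and awk handles the floating-point comparison because shell arithmetic is integer-only:

```shell
#!/bin/bash
# check_health DISK_PCT LOAD_AVG: print one line per finding.
check_health() {
  disk_pct="$1"
  load_avg="$2"
  healthy=1

  # Integer comparison works directly in the shell.
  if [ "$disk_pct" -gt 50 ]; then
    echo "WARNING: Disk usage is high at ${disk_pct}%"
    healthy=0
  fi

  # Delegate the float comparison to awk; exit status 0 means "load > 1.0".
  if awk -v load="$load_avg" 'BEGIN { exit !(load > 1.0) }'; then
    echo "WARNING: Load average is high at $load_avg"
    healthy=0
  fi

  if [ "$healthy" -eq 1 ]; then
    echo "System is healthy"
  fi
}

check_health 58 0.08
check_health 20 0.50
```

Note that the values are compared as numbers, not as text. The int and float casts in the playbook serve the same purpose.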

Let's run this playbook:

ansible-playbook -i inventory conditional_actions.yml

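The output will depend on the current state of your system, but it should include conditional messages based on your disk usage and load average.

These examples demonstrate how you can:

Parse script output and extract specific information from it
Use that information to make decisions in your Ansible playbooks
Take different actions depending on the script output

These techniques are essential for creating dynamic, responsive automation workflows that adapt to different conditions and scenarios.

Practical Use Case: System Health Check

In this final step, we will build a complete practical example that combines everything we have learned about capturing and processing script output with Ansible. We will build a system health check tool that will:

Collect various system metrics
Analyze the metrics to identify potential problems
Generate a health report
Recommend remedial actions where necessary

Creating the Health Check Script

First, let's create a comprehensive health check script that collects various system metrics.

Create a new file named health_check.sh in the scripts directory:

Click the "File" menu
Select "New File"
Save it as scripts/health_check.sh in the ~/project/ansible-output-demo directory

Add the following content to the health_check.sh file: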

#!/bin/bash

## System Health Check Script

echo "HEALTH_CHECK_START: $(date)"

## CPU load
cpu_load=$(uptime | awk -F'load average: ' '{print $2}' | cut -d, -f1)
echo "CPU_LOAD: $cpu_load"
if (($(echo "$cpu_load > 1.0" | bc -l))); then
  echo "CPU_STATUS: WARNING"
else
  echo "CPU_STATUS: OK"
fi

## Memory usage
mem_total=$(free | grep Mem | awk '{print $2}')
mem_used=$(free | grep Mem | awk '{print $3}')
mem_pct=$(echo "scale=2; $mem_used * 100 / $mem_total" | bc)
echo "MEM_USAGE_PCT: $mem_pct"
if (($(echo "$mem_pct > 80" | bc -l))); then
  echo "MEM_STATUS: WARNING"
else
  echo "MEM_STATUS: OK"
fi

## Disk usage
disk_usage=$(df -h / | tail -1 | awk '{print $5}' | tr -d '%')
echo "DISK_USAGE_PCT: $disk_usage"
if [ "$disk_usage" -gt 80 ]; then
  echo "DISK_STATUS: WARNING"
else
  echo "DISK_STATUS: OK"
fi

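## Check for zombie processes (state starts with Z); matching the STAT column
## only avoids counting unrelated lines such as the "VSZ" header of ps aux
zombie_count=$(ps -eo stat= | grep -c '^Z')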
echo "ZOMBIE_PROCESSES: $zombie_count"
if [ "$zombie_count" -gt 0 ]; then
  echo "ZOMBIE_STATUS: WARNING"
else
  echo "ZOMBIE_STATUS: OK"
fi

## Check system uptime in whole days (first field of /proc/uptime is seconds)
uptime_days=$(awk '{print int($1 / 86400)}' /proc/uptime)
echo "UPTIME_DAYS: $uptime_days"

## Count recent syslog lines that mention "error" (capped at 5)
echo "RECENT_ERRORS: $(grep -i error /var/log/syslog 2> /dev/null | tail -5 | wc -l)"

echo "HEALTH_CHECK_END: $(date)"

Make the script executable:

chmod +x ~/project/ansible-output-demo/scripts/health_check.sh

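Creating the Health Check Playbook

Now, let's create a comprehensive playbook that executes the health check script, analyzes the results, and recommends appropriate actions based on them.

Create a new file named system_health_check.yml in the ~/project/ansible-output-demo directory:

Click the "File" menu
Select "New File"
Save it as system_health_check.yml in the ~/project/ansible-output-demo directory

Add the following content to the system_health_check.yml file: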

---
- name: System Health Check
  hosts: local
  gather_facts: no

  tasks:
    - name: Execute health check script
      command: "{{ playbook_dir }}/scripts/health_check.sh"
      register: health_check

    - name: Display raw health check output
      debug:
        var: health_check.stdout_lines

    ## Parse metrics from the health check output
    - name: Parse CPU load
      set_fact:
        cpu_load: "{{ health_check.stdout | regex_search('CPU_LOAD: ([0-9.]+)', '\\1') | first | float }}"
        cpu_status: "{{ health_check.stdout | regex_search('CPU_STATUS: (\\w+)', '\\1') | first }}"

    - name: Parse memory usage
      set_fact:
        mem_usage: "{{ health_check.stdout | regex_search('MEM_USAGE_PCT: ([0-9.]+)', '\\1') | first | float }}"
        mem_status: "{{ health_check.stdout | regex_search('MEM_STATUS: (\\w+)', '\\1') | first }}"

    - name: Parse disk usage
      set_fact:
        disk_usage: "{{ health_check.stdout | regex_search('DISK_USAGE_PCT: (\\d+)', '\\1') | first | int }}"
        disk_status: "{{ health_check.stdout | regex_search('DISK_STATUS: (\\w+)', '\\1') | first }}"

    - name: Parse zombie processes
      set_fact:
        zombie_count: "{{ health_check.stdout | regex_search('ZOMBIE_PROCESSES: (\\d+)', '\\1') | first | int }}"
        zombie_status: "{{ health_check.stdout | regex_search('ZOMBIE_STATUS: (\\w+)', '\\1') | first }}"

    ## Generate a health status summary
    - name: Generate health status summary
      set_fact:
        health_status:
          cpu:
            load: "{{ cpu_load }}"
            status: "{{ cpu_status }}"
          memory:
            usage_percent: "{{ mem_usage }}"
            status: "{{ mem_status }}"
          disk:
            usage_percent: "{{ disk_usage }}"
            status: "{{ disk_status }}"
          processes:
            zombie_count: "{{ zombie_count }}"
            status: "{{ zombie_status }}"

    ## Display health summary
    - name: Display health summary
      debug:
        var: health_status

    ## Check overall system status
    - name: Determine overall system status
      set_fact:
        overall_status: "{{ 'WARNING' if (cpu_status == 'WARNING' or mem_status == 'WARNING' or disk_status == 'WARNING' or zombie_status == 'WARNING') else 'OK' }}"

    - name: Display overall system status
      debug:
        msg: "Overall System Status: {{ overall_status }}"

    ## Take remedial actions if necessary
    - name: Recommend actions for CPU issues
      debug:
        msg: "Action recommended: Check for resource-intensive processes using 'top' or 'htop'"
      when: cpu_status == "WARNING"

    - name: Recommend actions for memory issues
      debug:
        msg: "Action recommended: Free up memory by restarting services or clearing cache"
      when: mem_status == "WARNING"

    - name: Recommend actions for disk issues
      debug:
        msg: "Action recommended: Clean up disk space using 'du -sh /* | sort -hr' to identify large directories"
      when: disk_status == "WARNING"

    - name: Recommend actions for zombie processes
      debug:
        msg: "Action recommended: Identify and restart parent processes of zombies"
      when: zombie_status == "WARNING"

    ## Generate health report file
    - name: Create health report directory
      file:
        path: "{{ playbook_dir }}/reports"
        state: directory

    - name: Get current timestamp
      command: date "+%Y%m%d_%H%M%S"
      register: timestamp

    - name: Create health report file
      copy:
        content: |
          System Health Report - {{ timestamp.stdout }}
          ----------------------------------------------

          CPU:
            Load Average: {{ cpu_load }}
            Status: {{ cpu_status }}

          Memory:
            Usage: {{ mem_usage }}%
            Status: {{ mem_status }}

          Disk:
            Usage: {{ disk_usage }}%
            Status: {{ disk_status }}

          Processes:
            Zombie Count: {{ zombie_count }}
            Status: {{ zombie_status }}

          Overall Status: {{ overall_status }}

          Recommendations:
          {% if cpu_status == "WARNING" %}
          - CPU: Check for resource-intensive processes using 'top' or 'htop'
          {% endif %}
          {% if mem_status == "WARNING" %}
          - Memory: Free up memory by restarting services or clearing cache
          {% endif %}
          {% if disk_status == "WARNING" %}
          - Disk: Clean up disk space using 'du -sh /* | sort -hr' to identify large directories
          {% endif %}
          {% if zombie_status == "WARNING" %}
          - Processes: Identify and restart parent processes of zombies
          {% endif %}
          {% if overall_status == "OK" %}
          - System is healthy. No actions required.
          {% endif %}
        dest: "{{ playbook_dir }}/reports/health_report_{{ timestamp.stdout }}.txt"

    - name: Display report location
      debug:
        msg: "Health report saved to {{ playbook_dir }}/reports/health_report_{{ timestamp.stdout }}.txt"

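This comprehensive playbook:

Executes our health check script
Parses various metrics from the script output
Creates a structured summary of the system's health
Determines the overall system status from the status of each component
Provides specific recommendations for any problems detected
Generates a detailed, timestamped health report file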
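
The overall-status rule is an any-of check: a single WARNING among the component statuses makes the whole result WARNING. A small shell sketch of that rule, using a hypothetical overall_status helper:

```shell
#!/bin/bash
# overall_status STATUS...: print WARNING if any argument is WARNING, else OK.
overall_status() {
  for status in "$@"; do
    if [ "$status" = "WARNING" ]; then
      echo "WARNING"
      return 0
    fi
  done
  echo "OK"
}

# Arguments stand for the CPU, memory, disk, and zombie statuses.
overall_status OK OK WARNING OK
overall_status OK OK OK OK
```

If more checks are added to the script later, they only need to be passed as extra arguments here, or added to the or-chain in the playbook.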

Running the Health Check

Let's run our system health check playbook:

cd ~/project/ansible-output-demo
ansible-playbook -i inventory system_health_check.yml

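You should see detailed output showing the health of the system, along with any recommended improvements. The output will vary with the current state of your system.

After running the playbook, check the reports directory to see the generated health report: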

ls -l ~/project/ansible-output-demo/reports/

You should see a file named health_report_[timestamp].txt. View the contents of this file:

cat ~/project/ansible-output-demo/reports/health_report_*.txt
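
Because each run writes a new timestamped file, the wildcard in the cat command prints every report once the playbook has run more than once. One possible way to pick only the newest report is sketched below, with a hypothetical latest_report helper and made-up file names. It relies on the YYYYMMDD_HHMMSS timestamp sorting in chronological order:

```shell
#!/bin/bash
# latest_report DIR: print the path of the newest health report in DIR.
# Glob results are sorted by name, so the last match has the latest timestamp.
latest_report() {
  latest=""
  for report in "$1"/health_report_*.txt; do
    if [ -e "$report" ]; then
      latest="$report"
    fi
  done
  printf '%s\n' "$latest"
}

# Demo against a throwaway directory holding two empty, made-up reports.
demo_dir="$(mktemp -d)"
touch "$demo_dir/health_report_20231017_124022.txt" \
      "$demo_dir/health_report_20231018_090000.txt"
newest="$(latest_report "$demo_dir")"
echo "newest report: ${newest##*/}"
rm -rf "$demo_dir"
```

In the lab directory, passing ~/project/ansible-output-demo/reports to the helper and feeding the result to cat should print only the most recent report.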

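Recap of What We Learned

In this tutorial, we learned:

How to capture different types of output (stdout, stderr, return code) from scripts executed by Ansible
How to use Ansible filters to parse script output and extract specific information
How to use script output to make decisions and perform conditional actions
How to implement a complete practical solution that uses script output for system health monitoring

These techniques are powerful additions to your Ansible automation toolkit and let you build sophisticated, dynamic, responsive automation workflows.

Summary

In this lab, we explored how to effectively capture and use the output of scripts executed through Ansible. We started with basic output capture using the register keyword and progressed to more advanced techniques, such as parsing output with filters and making decisions based on script results.

The key takeaways from this tutorial are:

The ability to capture different types of output (stdout, stderr, return code) from scripts executed by Ansible
Techniques for parsing script output and extracting specific information from it
Methods for conditionally executing tasks based on script output
A comprehensive practical example showing how to build a system health monitoring solution with Ansible

By mastering these techniques, you can create more sophisticated, dynamic, and responsive automation workflows that adapt to different conditions and scenarios. This capability is essential for effective infrastructure management, application deployment, and system administration with Ansible.

As you continue your Ansible journey, remember that script output capture is only one of the many powerful features Ansible offers. Exploring other features such as roles, templates, and vault will further strengthen your automation toolkit.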
