Plotting a Confidence Ellipse | Python Matplotlib Tutorial

Introduction

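This lab demonstrates how to plot the confidence ellipse of a two-dimensional dataset with Python Matplotlib. A confidence ellipse is a graphical representation of the dataset's covariance: it is centered on the sample mean, and its size and orientation reflect the standard deviations of the two variables and the correlation between them. The ellipse is constructed from the Pearson correlation coefficient.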

Import the Required Libraries

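The first step is to import the necessary libraries. This lab needs numpy and matplotlib.pyplot, along with the Ellipse patch and the transforms module that confidence_ellipse uses later.

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms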

Define the `confidence_ellipse` Function

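Next, we define the confidence_ellipse function. It takes the x and y coordinates of the dataset, the Axes object to draw the ellipse into, and the number of standard deviations (n_std, 3 by default). It adds the ellipse to the Axes and returns the Matplotlib patch object that represents it.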

def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
    """
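    Create a plot of the covariance confidence ellipse of *x* and *y*.

    Parameters
    ----------
    x, y : array-like, shape (n, )
        Input data.

    ax : matplotlib.axes.Axes
        The Axes object to draw the ellipse into.

    n_std : float
        The number of standard deviations that determines the ellipse's radii.

    **kwargs
        Forwarded to `~matplotlib.patches.Ellipse`

    Returns
    -------
    matplotlib.patches.Ellipse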
    """
    if x.size != y.size:
        raise ValueError("x and y must be the same size")

    cov = np.cov(x, y)
    pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
    # Use a special case to obtain the eigenvalues of this two-dimensional dataset.
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor=facecolor, **kwargs)

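    # Compute the standard deviation of x from the square root of the variance
    # and multiply it by the given number of standard deviations.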
    scale_x = np.sqrt(cov[0, 0]) * n_std
    mean_x = np.mean(x)

    # Compute the standard deviation of y in the same way.
    scale_y = np.sqrt(cov[1, 1]) * n_std
    mean_y = np.mean(y)

    transf = (transforms.Affine2D()
              .rotate_deg(45)
              .scale(scale_x, scale_y)
              .translate(mean_x, mean_y))

    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)
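
The radii `np.sqrt(1 + pearson)` and `np.sqrt(1 - pearson)` used in `confidence_ellipse` can be checked numerically. The sketch below is only an illustration: the value of `pearson` is an arbitrary assumption, not taken from the lab's data. It builds the correlation matrix of two standardized variables and confirms that its eigenvalues are `1 - pearson` and `1 + pearson`, with the principal axes on the diagonals, which is why a fixed 45 degree rotation is enough.

```python
import numpy as np

# Hypothetical Pearson coefficient chosen only for illustration.
pearson = 0.6

# Correlation matrix of two standardized variables.
corr = np.array([[1.0, pearson],
                 [pearson, 1.0]])

# eigh returns eigenvalues in ascending order, with eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(corr)

print(eigenvalues)           # approximately [1 - pearson, 1 + pearson]
print(np.sqrt(eigenvalues))  # the two ellipse radii before scaling

# Direction of the eigenvector that belongs to the eigenvalue 1 + pearson,
# folded into the range [0, 180) so that its sign does not matter.
major_axis = eigenvectors[:, 1]
angle = np.degrees(np.arctan2(major_axis[1], major_axis[0])) % 180
print(angle)                 # approximately 45 degrees
```

For a negative `pearson`, `1 + pearson` becomes the smaller eigenvalue, so the same 45 degree rotation yields an ellipse whose long axis runs along the other diagonal.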

Define the `get_correlated_dataset` Function

We also need a function that generates a two-dimensional dataset with a specified mean, scale, and correlation.

def get_correlated_dataset(n, dependency, mu, scale):
    """
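    Create a random two-dimensional dataset with the specified two-dimensional mean (mu) and scale (scale).
    The correlation can be controlled by the parameter 'dependency', a 2x2 matrix.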
    """
    latent = np.random.randn(n, 2)
    dependent = latent.dot(dependency)
    scaled = dependent * scale
    scaled_with_offset = scaled + mu
    # Return x and y of the new, correlated dataset
    return scaled_with_offset[:, 0], scaled_with_offset[:, 1]
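
To see what covariance such a dataset should have, the linear map can be written out explicitly. The following is a minimal sketch, not part of the lab code: it repeats the same construction with a seeded `np.random.default_rng` generator (an assumption made for reproducibility; the lab uses `np.random.randn`) and compares the sample covariance with the value implied by `dependency` and `scale`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an illustrative choice

dependency = np.array([[0.85, 0.35],
                       [0.15, -0.65]])
mu = np.array([2.0, 4.0])
scale = np.array([3.0, 5.0])

# Same construction as get_correlated_dataset: mix, scale, then shift.
latent = rng.standard_normal((200_000, 2))
data = (latent @ dependency) * scale + mu

# Covariance implied by the linear map: S @ D.T @ D @ S with S = diag(scale).
S = np.diag(scale)
expected_cov = S @ dependency.T @ dependency @ S

sample_cov = np.cov(data[:, 0], data[:, 1])
sample_mean = data.mean(axis=0)

print(expected_cov)
print(sample_cov)
print(sample_mean)
```

With this many samples the two covariance matrices should agree closely, and the sample mean should sit near `mu`.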

Plot Positive, Negative, and Weak Correlation

Now we can use these functions to plot confidence ellipses for datasets with positive, negative, and weak correlation.

np.random.seed(0)

PARAMETERS = {
    'Positive correlation': [[0.85, 0.35],
                             [0.15, -0.65]],
    'Negative correlation': [[0.9, -0.4],
                             [0.1, -0.6]],
    'Weak correlation': [[1, 0],
                         [0, 1]],
}

mu = 2, 4
scale = 3, 5

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, (title, dependency) in zip(axs, PARAMETERS.items()):
    x, y = get_correlated_dataset(800, dependency, mu, scale)
    ax.scatter(x, y, s=0.5)

    ax.axvline(c='grey', lw=1)
    ax.axhline(c='grey', lw=1)

    confidence_ellipse(x, y, ax, edgecolor='red')

    ax.scatter(mu[0], mu[1], c='red', s=3)
    ax.set_title(title)

plt.show()
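
The three panels can be cross-checked numerically. The sketch below is an illustration under assumptions: it uses a seeded `np.random.default_rng` generator and a sample size of 5000, both chosen here rather than taken from the lab. It estimates the Pearson coefficient for each of the three dependency matrices with `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an illustrative choice

presets = {
    'Positive correlation': [[0.85, 0.35], [0.15, -0.65]],
    'Negative correlation': [[0.9, -0.4], [0.1, -0.6]],
    'Weak correlation': [[1, 0], [0, 1]],
}
mu = np.array([2.0, 4.0])
scale = np.array([3.0, 5.0])

pearson_by_title = {}
for title, dependency in presets.items():
    # Same construction as get_correlated_dataset.
    latent = rng.standard_normal((5000, 2))
    data = (latent @ np.array(dependency)) * scale + mu
    # Off-diagonal entry of the 2x2 correlation matrix.
    pearson_by_title[title] = np.corrcoef(data[:, 0], data[:, 1])[0, 1]
    print(f"{title}: r = {pearson_by_title[title]:+.2f}")
```

The sign of each coefficient should match the tilt of the corresponding ellipse, and a coefficient near zero corresponds to an ellipse whose axes are aligned with the coordinate axes.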

Plot Different Numbers of Standard Deviations

We can also plot confidence ellipses for different numbers of standard deviations.

fig, ax_nstd = plt.subplots(figsize=(6, 6))

dependency_nstd = [[0.8, 0.75],
                   [-0.2, 0.35]]
mu = 0, 0
scale = 8, 5

ax_nstd.axvline(c='grey', lw=1)
ax_nstd.axhline(c='grey', lw=1)

x, y = get_correlated_dataset(500, dependency_nstd, mu, scale)
ax_nstd.scatter(x, y, s=0.5)

confidence_ellipse(x, y, ax_nstd, n_std=1,
                   label=r'$1\sigma$', edgecolor='firebrick')
confidence_ellipse(x, y, ax_nstd, n_std=2,
                   label=r'$2\sigma$', edgecolor='fuchsia', linestyle='--')
confidence_ellipse(x, y, ax_nstd, n_std=3,
                   label=r'$3\sigma$', edgecolor='blue', linestyle=':')

ax_nstd.scatter(mu[0], mu[1], c='red', s=3)
ax_nstd.set_title('Different standard deviations')
ax_nstd.legend()
plt.show()
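
It can help to know how many points each ellipse is expected to enclose. The boundary drawn by `confidence_ellipse` for a given `n_std` is the set of points whose Mahalanobis distance from the sample mean equals `n_std`. Assuming the data is approximately bivariate normal, the fraction inside is `1 - exp(-n_std**2 / 2)`, roughly 39.3%, 86.5%, and 98.9% for 1, 2, and 3 standard deviations. These are lower than the familiar one-dimensional 68%, 95%, and 99.7%. The following sketch, with a seeded generator and a sample size picked for illustration, compares the observed fractions with that formula.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an illustrative choice

dependency = np.array([[0.8, 0.75],
                       [-0.2, 0.35]])
scale = np.array([8.0, 5.0])
data = (rng.standard_normal((100_000, 2)) @ dependency) * scale

# Squared Mahalanobis distance of every point from the sample mean.
centered = data - data.mean(axis=0)
cov = np.cov(data, rowvar=False)
d2 = np.einsum('ij,jk,ik->i', centered, np.linalg.inv(cov), centered)

observed = {}
theory = {}
for n_std in (1, 2, 3):
    observed[n_std] = np.mean(d2 <= n_std ** 2)
    theory[n_std] = 1 - np.exp(-n_std ** 2 / 2)
    print(f"{n_std} sigma: observed {observed[n_std]:.3f}, "
          f"Gaussian theory {theory[n_std]:.3f}")
```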

Use Keyword Arguments

Finally, we can use keyword arguments to customize the appearance of the ellipse.

fig, ax_kwargs = plt.subplots(figsize=(6, 6))
dependency_kwargs = [[-0.8, 0.5],
                     [-0.2, 0.5]]
mu = 2, -3
scale = 6, 5

ax_kwargs.axvline(c='grey', lw=1)
ax_kwargs.axhline(c='grey', lw=1)

x, y = get_correlated_dataset(500, dependency_kwargs, mu, scale)
# Plot the ellipse with zorder=0 in order to demonstrate its transparency (caused by the use of alpha).
confidence_ellipse(x, y, ax_kwargs,
                   alpha=0.5, facecolor='pink', edgecolor='purple', zorder=0)

ax_kwargs.scatter(x, y, s=0.5)
ax_kwargs.scatter(mu[0], mu[1], c='red', s=3)
ax_kwargs.set_title('Using keyword arguments')

fig.subplots_adjust(hspace=0.25)
plt.show()
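
Because `confidence_ellipse` forwards `**kwargs` to `matplotlib.patches.Ellipse`, any property that the patch accepts can be passed through. The sketch below builds an `Ellipse` directly, with a hypothetical `style` dictionary standing in for the forwarded keyword arguments, and reads the properties back to show that they were applied.

```python
from matplotlib.patches import Ellipse

# Hypothetical style dictionary standing in for the forwarded **kwargs.
style = dict(alpha=0.5, edgecolor='purple', zorder=0)

# facecolor is a named parameter of confidence_ellipse; the rest arrive via **kwargs.
ellipse = Ellipse((0, 0), width=2, height=1, facecolor='pink', **style)

print(ellipse.get_alpha())      # transparency applied to the patch
print(ellipse.get_zorder())     # drawing order; lower values are drawn first
print(ellipse.get_facecolor())  # RGBA tuple for the fill color
```

Other patch properties, such as `linestyle` or `label` from the previous step, travel through the same mechanism.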

Summary

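In this lab, we learned how to plot the confidence ellipse of a two-dimensional dataset with Python's Matplotlib. We defined the confidence_ellipse and get_correlated_dataset functions and used them to draw ellipses for datasets with different correlations and different numbers of standard deviations. We also showed how to use keyword arguments to customize the appearance of the ellipse.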
