Plotting Confidence Ellipses | Python Matplotlib Tutorial - Data Visualization

Introduction

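In this lab, you will use Python Matplotlib to plot the confidence ellipse of a two-dimensional dataset. A confidence ellipse is a graphical representation of the dataset's covariance: it outlines how the data spread around their mean, and its size is set by a chosen number of standard deviations. The ellipse is constructed from the Pearson correlation coefficient.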
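The Pearson coefficient that the ellipse relies on is the off-diagonal entry of the covariance matrix divided by the product of the two standard deviations. The minimal NumPy sketch below uses a made-up correlated sample (not data from this lab) to reproduce the calculation that `confidence_ellipse` performs, and cross-checks it against `np.corrcoef`:

```python
import numpy as np

# Hypothetical sample: y is built so that its population correlation with x is 0.6
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.6 * x + 0.8 * rng.normal(size=1000)

# np.cov returns the 2x2 sample covariance matrix [[var_x, cov_xy], [cov_xy, var_y]]
cov = np.cov(x, y)

# Pearson coefficient: covariance normalised by both standard deviations
pearson = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# NumPy's correlation matrix gives the same off-diagonal value
print(pearson, np.corrcoef(x, y)[0, 1])
```

Because the coefficient is always between -1 and 1, both `1 + pearson` and `1 - pearson` are non-negative, so the square roots taken later in `confidence_ellipse` are well defined.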

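VM Tips

Once the VM has finished starting, click the top-left corner to switch to the Notebook tab and open Jupyter Notebook for the exercises.

Jupyter Notebook may take a few seconds to finish loading. Because of limitations in Jupyter Notebook, your work cannot be validated automatically.

If you run into problems during the lab, feel free to ask Labby. Please leave feedback after the session, and we will fix the problem promptly.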

Importing the Required Libraries

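The first step is to import the required libraries. This lab uses NumPy and matplotlib.pyplot, plus matplotlib.patches.Ellipse and matplotlib.transforms, which confidence_ellipse needs to create and position the ellipse.

import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import numpy as np
from matplotlib.patches import Ellipse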

Defining the `confidence_ellipse` Function

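Next, define the confidence_ellipse function. It takes the x and y coordinates of the dataset, the Axes object to draw the ellipse on, and the number of standard deviations (n_std). Any extra keyword arguments are passed on to the ellipse patch. The function returns the Matplotlib patch object that represents the ellipse.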

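def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
    """
    Create a plot of the covariance confidence ellipse of *x* and *y*.

    Parameters
    ----------
    x, y : array-like, shape (n, )
        Input data.

    ax : matplotlib.axes.Axes
        The Axes object to draw the ellipse into.

    n_std : float
        The number of standard deviations that determines the ellipse's radii.

    facecolor : color, default: 'none'
        Fill color of the ellipse.

    **kwargs
        Forwarded to `~matplotlib.patches.Ellipse`

    Returns
    -------
    matplotlib.patches.Ellipse
    """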
    if x.size != y.size:
        raise ValueError("x and y must be the same size")

    cov = np.cov(x, y)
    pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
    # Special case: the correlation matrix [[1, pearson], [pearson, 1]]
    # has eigenvalues 1 + pearson and 1 - pearson.
    ell_radius_x = np.sqrt(1 + pearson)
    ell_radius_y = np.sqrt(1 - pearson)
    ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                      facecolor=facecolor, **kwargs)

    # Calculating the standard deviation of x from
    # the square root of the variance and multiplying
    # with the given number of standard deviations.
    scale_x = np.sqrt(cov[0, 0]) * n_std
    mean_x = np.mean(x)

    # Calculating the standard deviation of y in the same way.
    scale_y = np.sqrt(cov[1, 1]) * n_std
    mean_y = np.mean(y)

    transf = (transforms.Affine2D()
              .rotate_deg(45)
              .scale(scale_x, scale_y)
              .translate(mean_x, mean_y))

    ellipse.set_transform(transf + ax.transData)
    return ax.add_patch(ellipse)
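Why do a fixed 45° rotation and radii of sqrt(1 ± pearson) work for any dataset? The correlation matrix [[1, ρ], [ρ, 1]] always has eigenvalues 1 + ρ and 1 - ρ, with eigenvectors along the two diagonals. Scaling by the standard deviations times n_std and shifting to the mean should therefore trace the set of points whose Mahalanobis distance from the mean equals n_std. The NumPy-only sketch below is an independent check under that reading, not part of the lab. It repeats the rotate, scale, and translate steps by hand on a hypothetical sample and measures the distance of every outline point:

```python
import numpy as np

# Hypothetical correlated sample, used only for this check
rng = np.random.default_rng(1)
x = rng.normal(2.0, 3.0, size=500)
y = 0.5 * x + rng.normal(0.0, 1.0, size=500)
n_std = 2.0

cov = np.cov(x, y)
pearson = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# 1) The eigenvalues of the correlation matrix are 1 - rho and 1 + rho
corr = np.array([[1.0, pearson], [pearson, 1.0]])
eigvals = np.linalg.eigvalsh(corr)  # returned in ascending order

# 2) Rebuild the outline with the same steps as confidence_ellipse
t = np.linspace(0.0, 2.0 * np.pi, 200)
pts = np.column_stack([np.sqrt(1 + pearson) * np.cos(t),
                       np.sqrt(1 - pearson) * np.sin(t)])
theta = np.deg2rad(45)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
pts = pts @ rot.T                              # like rotate_deg(45)
pts = pts * np.sqrt(np.diag(cov)) * n_std      # like scale(scale_x, scale_y)
center = np.array([x.mean(), y.mean()])
pts = pts + center                             # like translate(mean_x, mean_y)

# 3) Mahalanobis distance of every outline point from the mean
d = pts - center
maha = np.sqrt(np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d))
print(eigvals, maha.min(), maha.max())  # min and max should both equal n_std
```

If the construction is right, every outline point is exactly n_std "standard deviations" away from the mean once the correlation is taken into account. This is what the n_std argument controls.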

Defining the `get_correlated_dataset` Function

We also need a function that generates a two-dimensional dataset with a specified mean, scale, and correlation.

def get_correlated_dataset(n, dependency, mu, scale):
    """
    Create a random two-dimensional dataset with the specified
    two-dimensional mean (mu) and per-axis scale (scale).
    The correlation can be controlled by the 'dependency' parameter, a 2x2 matrix.
    """
    latent = np.random.randn(n, 2)
    dependent = latent.dot(dependency)
    scaled = dependent * scale
    scaled_with_offset = scaled + mu
    # return x and y of the new, correlated dataset
    return scaled_with_offset[:, 0], scaled_with_offset[:, 1]
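Because latent is standard normal with identity covariance, latent.dot(dependency) has covariance DᵀD, where D is the dependency matrix. Multiplying each column by scale then gives S·DᵀD·S with S = diag(scale). The sketch below copies get_correlated_dataset so that it runs on its own, uses the lab's "Positive correlation" parameters, and compares that expected covariance with the sample covariance of a large draw:

```python
import numpy as np

def get_correlated_dataset(n, dependency, mu, scale):
    # Same construction as in the lab: standard-normal latent -> mix -> scale -> shift
    latent = np.random.randn(n, 2)
    dependent = latent.dot(dependency)
    scaled = dependent * scale
    scaled_with_offset = scaled + mu
    return scaled_with_offset[:, 0], scaled_with_offset[:, 1]

np.random.seed(0)
dependency = np.array([[0.85, 0.35],
                       [0.15, -0.65]])
scale = np.array([3.0, 5.0])
mu = np.array([2.0, 4.0])

x, y = get_correlated_dataset(100_000, dependency, mu, scale)

# Expected covariance S @ D^T D @ S, i.e. [[6.705, 3.0], [3.0, 13.625]] here
S = np.diag(scale)
expected = S @ dependency.T @ dependency @ S
empirical = np.cov(x, y)

print(np.round(expected, 3))
print(np.round(empirical, 3))
```

The mean of the dataset comes only from mu. The shape and tilt of the point cloud, and therefore of the ellipse, come only from dependency and scale.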

Plotting Positive, Negative, and Weak Correlations

Now you can use these functions to plot confidence ellipses for datasets with positive, negative, and weak correlation.

np.random.seed(0)

PARAMETERS = {
    'Positive correlation': [[0.85, 0.35],
                             [0.15, -0.65]],
    'Negative correlation': [[0.9, -0.4],
                             [0.1, -0.6]],
    'Weak correlation': [[1, 0],
                         [0, 1]],
}

mu = 2, 4
scale = 3, 5

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, (title, dependency) in zip(axs, PARAMETERS.items()):
    x, y = get_correlated_dataset(800, dependency, mu, scale)
    ax.scatter(x, y, s=0.5)

    ax.axvline(c='grey', lw=1)
    ax.axhline(c='grey', lw=1)

    confidence_ellipse(x, y, ax, edgecolor='red')

    ax.scatter(mu[0], mu[1], c='red', s=3)
    ax.set_title(title)

plt.show()
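The titles in PARAMETERS can be checked without sampling at all. The population correlation of latent.dot(dependency) follows from DᵀD, and multiplying each axis by a positive scale leaves the correlation unchanged. The helper expected_pearson below is hypothetical and not part of the lab. It suggests values of roughly +0.31, -0.64, and exactly 0 for the three panels. The "weak" panel therefore shows only the small sample correlation left over from random noise:

```python
import numpy as np

# Same dependency matrices as in the lab
PARAMETERS = {
    'Positive correlation': [[0.85, 0.35],
                             [0.15, -0.65]],
    'Negative correlation': [[0.9, -0.4],
                             [0.1, -0.6]],
    'Weak correlation': [[1, 0],
                         [0, 1]],
}

def expected_pearson(dependency):
    """Hypothetical helper: population correlation of latent @ dependency
    for a standard-normal latent (positive per-axis scaling does not change it)."""
    d = np.asarray(dependency, dtype=float)
    cov = d.T @ d  # covariance before per-axis scaling
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

rho = {title: expected_pearson(dep) for title, dep in PARAMETERS.items()}
for title, r in rho.items():
    print(f"{title}: rho = {r:+.3f}")
```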

Plotting Different Numbers of Standard Deviations

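You can also plot confidence ellipses for different numbers of standard deviations by calling confidence_ellipse several times on the same Axes with different n_std values.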

fig, ax_nstd = plt.subplots(figsize=(6, 6))

dependency_nstd = [[0.8, 0.75],
                   [-0.2, 0.35]]
mu = 0, 0
scale = 8, 5

ax_nstd.axvline(c='grey', lw=1)
ax_nstd.axhline(c='grey', lw=1)

x, y = get_correlated_dataset(500, dependency_nstd, mu, scale)
ax_nstd.scatter(x, y, s=0.5)

confidence_ellipse(x, y, ax_nstd, n_std=1,
                   label=r'$1\sigma$', edgecolor='firebrick')
confidence_ellipse(x, y, ax_nstd, n_std=2,
                   label=r'$2\sigma$', edgecolor='fuchsia', linestyle='--')
confidence_ellipse(x, y, ax_nstd, n_std=3,
                   label=r'$3\sigma$', edgecolor='blue', linestyle=':')

ax_nstd.scatter(mu[0], mu[1], c='red', s=3)
ax_nstd.set_title('Different standard deviations')
ax_nstd.legend()
plt.show()
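The ellipse outline is the contour at Mahalanobis distance n_std. If the data are approximately bivariate normal, the squared distance follows a chi-square distribution with 2 degrees of freedom. The expected share of points inside the n_std ellipse is then 1 - exp(-n_std²/2): about 39.3% for 1σ, 86.5% for 2σ, and 98.9% for 3σ. This sketch assumes Gaussian data, which get_correlated_dataset produces. It counts points inside each ellipse for a large sample built with the lab's dependency_nstd and scale:

```python
import numpy as np

np.random.seed(0)
# Same dependency matrix, scale and mean as the n_std example
dependency_nstd = np.array([[0.8, 0.75],
                            [-0.2, 0.35]])
scale = np.array([8.0, 5.0])
mu = np.array([0.0, 0.0])

# Same construction as get_correlated_dataset, but with a larger sample
n = 50_000
data = np.random.randn(n, 2) @ dependency_nstd * scale + mu

# Squared Mahalanobis distance of each point from the sample mean,
# using the sample covariance (as confidence_ellipse does)
cov = np.cov(data, rowvar=False)
d = data - data.mean(axis=0)
maha_sq = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)

coverage = {}
for n_std in (1, 2, 3):
    inside = np.mean(maha_sq <= n_std ** 2)   # share of points inside the ellipse
    theory = 1 - np.exp(-n_std ** 2 / 2)      # chi-square (2 dof) CDF at n_std**2
    coverage[n_std] = (inside, theory)
    print(f"{n_std} sigma: inside={inside:.3f}, theory={theory:.3f}")
```

These percentages are smaller than the familiar one-dimensional 68/95/99.7 rule, because the points have to fall inside the region in two dimensions at once.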

Using Keyword Arguments

Finally, you can customize the ellipse's appearance with keyword arguments, which confidence_ellipse passes on to matplotlib.patches.Ellipse.

fig, ax_kwargs = plt.subplots(figsize=(6, 6))
dependency_kwargs = [[-0.8, 0.5],
                     [-0.2, 0.5]]
mu = 2, -3
scale = 6, 5

ax_kwargs.axvline(c='grey', lw=1)
ax_kwargs.axhline(c='grey', lw=1)

x, y = get_correlated_dataset(500, dependency_kwargs, mu, scale)
# Plot the ellipse with zorder=0 in order to demonstrate
# its transparency (caused by the use of alpha).
confidence_ellipse(x, y, ax_kwargs,
                   alpha=0.5, facecolor='pink', edgecolor='purple', zorder=0)

ax_kwargs.scatter(x, y, s=0.5)
ax_kwargs.scatter(mu[0], mu[1], c='red', s=3)
ax_kwargs.set_title('Using keyword arguments')

fig.subplots_adjust(hspace=0.25)
plt.show()
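Because confidence_ellipse passes facecolor and **kwargs directly to Ellipse, every keyword in the example above becomes an ordinary Patch property. The sketch below builds an Ellipse directly instead of calling confidence_ellipse, with an arbitrary width and height chosen only for illustration. It shows where those properties end up, without needing a figure:

```python
from matplotlib.patches import Ellipse

# The same keyword arguments as in the example, applied to a bare Ellipse
ellipse = Ellipse((0, 0), width=2, height=1,
                  alpha=0.5, facecolor='pink', edgecolor='purple', zorder=0)

print(ellipse.get_alpha())      # opacity applied to face and edge
print(ellipse.get_zorder())     # 0: drawn beneath artists with a higher zorder
print(ellipse.get_facecolor())  # RGBA tuple for 'pink' with the alpha folded in
```

The scatter points have a higher default zorder, so they are drawn on top of the ellipse, and the semi-transparent fill stays visible beneath them.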

Summary

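In this lab, you learned how to plot confidence ellipses of two-dimensional datasets with Python Matplotlib. You defined the confidence_ellipse and get_correlated_dataset functions and used them to plot ellipses for datasets with different correlations and different numbers of standard deviations. You also saw how to customize the ellipse's appearance with keyword arguments.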

Plotting Confidence Ellipses with Matplotlib

Introduction