Data Visualization with Python: Creating a Scatter Plot with a Legend

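Introduction

A scatter plot visualizes the relationship between two variables. Adding a legend is useful when the data contains several groups that need to be told apart in the plot. In this lab, you will learn how to create a scatter plot with a legend in Python using the Matplotlib library.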

Import the Required Libraries

We start by importing the libraries we need: NumPy and Matplotlib.

import matplotlib.pyplot as plt
import numpy as np

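Create a Scatter Plot with Multiple Groups

To plot several groups, loop over the groups and call scatter once per group. The c, s, and alpha parameters set the marker color, size, and transparency, respectively. The label parameter sets the text shown for that group in the legend; in this example each group is simply labeled with its color name.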

fig, ax = plt.subplots()
for color in ['tab:blue', 'tab:orange', 'tab:green']:
    n = 750
    x, y = np.random.rand(2, n)
    scale = 200.0 * np.random.rand(n)
    ax.scatter(x, y, c=color, s=scale, label=color,
               alpha=0.3, edgecolors='none')

ax.legend()
ax.grid(True)

plt.show()
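
The loop above labels each group with its color name. As a minimal sketch of the same technique with more descriptive labels, the example below maps group names to colors in a dictionary. The group names ("control", "treatment A", "treatment B") and the fixed marker size are assumptions made for illustration and do not come from the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the figure is reproducible

# Hypothetical group names mapped to colors (illustration only).
groups = {
    "control": "tab:blue",
    "treatment A": "tab:orange",
    "treatment B": "tab:green",
}

fig, ax = plt.subplots()
for name, color in groups.items():
    x, y = rng.random((2, 100))
    # label=name is what ax.legend() picks up for this group
    ax.scatter(x, y, c=color, s=40, alpha=0.5, edgecolors="none", label=name)

legend = ax.legend(title="Group")
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Call plt.show() afterwards to display the figure. Because each scatter call carries its own label, ax.legend() needs no arguments.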

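Automated Legend Creation

A legend for a scatter plot can also be generated automatically with the PathCollection.legend_elements method. The method tries to determine a useful number of legend entries to show and returns a tuple of handles and labels, which can be passed directly to ax.legend.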

N = 45
x, y = np.random.rand(2, N)
c = np.random.randint(1, 5, size=N)
s = np.random.randint(10, 220, size=N)

fig, ax = plt.subplots()

scatter = ax.scatter(x, y, c=c, s=s)

# Produce a legend with the unique colors from the scatter
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="lower left", title="Classes")
ax.add_artist(legend1)

# Produce a legend with a cross-section of sizes from the scatter
handles, labels = scatter.legend_elements(prop="sizes", alpha=0.6)
legend2 = ax.legend(handles, labels, loc="upper right", title="Sizes")

plt.show()
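
To make the return value of legend_elements more concrete, here is a small sketch that inspects the handles and labels and then substitutes custom label text. The class names ("north", "south", "east", "west") are hypothetical and only serve the illustration. With only a few distinct color values, one legend entry per value is expected.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
classes = np.repeat([1, 2, 3, 4], 5)  # 20 points in 4 classes
x, y = rng.random((2, classes.size))

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=classes)

# legend_elements returns two parallel lists: marker handles and label strings
handles, labels = scatter.legend_elements()
print(len(handles), len(labels))

# Hypothetical class names replacing the numeric labels (illustration only).
class_names = ["north", "south", "east", "west"]
legend = ax.legend(handles, class_names, title="Region")
shown = [text.get_text() for text in legend.get_texts()]
print(shown)
```

Passing your own list of strings in place of labels is a simple way to show readable names when the color values are integer codes. The list must be in the same order as the sorted unique values.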

Customize Legend Elements

Additional arguments to PathCollection.legend_elements give finer control over the legend. For example, num sets the target number of entries, while fmt and func control how each entry is labeled.

volume = np.random.rayleigh(27, size=40)
amount = np.random.poisson(10, size=40)
ranking = np.random.normal(size=40)
price = np.random.uniform(1, 10, size=40)

fig, ax = plt.subplots()

# Because the price is much too small when being provided as size for ``s``,
# we normalize it to some useful point sizes, s=0.3*(price*3)**2
scatter = ax.scatter(volume, amount, c=ranking, s=0.3*(price*3)**2,
                     vmin=-3, vmax=3, cmap="Spectral")

# Produce a legend for the ranking (colors). Even though there are 40 different
# rankings, we only want to show 5 of them in the legend.
legend1 = ax.legend(*scatter.legend_elements(num=5),
                    loc="upper left", title="Ranking")
ax.add_artist(legend1)

# Produce a legend for the price (sizes). Because we want to show the prices
# in dollars, we use the *func* argument to supply the inverse of the function
# used to calculate the sizes above. The *fmt* argument formats each label as
# a dollar amount. Note that we ask for 5 elements here but obtain only 4 in
# the created legend, because round price values are chosen automatically.
kw = dict(prop="sizes", num=5, color=scatter.cmap(0.7), fmt="$ {x:.2f}",
          func=lambda s: np.sqrt(s/.3)/3)
legend2 = ax.legend(*scatter.legend_elements(**kw),
                    loc="lower right", title="Price")

plt.show()
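
The func argument only produces correct labels if it is the exact inverse of the transformation used to compute the marker sizes. The following sketch checks that round trip numerically for the formulas used above. The helper names price_to_size and size_to_price and the sample prices are assumptions introduced for this check.

```python
import numpy as np

def price_to_size(price):
    """Forward transform used for the ``s`` argument of scatter."""
    return 0.3 * (price * 3) ** 2

def size_to_price(size):
    """Inverse transform, equivalent to the lambda passed as ``func``."""
    return np.sqrt(size / 0.3) / 3

prices = np.array([1.0, 2.5, 5.0, 10.0])  # sample prices (illustration only)
sizes = price_to_size(prices)
recovered = size_to_price(sizes)

print(sizes)
print(np.allclose(recovered, prices))
```

If the check fails for your own transformation, the legend would show values that do not match the data, so it is worth verifying the inverse before passing it as func.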

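Summary

In this lab, you learned how to create a scatter plot with a legend in Python using the Matplotlib library. You created a scatter plot with multiple groups, generated legends automatically, and customized the legend entries with the PathCollection.legend_elements method. Scatter plots with legends are useful for visualizing the relationship between two variables when the data contains several groups.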