Early Stopping of Stochastic Gradient Descent

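Introduction

Stochastic Gradient Descent (SGD) is a popular optimization technique for minimizing a loss function. It performs gradient descent step by step on samples chosen at random at each iteration, which is what makes it stochastic, and it is particularly efficient for fitting linear models. Because the method is stochastic, the loss function does not necessarily decrease at every iteration, and convergence is guaranteed only in expectation. This can make convergence hard to monitor on the loss function itself. In this lab, we explore early stopping, a strategy that monitors convergence on a validation score instead. Using the SGDClassifier model and the MNIST dataset, we show how early stopping can reach almost the same accuracy as a model built without it while significantly reducing training time.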
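The idea of monitoring a validation score can be sketched in a few lines of plain Python. This is a simplified illustration, not scikit-learn's implementation: the function name `stopping_epoch` and the example scores are hypothetical, while `tol` and `n_iter_no_change` mirror the SGDClassifier parameters of the same names. Training stops once the score has failed to beat the best score so far by at least `tol` for `n_iter_no_change` consecutive epochs.

```python
import math


def stopping_epoch(val_scores, tol=1e-4, n_iter_no_change=3):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: a simplified patience rule on validation scores.
    """
    best_score = -math.inf
    no_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        # An epoch counts as "no improvement" unless it beats the best
        # score seen so far by at least tol.
        if score < best_score + tol:
            no_improvement += 1
        else:
            no_improvement = 0
        if score > best_score:
            best_score = score
        if no_improvement >= n_iter_no_change:
            return epoch
    return None


# Made-up validation scores, one per epoch, for illustration only.
scores = [0.80, 0.85, 0.88, 0.88, 0.879, 0.881, 0.88]
stop = stopping_epoch(scores, tol=0.01, n_iter_no_change=3)
print(stop)  # 6: epochs 4, 5 and 6 all fail to improve by 0.01
```

A score sequence that keeps improving by more than `tol` never triggers the rule, in which case the sketch returns `None` and training would run until the iteration cap.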

Load the Required Libraries and the MNIST Dataset

The first step is to load the required libraries and the dataset. We use pandas, numpy, matplotlib, and scikit-learn, and load MNIST with scikit-learn's fetch_openml function.

import time
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.utils._testing import ignore_warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.utils import shuffle

# Load the MNIST dataset
def load_mnist(n_samples=None, class_0="0", class_1="8"):
    """Load MNIST, select two classes, shuffle, and return only n_samples."""
    # Load data from http://openml.org/d/554
    mnist = fetch_openml("mnist_784", version=1, as_frame=False, parser="pandas")

    # Keep only two classes for binary classification
    mask = np.logical_or(mnist.target == class_0, mnist.target == class_1)

    X, y = shuffle(mnist.data[mask], mnist.target[mask], random_state=42)
    if n_samples is not None:
        X, y = X[:n_samples], y[:n_samples]
    return X, y

X, y = load_mnist(n_samples=10000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
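To make the sample counts concrete, the arithmetic below works through the split above together with the `validation_fraction=0.2` setting that this lab later passes to the validation-score estimator. It is only a sketch: the variable names are illustrative, and the exact rounding scikit-learn applies internally is an assumption here (it makes no difference for these round numbers).

```python
# Sizes taken from this lab: 10,000 samples, half held out for testing.
n_samples = 10_000
test_size = 0.5
validation_fraction = 0.2  # used only when early_stopping=True

n_test = round(test_size * n_samples)
n_train = n_samples - n_test

# With early stopping on a validation score, part of the training set is
# set aside and never used for gradient updates.
n_validation = round(validation_fraction * n_train)
n_fit = n_train - n_validation

print(n_train, n_test, n_fit, n_validation)  # 5000 5000 4000 1000
```

The validation-score estimator therefore updates its weights on fewer samples than the other two estimators, which is worth keeping in mind when comparing their scores.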

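Define the Estimators and the Early Stopping Strategy

The next step is to define the estimators and the early stopping strategy. We use scikit-learn's SGDClassifier with three stopping criteria: no stopping criterion, training loss, and validation score. The fit_and_score function fits an estimator on the training set and scores it on both the training and test sets.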

@ignore_warnings(category=ConvergenceWarning)
def fit_and_score(estimator, max_iter, X_train, X_test, y_train, y_test):
    """Fit the estimator on the training set and score it on both sets."""
    estimator.set_params(max_iter=max_iter)
    estimator.set_params(random_state=0)

    start = time.time()
    estimator.fit(X_train, y_train)

    fit_time = time.time() - start
    n_iter = estimator.n_iter_
    train_score = estimator.score(X_train, y_train)
    test_score = estimator.score(X_test, y_test)

    return fit_time, n_iter, train_score, test_score

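# Define the estimators to compare
estimator_dict = {
    # tol=None disables the loss-based stopping check; with the default
    # tol=1e-3 this estimator would still stop on the training loss.
    "No stopping criterion": linear_model.SGDClassifier(
        n_iter_no_change=3, tol=None
    ),
    "Training loss": linear_model.SGDClassifier(
        early_stopping=False, n_iter_no_change=3, tol=0.1
    ),
    "Validation score": linear_model.SGDClassifier(
        early_stopping=True, n_iter_no_change=3, tol=0.0001, validation_fraction=0.2
    ),
}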
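The validation-score configuration can be tried without downloading MNIST. The sketch below swaps in a synthetic dataset from `make_classification` (an assumption made purely so the example runs offline; the sample count, feature count, and `max_iter=200` are arbitrary choices) and reports how many epochs were actually run through the fitted `n_iter_` attribute.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the MNIST subset (not the data used in this lab).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

max_iter = 200  # upper bound on the number of epochs
clf = SGDClassifier(
    early_stopping=True,      # monitor a held-out validation score
    n_iter_no_change=3,       # patience, in epochs
    tol=0.0001,               # minimum improvement that resets the patience
    validation_fraction=0.2,  # share of the training set held out
    max_iter=max_iter,
    random_state=0,
)
clf.fit(X_train, y_train)

epochs_run = clf.n_iter_
test_accuracy = clf.score(X_test, y_test)
print(f"epochs run: {epochs_run} (cap: {max_iter})")
print(f"test accuracy: {test_accuracy:.3f}")
```

When the stopping rule triggers, `epochs_run` is smaller than the cap; the exact numbers depend on the data and the scikit-learn version, so none are quoted here.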

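Train and Evaluate the Estimators

Next, we train and evaluate an estimator for each stopping criterion. An outer loop iterates over the estimators, and an inner loop iterates over increasing values of max_iter, the maximum number of iterations. The results are then stored in a pandas DataFrame so they are easy to plot.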

results = []
for estimator_name, estimator in estimator_dict.items():
    print(estimator_name + ": ", end="")
    for max_iter in range(1, 50):
        print(".", end="")
        sys.stdout.flush()

        fit_time, n_iter, train_score, test_score = fit_and_score(
            estimator, max_iter, X_train, X_test, y_train, y_test
        )

        results.append(
            (estimator_name, max_iter, fit_time, n_iter, train_score, test_score)
        )
    print("")

# Convert the results to a pandas DataFrame for easy plotting
columns = [
    "Stopping criterion",
    "max_iter",
    "Fit time (sec)",
    "n_iter_",
    "Train score",
    "Test score",
]
results_df = pd.DataFrame(results, columns=columns)

Plot the Results

The last step is to plot the results. We use two figures, each with two subplots: the first shows the training and test scores, and the second shows the number of iterations actually run and the fit time. Each stopping criterion is drawn with a different line style.

# Define what to plot
lines = "Stopping criterion"
x_axis = "max_iter"
styles = ["-.", "--", "-"]

# First plot: train and test scores
fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(12, 4))
for ax, y_axis in zip(axes, ["Train score", "Test score"]):
    for style, (criterion, group_df) in zip(styles, results_df.groupby(lines)):
        group_df.plot(x=x_axis, y=y_axis, label=criterion, ax=ax, style=style)
    ax.set_title(y_axis)
    ax.legend(title=lines)
fig.tight_layout()

# Second plot: n_iter_ and fit time
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
for ax, y_axis in zip(axes, ["n_iter_", "Fit time (sec)"]):
    for style, (criterion, group_df) in zip(styles, results_df.groupby(lines)):
        group_df.plot(x=x_axis, y=y_axis, label=criterion, ax=ax, style=style)
    ax.set_title(y_axis)
    ax.legend(title=lines)
fig.tight_layout()

plt.show()

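Summary

In this lab, we explored early stopping, a strategy that monitors convergence on a validation score when minimizing a loss function with Stochastic Gradient Descent. Using scikit-learn's SGDClassifier and the MNIST dataset, we showed how early stopping can reach almost the same accuracy as a model built without it while significantly reducing training time. We defined three stopping criteria (no stopping criterion, training loss, and validation score), trained and evaluated an estimator for each one in a loop over increasing max_iter values, and plotted the results with a different line style per criterion.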
