Scikit-Learn MLPClassifier: Exploring Stochastic Learning Strategies

Introduction

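In this lab, we use scikit-learn's MLPClassifier to compare the performance of several stochastic learning strategies, including SGD and Adam. MLPClassifier is a neural network classifier that optimizes the network weights using gradients computed by backpropagation. The goal of the lab is to show how different stochastic learning strategies affect the training loss curves of MLPClassifier. The examples use several small datasets, but the general trends they show appear to carry over to larger datasets as well.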

Import the Necessary Libraries

First, import the libraries we need: MLPClassifier, MinMaxScaler, datasets, and matplotlib.pyplot. We also import ConvergenceWarning so that the convergence warnings raised during training can be suppressed.

import warnings

import matplotlib.pyplot as plt

from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning

Define the Learning Strategies

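Next, define the learning strategies to compare. We set up several learning rate schedules and momentum parameters: a constant learning rate, a constant learning rate with momentum, a constant learning rate with Nesterov's momentum, an inverse-scaling learning rate, an inverse-scaling learning rate with momentum, an inverse-scaling learning rate with Nesterov's momentum, and Adam. We also define the labels and plot_args used later when drawing the plots.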

# different learning rate schedules and momentum parameters
params = [
    {
        "solver": "sgd",
        "learning_rate": "constant",
        "momentum": 0,
        "learning_rate_init": 0.2,
    },
    {
        "solver": "sgd",
        "learning_rate": "constant",
        "momentum": 0.9,
        "nesterovs_momentum": False,
        "learning_rate_init": 0.2,
    },
    {
        "solver": "sgd",
        "learning_rate": "constant",
        "momentum": 0.9,
        "nesterovs_momentum": True,
        "learning_rate_init": 0.2,
    },
    {
        "solver": "sgd",
        "learning_rate": "invscaling",
        "momentum": 0,
        "learning_rate_init": 0.2,
    },
    {
        "solver": "sgd",
        "learning_rate": "invscaling",
        "momentum": 0.9,
        "nesterovs_momentum": False,
        "learning_rate_init": 0.2,
    },
    {
        "solver": "sgd",
        "learning_rate": "invscaling",
        "momentum": 0.9,
        "nesterovs_momentum": True,
        "learning_rate_init": 0.2,
    },
    {"solver": "adam", "learning_rate_init": 0.01},
]

labels = [
    "constant learning-rate",
    "constant with momentum",
    "constant with Nesterov's momentum",
    "inv-scaling learning-rate",
    "inv-scaling with momentum",
    "inv-scaling with Nesterov's momentum",
    "adam",
]

plot_args = [
    {"c": "red", "linestyle": "-"},
    {"c": "green", "linestyle": "-"},
    {"c": "blue", "linestyle": "-"},
    {"c": "red", "linestyle": "--"},
    {"c": "green", "linestyle": "--"},
    {"c": "blue", "linestyle": "--"},
    {"c": "black", "linestyle": "-"},
]
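To make the two ideas behind these settings concrete, here is a minimal sketch in plain Python. The helper names invscaling_rate and momentum_update are hypothetical and are not part of scikit-learn. The first follows the formula given in the MLPClassifier documentation for "invscaling", learning_rate_init / pow(t, power_t), where power_t is a real MLPClassifier parameter with a default of 0.5; how the library counts the time step t internally is not covered here. The second illustrates a common formulation of classical and Nesterov-style momentum for a single scalar weight, and should be read as an illustration rather than as the library's exact code.

```python
def invscaling_rate(learning_rate_init, t, power_t=0.5):
    """Documented 'invscaling' formula: learning_rate_init / t ** power_t."""
    return learning_rate_init / t ** power_t


def momentum_update(velocity, grad, learning_rate, momentum, nesterov=False):
    """Return (new_velocity, weight_update) for a single scalar weight.

    Hypothetical helper for illustration only.
    """
    new_velocity = momentum * velocity - learning_rate * grad
    if nesterov:
        # Look-ahead form: apply the momentum once more on top of the
        # freshly updated velocity before taking the gradient step.
        update = momentum * new_velocity - learning_rate * grad
    else:
        update = new_velocity
    return new_velocity, update


# The effective learning rate shrinks as the time step t grows.
for t in (1, 4, 100):
    print("t=%d -> learning rate %.3f" % (t, invscaling_rate(0.2, t)))

# Starting from zero velocity, compare the two momentum variants.
print(momentum_update(0.0, 1.0, learning_rate=0.1, momentum=0.9))
print(momentum_update(0.0, 1.0, learning_rate=0.1, momentum=0.9, nesterov=True))
```

With momentum set to 0, both variants reduce to a plain gradient step, which corresponds to the first and fourth entries of params.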

Define a Function to Plot the Learning Curves

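Next, define a function that plots the learning curve of every learning strategy for a given dataset. The function takes the dataset (X, y), the axis to plot on, and the name of the dataset. It scales the data with MinMaxScaler and then trains one MLPClassifier per learning strategy, suppressing convergence warnings during training. Finally, it plots the training loss curve (loss_curve_) of each strategy on the same axis.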

def plot_on_dataset(X, y, ax, name):
    # for each dataset, plot learning for each learning strategy
    print("\nlearning on dataset %s" % name)
    ax.set_title(name)

    X = MinMaxScaler().fit_transform(X)
    mlps = []
    if name == "digits":
        # digits is larger but converges fairly quickly
        max_iter = 15
    else:
        max_iter = 400

    for label, param in zip(labels, params):
        print("training: %s" % label)
        mlp = MLPClassifier(random_state=0, max_iter=max_iter, **param)

        # some parameter combinations will not converge, as can be seen on the
        # plots, so the convergence warnings are ignored here
        with warnings.catch_warnings():
            warnings.filterwarnings(
                "ignore", category=ConvergenceWarning, module="sklearn"
            )
            mlp.fit(X, y)

        mlps.append(mlp)
        print("Training set score: %f" % mlp.score(X, y))
        print("Training set loss: %f" % mlp.loss_)
    for mlp, label, args in zip(mlps, labels, plot_args):
        ax.plot(mlp.loss_curve_, label=label, **args)
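The function above plots the loss_curve_ attribute of each fitted model. As a minimal sketch of what that attribute holds, the following standalone snippet trains a single MLPClassifier on the scaled iris data and inspects the curve. The choice of the Adam configuration and max_iter=50 is an assumption made only to keep the example short.

```python
import warnings

from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Load and scale the data the same way plot_on_dataset does.
X, y = datasets.load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)

# A single strategy; max_iter=50 is an arbitrary value for this sketch.
mlp = MLPClassifier(
    solver="adam", learning_rate_init=0.01, max_iter=50, random_state=0
)

# Training may stop before converging, so silence that specific warning.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore", category=ConvergenceWarning, module="sklearn"
    )
    mlp.fit(X, y)

# loss_curve_ holds one training-loss value per iteration (epoch).
print("iterations run: %d" % mlp.n_iter_)
print("first loss: %f" % mlp.loss_curve_[0])
print("last loss:  %f" % mlp.loss_curve_[-1])
```

The last entry of loss_curve_ is the same value that plot_on_dataset prints as mlp.loss_.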

Load or Generate Small Datasets

Now load or generate the small datasets used in this example: the iris dataset, the handwritten digits dataset, and two synthetic datasets generated with the make_circles and make_moons functions.

iris = datasets.load_iris()
X_digits, y_digits = datasets.load_digits(return_X_y=True)
data_sets = [
    (iris.data, iris.target),
    (X_digits, y_digits),
    datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
    datasets.make_moons(noise=0.3, random_state=0),
]
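Before training, it can help to check what each entry of data_sets contains. The following sketch rebuilds the same list and prints the shape and number of classes of each dataset. The names list and the summary dictionary are introduced here only for illustration.

```python
from sklearn import datasets

iris = datasets.load_iris()
X_digits, y_digits = datasets.load_digits(return_X_y=True)
data_sets = [
    (iris.data, iris.target),
    (X_digits, y_digits),
    datasets.make_circles(noise=0.2, factor=0.5, random_state=1),
    datasets.make_moons(noise=0.3, random_state=0),
]

# Names in the same order as data_sets (illustrative only).
names = ["iris", "digits", "circles", "moons"]

# Collect (n_samples, n_features) and the number of distinct classes.
summary = {}
for name, (X, y) in zip(names, data_sets):
    summary[name] = (X.shape, len(set(y)))
    print("%-8s X shape: %s, classes: %d" % (name, X.shape, len(set(y))))
```

Every entry is an (X, y) pair, which is why the plotting step can unpack it directly with *data.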

Plot the Learning Curves for Each Dataset

Finally, use the plot_on_dataset function to plot the learning curves for each dataset. We create a 2x2 grid of plots and draw each dataset on its own axis.

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

for ax, data, name in zip(
    axes.ravel(), data_sets, ["iris", "digits", "circles", "moons"]
):
    plot_on_dataset(*data, ax=ax, name=name)

fig.legend(ax.get_lines(), labels, ncol=3, loc="upper center")
plt.show()

Summary

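In this lab, we used scikit-learn's MLPClassifier to compare the performance of several stochastic learning strategies, including SGD and Adam, on a few small datasets. We defined different learning rate schedules and momentum parameters and trained an MLPClassifier with each strategy. We then plotted the learning curve of every strategy for each dataset and observed how the strategies affect the training loss curves. The results illustrate why the learning strategy should be chosen to suit the dataset and the task.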