Bias-Variance Decomposition | Bagging Ensemble | Regression Analysis

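Introduction

This lab explores the bias-variance decomposition and how it differs between a single estimator and a bagging ensemble. Using scikit-learn, we generate and visualize a toy regression problem, then compare the expected mean squared error of a single decision tree with that of a bagging ensemble of decision trees.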

Import the Required Libraries

First, import the libraries needed to generate the data, train the models, and plot the results.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

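Set the Parameters

Next, set the parameters that control the dataset sizes, the number of repetitions, and the standard deviation of the noise.

n_repeat = 50  # number of repetitions used to estimate expectations
n_train = 50  # size of each training set
n_test = 1000  # size of the test set
noise = 0.1  # standard deviation of the noise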
np.random.seed(0)

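Generate the Data

We generate a toy one-dimensional regression problem from a known function f and add random Gaussian noise to the targets of both the training and test sets. To estimate the expected mean squared error, we draw n_repeat independent training sets, and the test set carries n_repeat noisy target columns for the same test inputs.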

def f(x):
    x = x.ravel()
    return np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2))

def generate(n_samples, noise, n_repeat=1):
    X = np.random.rand(n_samples) * 10 - 5
    X = np.sort(X)

    if n_repeat == 1:
        y = f(X) + np.random.normal(0.0, noise, n_samples)
    else:
        y = np.zeros((n_samples, n_repeat))

        for i in range(n_repeat):
            y[:, i] = f(X) + np.random.normal(0.0, noise, n_samples)

    X = X.reshape((n_samples, 1))

    return X, y

X_train = []
y_train = []

for i in range(n_repeat):
    X, y = generate(n_samples=n_train, noise=noise)
    X_train.append(X)
    y_train.append(y)

X_test, y_test = generate(n_samples=n_test, noise=noise, n_repeat=n_repeat)
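
As a possible alternative to the inner loop in generate(), the noisy target columns can be drawn in one call. The sketch below is only an illustration: the name generate_vectorized and the use of np.random.default_rng are assumptions, not part of the tutorial's code, and it will not reproduce the exact random numbers produced by np.random.seed(0).

```python
import numpy as np


def f(x):
    # Same known target function as in the tutorial
    x = x.ravel()
    return np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2))


def generate_vectorized(n_samples, noise, rng, n_repeat=1):
    """Hypothetical vectorized variant of generate(); rng is a NumPy Generator."""
    # Sorted inputs drawn uniformly from [-5, 5)
    X = np.sort(rng.uniform(-5, 5, n_samples))
    # One column of noisy targets per repetition, drawn in a single call
    y = f(X)[:, np.newaxis] + rng.normal(0.0, noise, (n_samples, n_repeat))
    if n_repeat == 1:
        y = y.ravel()  # match the 1-D target shape of the single-repeat case
    return X.reshape(-1, 1), y


rng = np.random.default_rng(0)
X_demo, y_demo = generate_vectorized(1000, 0.1, rng, n_repeat=50)
print(X_demo.shape, y_demo.shape)
# The per-point variance across columns should be close to noise**2
print(np.var(y_demo, axis=1).mean())
```

The shapes match what the rest of the tutorial expects: a column vector of inputs and one target column per repetition.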

Define the Models to Compare

We compare two models: a single decision tree and a bagging ensemble of decision trees.

estimators = [
    ("Tree", DecisionTreeRegressor()),
    ("Bagging(Tree)", BaggingRegressor(DecisionTreeRegressor())),
]
n_estimators = len(estimators)
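
To make explicit what the bagging model does, the sketch below fits a BaggingRegressor on a small stand-in dataset and checks that its prediction is the average of its member trees. The data, the variable names, and the explicit n_estimators=10 and random_state=0 arguments are illustrative assumptions; the tutorial itself relies on the defaults.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Small stand-in dataset (an assumption for illustration only)
rng = np.random.default_rng(0)
X_small = rng.uniform(-5, 5, size=(50, 1))
y_small = np.exp(-(X_small.ravel() ** 2)) + rng.normal(0.0, 0.1, 50)

# Each member tree is fit on a bootstrap resample of the training set
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10, random_state=0)
bag.fit(X_small, y_small)

# Average the member trees by hand and compare with the ensemble prediction
manual_mean = np.mean([tree.predict(X_small) for tree in bag.estimators_], axis=0)
same_prediction = np.allclose(manual_mean, bag.predict(X_small))
print(len(bag.estimators_), same_prediction)
```

Because the ensemble output is an average over trees fit on different resamples, it tends to fluctuate less from one training set to the next than any single tree does.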

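Train the Models and Compute the Expected Mean Squared Error

We loop over the estimators, fit each one on every training set, and estimate the expected mean squared error by decomposing it into squared bias, variance, and noise terms. We also plot each model's predictions together with its bias-variance decomposition.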
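
For reference, the decomposition being estimated can be written out as follows. This assumes the usual setup, which matches how the data are generated here: y = f(x) + ε with zero-mean noise of variance σ² that is independent of the training set LS (so σ² = noise² = 0.01 in this lab).

```latex
\begin{aligned}
\mathbb{E}_{LS,\,y}\!\left[\left(y - \hat{y}(x)\right)^2\right]
  &= \underbrace{\left(f(x) - \mathbb{E}_{LS}[\hat{y}(x)]\right)^2}_{\text{bias}^2(x)}
   + \underbrace{\mathbb{E}_{LS}\!\left[\left(\hat{y}(x) - \mathbb{E}_{LS}[\hat{y}(x)]\right)^2\right]}_{\text{variance}(x)}
   + \underbrace{\sigma^2}_{\text{noise}(x)}
\end{aligned}
```

In the code, y_bias, y_var, and y_noise are the empirical estimates of these three terms at each test point, with expectations replaced by averages over the n_repeat training sets and target columns.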

plt.figure(figsize=(10, 8))

# Loop over the estimators to compare
for n, (name, estimator) in enumerate(estimators):
    # Compute predictions
    y_predict = np.zeros((n_test, n_repeat))

    for i in range(n_repeat):
        estimator.fit(X_train[i], y_train[i])
        y_predict[:, i] = estimator.predict(X_test)

    # Bias^2 + variance + noise decomposition of the mean squared error
    y_error = np.zeros(n_test)

    for i in range(n_repeat):
        for j in range(n_repeat):
            y_error += (y_test[:, j] - y_predict[:, i]) ** 2

    y_error /= n_repeat * n_repeat

    y_noise = np.var(y_test, axis=1)
    y_bias = (f(X_test) - np.mean(y_predict, axis=1)) ** 2
    y_var = np.var(y_predict, axis=1)

    print(
        "{0}: {1:.4f} (error) = {2:.4f} (bias^2) "
        " + {3:.4f} (var) + {4:.4f} (noise)".format(
            name, np.mean(y_error), np.mean(y_bias), np.mean(y_var), np.mean(y_noise)
        )
    )

    # Plot the figures
    plt.subplot(2, n_estimators, n + 1)
    plt.plot(X_test, f(X_test), "b", label="$f(x)$")
    plt.plot(X_train[0], y_train[0], ".b", label="LS ~ $y = f(x)+noise$")

    for i in range(n_repeat):
        if i == 0:
            plt.plot(X_test, y_predict[:, i], "r", label=r"$\^y(x)$")
        else:
            plt.plot(X_test, y_predict[:, i], "r", alpha=0.05)

    plt.plot(X_test, np.mean(y_predict, axis=1), "c", label=r"$\mathbb{E}_{LS} \^y(x)$")

    plt.xlim([-5, 5])
    plt.title(name)

    if n == n_estimators - 1:
        plt.legend(loc=(1.1, 0.5))

    plt.subplot(2, n_estimators, n_estimators + n + 1)
    plt.plot(X_test, y_error, "r", label="$error(x)$")
    plt.plot(X_test, y_bias, "b", label="$bias^2(x)$")
    plt.plot(X_test, y_var, "g", label="$variance(x)$")
    plt.plot(X_test, y_noise, "c", label="$noise(x)$")
    plt.plot(X_test, y_noise, "c", label="$noise(x)$")

    plt.xlim([-5, 5])
    plt.ylim([0, 0.1])

    if n == n_estimators - 1:
        plt.legend(loc=(1.1, 0.5))

plt.subplots_adjust(right=0.75)
plt.show()
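
The printed error and the sum of the three printed terms may differ in the last digits. The sketch below, using synthetic stand-in arrays rather than the tutorial's fitted models, suggests why: the double loop is reproduced exactly when the squared bias is measured against the empirical mean of the test targets, and only approximately when it is measured against the true f(x). All array names and values here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_repeat = 200, 50

# Synthetic stand-ins for the true function, noisy targets, and predictions
f_true = np.sin(np.linspace(-3, 3, n_test))
y_test = f_true[:, None] + rng.normal(0.0, 0.1, (n_test, n_repeat))
y_predict = f_true[:, None] + 0.05 + rng.normal(0.0, 0.2, (n_test, n_repeat))

# Vectorized equivalent of the double loop over (i, j) pairs
diff = y_test[:, :, None] - y_predict[:, None, :]
y_error = np.mean(diff**2, axis=(1, 2))

y_noise = np.var(y_test, axis=1)
y_var = np.var(y_predict, axis=1)
mean_pred = y_predict.mean(axis=1)
y_bias_true = (f_true - mean_pred) ** 2  # as in the tutorial
y_bias_emp = (y_test.mean(axis=1) - mean_pred) ** 2  # empirical target mean

# Largest pointwise gap between the error and the sum of the three terms
gap_true = np.abs(y_error - (y_bias_true + y_var + y_noise)).max()
gap_emp = np.abs(y_error - (y_bias_emp + y_var + y_noise)).max()
print(gap_true, gap_emp)
```

The remaining gap with the true f(x) comes only from the finite number of noisy target columns and shrinks as n_repeat grows.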

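Interpret the Results

For each model, the printed line reports the expected mean squared error averaged over the test points, together with its squared bias, variance, and noise terms. The upper panels show the individual predictions (red) and their average (cyan) against the true function f(x), and the lower panels show the pointwise error and its three components. Comparing the two columns shows how the total error and the bias-variance trade-off differ between the single tree and the bagging ensemble.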
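
One way to build intuition for the variance term is a toy simulation of averaging. If B predictors each have variance σ² and pairwise correlation ρ, their average has variance ρσ² + (1 − ρ)σ²/B. The sketch below checks this numerically with synthetic Gaussian "predictors"; it is not a model of decision trees, and the values of sigma2, rho, and B are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, rho, B, n_trials = 1.0, 0.3, 10, 200_000

# A shared component induces correlation rho between the B members
shared = rng.normal(0.0, np.sqrt(rho * sigma2), (n_trials, 1))
own = rng.normal(0.0, np.sqrt((1 - rho) * sigma2), (n_trials, B))
members = shared + own  # each column has variance sigma2

# Variance of the averaged prediction, measured and predicted
empirical = members.mean(axis=1).var()
theory = rho * sigma2 + (1 - rho) * sigma2 / B
print(empirical, theory)
```

Averaging removes most of the uncorrelated part of the variance but leaves the shared part untouched, which is consistent with bagging lowering variance without eliminating it.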

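Summary

In this lab, we explored the bias-variance decomposition and how it differs between a single estimator and a bagging ensemble. Using scikit-learn, we generated and visualized a toy regression problem and compared the expected mean squared error of a single decision tree with that of a bagging ensemble of decision trees. We found that bagging gives the better bias-variance trade-off: the bias term increases slightly, but the variance drops substantially, so the overall mean squared error is lower.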
