Quantile Regression Tutorial

Introduction

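This tutorial shows how to perform quantile regression with scikit-learn. We generate two synthetic datasets to illustrate how quantile regression can predict non-trivial conditional quantiles. We use the QuantileRegressor class to estimate the median as well as a low and a high quantile, fixed at 5% and 95% respectively. We then compare QuantileRegressor with LinearRegression and evaluate both models using the mean absolute error (MAE) and the mean squared error (MSE).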
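To make concrete what "estimating a quantile" means, the sketch below uses the pinball (quantile) loss, the loss that quantile regression minimizes. It is an illustration with NumPy only, not the QuantileRegressor implementation; the helper names `pinball_loss` and `best_constant`, the seed, and the candidate grid are assumptions made for this example.

```python
import numpy as np


def pinball_loss(y, prediction, quantile):
    """Mean pinball loss of a prediction for the given quantile level."""
    diff = y - prediction
    # Under-predictions are weighted by quantile, over-predictions by (1 - quantile)
    return np.mean(np.maximum(quantile * diff, (quantile - 1) * diff))


def best_constant(y, quantile, candidates=np.linspace(-3, 3, 601)):
    """Grid-search the constant prediction with the lowest pinball loss."""
    losses = [pinball_loss(y, c, quantile) for c in candidates]
    return candidates[int(np.argmin(losses))]


rng = np.random.RandomState(0)  # seed chosen only for this illustration
y = rng.normal(size=2000)

for quantile in (0.05, 0.5, 0.95):
    # The minimizer should sit close to the empirical quantile of y
    print(quantile, best_constant(y, quantile), np.quantile(y, quantile))
```

For each quantile level, the constant that minimizes the pinball loss lands close to the empirical quantile of the sample. At level 0.5 the pinball loss is half the absolute error, which is why the median is tied to the MAE.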

Generate the Datasets

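We generate two synthetic datasets that share the same expected value, given by a linear relationship with a single feature x. One dataset receives heteroscedastic Normal noise and the other receives asymmetric Pareto noise.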

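import numpy as np

# Fixed seed so the generated datasets are reproducible
rng = np.random.RandomState(42)
x = np.linspace(start=0, stop=10, num=100)
X = x[:, np.newaxis]  # 2D feature matrix of shape (100, 1), as scikit-learn expects
y_true_mean = 10 + 0.5 * x

# Heteroscedastic Normal noise: the standard deviation grows linearly with x
y_normal = y_true_mean + rng.normal(loc=0, scale=0.5 + 0.5 * x, size=x.shape[0])

# Asymmetric Pareto noise, centered by subtracting its mean 1 / (a - 1)
a = 5
y_pareto = y_true_mean + 10 * (rng.pareto(a, size=x.shape[0]) - 1 / (a - 1))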
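The claim that both datasets share the same expected value rests on both noise terms having zero mean. The sketch below checks this empirically with a large sample. It assumes, as the NumPy documentation states, that `pareto` draws from the Lomax (Pareto II) distribution, whose mean is 1 / (a - 1) for a > 1; the seed, the sample size, and the choice of x = 5 for the Normal noise are arbitrary values picked for this check.

```python
import numpy as np

rng = np.random.RandomState(0)  # seed chosen only for this check
a = 5
n_samples = 1_000_000

# Raw Lomax draws have mean 1 / (a - 1) = 0.25 for a = 5
pareto_draws = rng.pareto(a, size=n_samples)

# Same centering and scaling as in the dataset above: the mean should be near 0
centered_pareto_noise = 10 * (pareto_draws - 1 / (a - 1))

# Normal noise evaluated at a single point x = 5, where the scale is 0.5 + 0.5 * 5 = 3
normal_noise = rng.normal(loc=0, scale=0.5 + 0.5 * 5.0, size=n_samples)

print("mean of raw Pareto draws:     ", pareto_draws.mean())
print("mean of centered Pareto noise:", centered_pareto_noise.mean())
print("mean of Normal noise at x = 5:", normal_noise.mean())
```

Both noise means come out close to zero, so the two targets differ only in how they are distributed around the same line 10 + 0.5 * x.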

Visualize the Datasets

We plot each dataset together with the distribution of its residuals y - mean(y).

import matplotlib.pyplot as plt

_, axs = plt.subplots(nrows=2, ncols=2, figsize=(15, 11), sharex="row", sharey="row")

# Left column: heteroscedastic Normal targets
axs[0, 0].plot(x, y_true_mean, label="True mean")
axs[0, 0].scatter(x, y_normal, color="black", alpha=0.5, label="Observations")
# Residuals computed as y - mean(y)
axs[1, 0].hist(y_normal - y_true_mean, edgecolor="black")

# Right column: asymmetric Pareto targets
axs[0, 1].plot(x, y_true_mean, label="True mean")
axs[0, 1].scatter(x, y_pareto, color="black", alpha=0.5, label="Observations")
axs[1, 1].hist(y_pareto - y_true_mean, edgecolor="black")

axs[0, 0].set_title("Dataset with heteroscedastic Normal distributed targets")
axs[0, 1].set_title("Dataset with asymmetric Pareto distributed target")
axs[1, 0].set_title(
    "Residuals distribution for heteroscedastic Normal distributed targets"
)
axs[1, 1].set_title("Residuals distribution for asymmetric Pareto distributed target")
axs[0, 0].legend()
axs[0, 1].legend()
axs[0, 0].set_ylabel("y")
axs[1, 0].set_ylabel("Counts")
axs[0, 1].set_xlabel("x")
axs[0, 0].set_xlabel("x")
axs[1, 0].set_xlabel("Residuals")
_ = axs[1, 1].set_xlabel("Residuals")
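Because both noise distributions are known, the true conditional quantiles can be written in closed form, which gives a reference for what a quantile model should recover on these datasets. The following is a minimal sketch under the data-generating assumptions above. The helper names `true_normal_quantile` and `true_pareto_quantile` are hypothetical, and the Pareto part assumes NumPy's `pareto` samples the Lomax (Pareto II) distribution, whose quantile function is (1 - q)^(-1/a) - 1.

```python
from statistics import NormalDist

import numpy as np

x = np.linspace(start=0, stop=10, num=100)
y_true_mean = 10 + 0.5 * x
a = 5


def true_normal_quantile(q):
    """True conditional quantile of the heteroscedastic Normal dataset."""
    scale = 0.5 + 0.5 * x  # same standard deviation as in the data generation
    return y_true_mean + scale * NormalDist().inv_cdf(q)


def true_pareto_quantile(q):
    """True conditional quantile of the centered, scaled Pareto dataset."""
    lomax_quantile = (1 - q) ** (-1 / a) - 1  # Lomax quantile function (assumed)
    return y_true_mean + 10 * (lomax_quantile - 1 / (a - 1))


for q in (0.05, 0.5, 0.95):
    # Print the quantile at the smallest and largest x for each dataset
    print(q, true_normal_quantile(q)[[0, -1]], true_pareto_quantile(q)[[0, -1]])
```

In the Normal dataset the band between the 5% and 95% quantiles widens as x grows, while in the Pareto dataset the quantiles are lines parallel to the mean and the median lies below the mean.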