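How to Compute Linear Regression Parameters in C

Introduction

In this lab, you will learn how to compute linear regression parameters in C: the slope (m) and the intercept (b). The lab walks step by step through reading (x, y) data points, calculating the slope and intercept, and printing the regression equation in the form y = mx + b. It offers a practical approach to statistical data analysis and modeling in C, a widely used programming language.

Read (x, y) Data Points

In this step, you will learn how to read the (x, y) data points used for linear regression analysis in C. We will create a program that accepts multiple data points and stores them for later calculations.

First, let's create a C file to implement the data point reading: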

cd ~/project
nano linear_regression.c

Now, add the following code to the file:

#include <stdio.h>
#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

int main() {
    DataPoint points[MAX_POINTS];
    int num_points = 0;

    printf("Enter x and y coordinates (enter -1 -1 to finish):\n");

    while (num_points < MAX_POINTS) {
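        double x, y;

        // Stop at end of input or when the input is not two numbers
        if (scanf("%lf %lf", &x, &y) != 2) {
            break;
        }

        // The sentinel pair -1 -1 ends input
        if (x == -1 && y == -1) {
            break;
        }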

        points[num_points].x = x;
        points[num_points].y = y;
        num_points++;
    }

    printf("\nData Points Entered:\n");
    for (int i = 0; i < num_points; i++) {
        printf("Point %d: (%.2f, %.2f)\n", i+1, points[i].x, points[i].y);
    }

    return 0;
}

Compile the program:

gcc -o linear_regression linear_regression.c

Run the program and enter some sample data points:

./linear_regression

Example output:

Enter x and y coordinates (enter -1 -1 to finish):
1 2
2 4
3 5
4 4
5 5
-1 -1

Data Points Entered:
Point 1: (1.00, 2.00)
Point 2: (2.00, 4.00)
Point 3: (3.00, 5.00)
Point 4: (4.00, 4.00)
Point 5: (5.00, 5.00)

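Let's break down the key parts of this code:

We define a DataPoint struct to store the x and y coordinates.
MAX_POINTS caps the number of data points so that the points array cannot overflow.
The program uses a while loop to read coordinates.
The user enters data points and ends input by typing -1 -1. As a consequence, the point (-1, -1) itself cannot be entered as data.
The program prints all entered data points for verification.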
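
Input handling is the part of this step most likely to go wrong, because the scanf family reports problems only through its return value: the number of items it converted. The following is a minimal sketch, not part of the original lab, of a reading helper that checks that value. The name read_points and the choice to read from a FILE * (so the same code works for stdin or a file) are assumptions made for illustration.

```c
#include <stdio.h>

#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

// Hypothetical helper (not in the lab): reads up to max_points pairs from
// `in` and returns how many were stored. Reading stops at end of input, at
// the first token that is not a number, or at the sentinel pair -1 -1.
int read_points(FILE *in, DataPoint points[], int max_points) {
    int count = 0;

    while (count < max_points) {
        double x, y;

        // fscanf returns the number of successful conversions; anything
        // other than 2 means no complete pair could be read.
        if (fscanf(in, "%lf %lf", &x, &y) != 2) {
            break;
        }

        // Sentinel pair: stop without storing it
        if (x == -1 && y == -1) {
            break;
        }

        points[count].x = x;
        points[count].y = y;
        count++;
    }

    return count;
}
```

Calling read_points(stdin, points, MAX_POINTS) from main would give the same interactive behavior as the program in this step, with the returned count taking the place of num_points.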

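Compute the Slope (m) and Intercept (b)

In this step, you will learn how to compute the slope (m) and intercept (b) of a linear regression using the least squares method.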
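
For reference, the least squares method chooses m and b to minimize the sum of squared vertical distances between the data and the line. The standard derivation, written here for n data points (x_i, y_i), sets both partial derivatives of that sum to zero and solves the resulting pair of equations:

```latex
\begin{aligned}
S(m, b) &= \sum_{i=1}^{n} (y_i - m x_i - b)^2 \\
\frac{\partial S}{\partial m} &= -2 \sum_{i=1}^{n} x_i \,(y_i - m x_i - b) = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} (y_i - m x_i - b) = 0 \\
m &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m \sum x_i}{n}
\end{aligned}
```

The last line gives the closed-form expressions that the code in this step evaluates.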

First, update the linear_regression.c file from the previous step:

cd ~/project
nano linear_regression.c

Replace the previous code with the following implementation:

#include <stdio.h>
#include <math.h>
#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

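// Computes the least squares slope (*m) and intercept (*b).
// The caller must pass at least two points with differing x values;
// otherwise the slope's denominator is zero.
void computeLinearRegression(DataPoint points[], int num_points, double *m, double *b) {
    double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x_squared = 0;

    // Accumulate the four sums used by the closed-form formulas
    for (int i = 0; i < num_points; i++) {
        sum_x += points[i].x;
        sum_y += points[i].y;
        sum_xy += points[i].x * points[i].y;
        sum_x_squared += points[i].x * points[i].x;
    }

    double n = num_points;

    // Compute the slope (m)
    *m = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x * sum_x);

    // Compute the y-intercept (b)
    *b = (sum_y - (*m) * sum_x) / n;
}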

int main() {
    DataPoint points[MAX_POINTS];
    int num_points = 0;

    printf("Enter x and y coordinates (enter -1 -1 to finish):\n");

    while (num_points < MAX_POINTS) {
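        double x, y;

        // Stop at end of input or when the input is not two numbers
        if (scanf("%lf %lf", &x, &y) != 2) {
            break;
        }

        // The sentinel pair -1 -1 ends input
        if (x == -1 && y == -1) {
            break;
        }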

        points[num_points].x = x;
        points[num_points].y = y;
        num_points++;
    }

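    // A line cannot be fitted through fewer than two points
    if (num_points < 2) {
        printf("At least two data points are required.\n");
        return 1;
    }

    double slope, intercept;
    computeLinearRegression(points, num_points, &slope, &intercept);

    printf("\nLinear Regression Results:\n");
    printf("Number of data points: %d\n", num_points);
    printf("Slope (m): %.4f\n", slope);
    printf("Y-intercept (b): %.4f\n", intercept);
    printf("Equation: y = %.4fx + %.4f\n", slope, intercept);

    return 0;
}

Compile the program. The -lm flag links the math library; this program does not call any math functions, so the flag is optional here:

gcc -o linear_regression linear_regression.c -lm

Run the program with the sample data points:

./linear_regression

Example output:

Enter x and y coordinates (enter -1 -1 to finish):
1 2
2 4
3 5
4 4
5 5
-1 -1

Linear Regression Results:
Number of data points: 5
Slope (m): 0.6000
Y-intercept (b): 2.2000
Equation: y = 0.6000x + 2.2000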
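
The sample result can be checked by hand. Plugging the five sample points (1, 2), (2, 4), (3, 5), (4, 4) and (5, 5) into the least squares formulas for the slope and intercept gives:

```latex
\begin{aligned}
n &= 5, \quad \sum x = 15, \quad \sum y = 20, \quad \sum xy = 66, \quad \sum x^2 = 55 \\
m &= \frac{5 \cdot 66 - 15 \cdot 20}{5 \cdot 55 - 15^2} = \frac{330 - 300}{275 - 225} = \frac{30}{50} = 0.6 \\
b &= \frac{20 - 0.6 \cdot 15}{5} = \frac{11}{5} = 2.2
\end{aligned}
```

Both values agree with the slope and intercept printed by the program.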

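Key points about the linear regression calculation:

We use the least squares method to compute the slope and intercept.
The slope formula is: m = (n * Σ(xy) - Σx * Σy) / (n * Σ(x²) - (Σx)²)
The y-intercept formula is: b = (Σy - m * Σx) / n
The slope's denominator is zero when there are fewer than two points or when every x value is the same, so the data must contain at least two distinct x values.
The computeLinearRegression() function computes these parameters.
The main function prints the regression equation.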
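
Because the slope formula divides by n * Σ(x²) - (Σx)², it is undefined for degenerate input. One possible way to handle that, sketched here under assumptions, is to have the fitting function report failure instead of dividing. The name fitLine and the 0 / -1 return convention are hypothetical and are not part of the lab's code.

```c
typedef struct {
    double x;
    double y;
} DataPoint;

// Hypothetical variant of computeLinearRegression(): returns 0 on success
// and -1 when no unique least squares line exists.
int fitLine(const DataPoint points[], int num_points, double *m, double *b) {
    double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x_squared = 0;

    for (int i = 0; i < num_points; i++) {
        sum_x += points[i].x;
        sum_y += points[i].y;
        sum_xy += points[i].x * points[i].y;
        sum_x_squared += points[i].x * points[i].x;
    }

    double n = num_points;
    double denominator = n * sum_x_squared - sum_x * sum_x;

    // The denominator is zero for fewer than two points or when all x
    // values coincide. This is an exact comparison (an assumption of this
    // sketch); with rounding error in the sums, a small tolerance may be
    // more appropriate.
    if (num_points < 2 || denominator == 0.0) {
        return -1;
    }

    *m = (n * sum_xy - sum_x * sum_y) / denominator;
    *b = (sum_y - (*m) * sum_x) / n;
    return 0;
}
```

A caller would check the return value before printing, for example by reporting an error when fitLine(points, num_points, &slope, &intercept) returns a nonzero value.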

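Print y = mx + b

In this step, you will learn how to print the linear regression equation and use the computed slope and intercept to predict y values.

Update the linear_regression.c file to add prediction: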

cd ~/project
nano linear_regression.c

Replace the previous code with the following implementation:

#include <stdio.h>
#include <math.h>
#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

void computeLinearRegression(DataPoint points[], int num_points, double *m, double *b) {
    double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x_squared = 0;

    for (int i = 0; i < num_points; i++) {
        sum_x += points[i].x;
        sum_y += points[i].y;
        sum_xy += points[i].x * points[i].y;
        sum_x_squared += points[i].x * points[i].x;
    }

    double n = num_points;

    *m = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x * sum_x);
    *b = (sum_y - (*m) * sum_x) / n;
}

// Predicts y for a given x using the fitted line y = mx + b
double predictY(double m, double b, double x) {
    return m * x + b;
}

int main() {
    DataPoint points[MAX_POINTS];
    int num_points = 0;

    printf("Enter x and y coordinates (enter -1 -1 to finish):\n");

    while (num_points < MAX_POINTS) {
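        double x, y;

        // Stop at end of input or when the input is not two numbers
        if (scanf("%lf %lf", &x, &y) != 2) {
            break;
        }

        // The sentinel pair -1 -1 ends input
        if (x == -1 && y == -1) {
            break;
        }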

        points[num_points].x = x;
        points[num_points].y = y;
        num_points++;
    }

    // A line cannot be fitted through fewer than two points
    if (num_points < 2) {
        printf("At least two data points are required.\n");
        return 1;
    }

    double slope, intercept;
    computeLinearRegression(points, num_points, &slope, &intercept);

    printf("\nLinear Regression Equation:\n");
    printf("y = %.4fx + %.4f\n", slope, intercept);

    // Print predictions for a few sample x values
    printf("\nPredicted y values:\n");
    double test_x_values[] = {0, 2.5, 6, 10};
    for (int i = 0; i < 4; i++) {
        double predicted_y = predictY(slope, intercept, test_x_values[i]);
        printf("When x = %.2f, y = %.4f\n", test_x_values[i], predicted_y);
    }

    return 0;
}

Compile the program:

gcc -o linear_regression linear_regression.c -lm

Run the program with the sample data points:

./linear_regression

Example output:

Enter x and y coordinates (enter -1 -1 to finish):
1 2
2 4
3 5
4 4
5 5
-1 -1

Linear Regression Equation:
y = 0.6000x + 2.2000

Predicted y values:
When x = 0.00, y = 2.2000
When x = 2.50, y = 3.7000
When x = 6.00, y = 5.8000
When x = 10.00, y = 8.2000

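Key points about printing the regression equation:

We added a predictY() function that computes y for a given x.
The main function prints the full equation in the form y = mx + b.
We demonstrate prediction by printing the y values for several x inputs.
The output shows the fitted equation followed by its predictions. With the sample data, x = 6 and x = 10 lie outside the entered range of 1 to 5, so those two predictions are extrapolations.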
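
Prediction can also be turned back on the input data to see how closely the line fits it. The sketch below, which goes slightly beyond the lab's code, sums the squared differences between each observed y and the predicted y; this is the quantity that the least squares method minimizes. The function name sumSquaredResiduals is hypothetical, and predictY() is repeated so the block compiles on its own.

```c
typedef struct {
    double x;
    double y;
} DataPoint;

// Same prediction function as in the lab: y = mx + b
double predictY(double m, double b, double x) {
    return m * x + b;
}

// Hypothetical helper (not in the lab): sum of squared residuals, where a
// residual is the observed y minus the y predicted by the line.
double sumSquaredResiduals(const DataPoint points[], int num_points, double m, double b) {
    double total = 0;

    for (int i = 0; i < num_points; i++) {
        double residual = points[i].y - predictY(m, b, points[i].x);
        total += residual * residual;
    }

    return total;
}
```

For the sample data and the fitted line y = 0.6x + 2.2, the residuals are -0.8, 0.6, 1.0, -0.6 and -0.2, so the sum of their squares is 2.4. Any other choice of m and b would give a larger sum for these five points.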

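Summary

In this lab, you learned how to read (x, y) data points for linear regression analysis in C. You created a program that accepts multiple data points, stores them for later calculations, and prints them for verification.

You then computed the slope (m) and intercept (b) of the regression line using the least squares method, printed the equation in the form y = mx + b, and used it to predict y values for new x inputs.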