Compute Linear Regression Parameters in C


Introduction

In this lab, you will learn how to compute the parameters of a simple linear regression, the slope (m) and the intercept (b), in C. The lab walks through three steps: reading (x,y) data points, calculating the slope and intercept with the least squares method, and printing the regression equation in the form y = mx + b.



Read (x,y) Data Points

In this step, you'll learn how to read (x,y) data points for linear regression analysis in C. We'll create a program that allows input of multiple data points and stores them for further calculation.

First, let's create a C file to implement data point reading:

cd ~/project
nano linear_regression.c

Now, add the following code to the file:

#include <stdio.h>
#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

int main() {
    DataPoint points[MAX_POINTS];
    int num_points = 0;

    printf("Enter x and y coordinates (enter -1 -1 to finish):\n");

    while (num_points < MAX_POINTS) {
        double x, y;
        // scanf returns the number of values converted; stop on bad input or EOF
        if (scanf("%lf %lf", &x, &y) != 2) {
            break;
        }

        if (x == -1 && y == -1) {
            break;
        }

        points[num_points].x = x;
        points[num_points].y = y;
        num_points++;
    }

    printf("\nData Points Entered:\n");
    for (int i = 0; i < num_points; i++) {
        printf("Point %d: (%.2f, %.2f)\n", i+1, points[i].x, points[i].y);
    }

    return 0;
}

Compile the program:

gcc -o linear_regression linear_regression.c

Run the program and enter some sample data points:

./linear_regression

Example output:

Enter x and y coordinates (enter -1 -1 to finish):
1 2
2 4
3 5
4 4
5 5
-1 -1

Data Points Entered:
Point 1: (1.00, 2.00)
Point 2: (2.00, 4.00)
Point 3: (3.00, 5.00)
Point 4: (4.00, 4.00)
Point 5: (5.00, 5.00)

Let's break down the key components of this code:

  1. We define a DataPoint struct to store x and y coordinates.
  2. MAX_POINTS caps the number of data points so the loop never writes past the end of the points array.
  3. The program uses a while loop to read coordinates.
  4. Users can enter data points and terminate input by entering -1 -1.
  5. The program prints out all entered data points for verification.

Compute Slope (m) and Intercept (b)

In this step, you'll learn how to compute the slope (m) and intercept (b) for linear regression using the least squares method.

First, update the previous linear_regression.c file:

cd ~/project
nano linear_regression.c

Replace the previous code with the following implementation:

#include <stdio.h>
#include <math.h>
#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

// Function to compute linear regression parameters
void computeLinearRegression(DataPoint points[], int num_points, double *m, double *b) {
    double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x_squared = 0;

    for (int i = 0; i < num_points; i++) {
        sum_x += points[i].x;
        sum_y += points[i].y;
        sum_xy += points[i].x * points[i].y;
        sum_x_squared += points[i].x * points[i].x;
    }

    double n = num_points;

    // Compute slope (m)
    *m = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x * sum_x);

    // Compute y-intercept (b)
    *b = (sum_y - (*m) * sum_x) / n;
}

int main() {
    DataPoint points[MAX_POINTS];
    int num_points = 0;

    printf("Enter x and y coordinates (enter -1 -1 to finish):\n");

    while (num_points < MAX_POINTS) {
        double x, y;
        // scanf returns the number of values converted; stop on bad input or EOF
        if (scanf("%lf %lf", &x, &y) != 2) {
            break;
        }

        if (x == -1 && y == -1) {
            break;
        }

        points[num_points].x = x;
        points[num_points].y = y;
        num_points++;
    }

    double slope, intercept;
    computeLinearRegression(points, num_points, &slope, &intercept);

    printf("\nLinear Regression Results:\n");
    printf("Number of points: %d\n", num_points);
    printf("Slope (m): %.4f\n", slope);
    printf("Y-Intercept (b): %.4f\n", intercept);
    printf("Equation: y = %.4fx + %.4f\n", slope, intercept);

    return 0;
}

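Compile the program. The -lm flag links the math library; this code calls no math functions, so the flag is optional here, but it is harmless: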

gcc -o linear_regression linear_regression.c -lm

Run the program with sample data points:

./linear_regression

Example output:

Enter x and y coordinates (enter -1 -1 to finish):
1 2
2 4
3 5
4 4
5 5
-1 -1

Linear Regression Results:
Number of points: 5
Slope (m): 0.6000
Y-Intercept (b): 2.2000
Equation: y = 0.6000x + 2.2000
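These results can be checked by hand. The worked arithmetic below applies the standard least squares formulas for slope and intercept to the five sample points; it is only a verification of the output shown, with the sums computed from the example input.

```latex
\begin{align*}
n &= 5, \quad \sum x = 15, \quad \sum y = 20, \quad
\sum xy = 2 + 8 + 15 + 16 + 25 = 66, \quad \sum x^2 = 55 \\
m &= \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
   = \frac{5 \cdot 66 - 15 \cdot 20}{5 \cdot 55 - 15^2}
   = \frac{30}{50} = 0.6 \\
b &= \frac{\sum y - m \sum x}{n}
   = \frac{20 - 0.6 \cdot 15}{5}
   = \frac{11}{5} = 2.2
\end{align*}
```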

Key points about the linear regression calculation:

  1. We use the least squares method to compute slope and intercept.
  2. The formula for slope is: m = (n * Σ(xy) - Σx * Σy) / (n * Σ(x²) - (Σx)²)
  3. The formula for y-intercept is: b = (Σy - m * Σx) / n
  4. The function computeLinearRegression() calculates these parameters.
  5. The main function prints the regression equation.

Print y = mx + b

In this step, you'll learn how to print the linear regression equation and predict y values using the computed slope and intercept.

Update the linear_regression.c file to add prediction functionality:

cd ~/project
nano linear_regression.c

Replace the previous code with the following implementation:

#include <stdio.h>
#include <math.h>
#define MAX_POINTS 100

typedef struct {
    double x;
    double y;
} DataPoint;

void computeLinearRegression(DataPoint points[], int num_points, double *m, double *b) {
    double sum_x = 0, sum_y = 0, sum_xy = 0, sum_x_squared = 0;

    for (int i = 0; i < num_points; i++) {
        sum_x += points[i].x;
        sum_y += points[i].y;
        sum_xy += points[i].x * points[i].y;
        sum_x_squared += points[i].x * points[i].x;
    }

    double n = num_points;

    *m = (n * sum_xy - sum_x * sum_y) / (n * sum_x_squared - sum_x * sum_x);
    *b = (sum_y - (*m) * sum_x) / n;
}

// Function to predict y value
double predictY(double m, double b, double x) {
    return m * x + b;
}

int main() {
    DataPoint points[MAX_POINTS];
    int num_points = 0;

    printf("Enter x and y coordinates (enter -1 -1 to finish):\n");

    while (num_points < MAX_POINTS) {
        double x, y;
        // scanf returns the number of values converted; stop on bad input or EOF
        if (scanf("%lf %lf", &x, &y) != 2) {
            break;
        }

        if (x == -1 && y == -1) {
            break;
        }

        points[num_points].x = x;
        points[num_points].y = y;
        num_points++;
    }

    double slope, intercept;
    computeLinearRegression(points, num_points, &slope, &intercept);

    printf("\nLinear Regression Equation:\n");
    printf("y = %.4fx + %.4f\n", slope, intercept);

    // Print prediction for sample x values
    printf("\nPredicted y values:\n");
    double test_x_values[] = {0, 2.5, 6, 10};
    for (int i = 0; i < 4; i++) {
        double predicted_y = predictY(slope, intercept, test_x_values[i]);
        printf("When x = %.2f, y = %.4f\n", test_x_values[i], predicted_y);
    }

    return 0;
}

Compile the program:

gcc -o linear_regression linear_regression.c -lm

Run the program with sample data points:

./linear_regression

Example output:

Enter x and y coordinates (enter -1 -1 to finish):
1 2
2 4
3 5
4 4
5 5
-1 -1

Linear Regression Equation:
y = 0.6000x + 2.2000

Predicted y values:
When x = 0.00, y = 2.2000
When x = 2.50, y = 3.7000
When x = 6.00, y = 5.8000
When x = 10.00, y = 8.2000

Key points about printing the regression equation:

  1. We added a predictY() function to calculate y for any given x.
  2. The main function prints the full equation: y = mx + b.
  3. We demonstrate prediction by showing y values for different x inputs.
  4. The output lists the fitted equation followed by the predicted y value for each sample x.

Summary

In this lab, you learned how to read (x,y) data points in C and store them in an array of structs for further calculation. You then computed the slope (m) and intercept (b) of the regression line using the least squares method, printed the equation in the form y = mx + b, and used it to predict y values for sample x inputs.
