Introduction
In this lab, we will learn how to compute the Pearson correlation coefficient in C. The coefficient, usually written r, measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1. We will create a C program that lets users enter paired (x,y) data points, performs the correlation analysis, and outputs the result.
The lab proceeds in three steps: implementing the data input, computing the sums required by the correlation formula and applying it, and finally printing the correlation coefficient together with an interpretation.
Read Paired (x,y) Data
In this step, we will learn how to read paired (x,y) data for calculating the Pearson correlation coefficient in C. We'll create a program that allows users to input paired numerical data and store it for further analysis.
First, let's create a C source file for our data input functionality:
cd ~/project
nano correlation_input.c
Now, add the following code to the file:
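#include <stdio.h>

#define MAX_POINTS 100

int main(void) {
    double x[MAX_POINTS], y[MAX_POINTS];
    int n, i;

    printf("Enter the number of data points (max %d): ", MAX_POINTS);
    // Reject non-numeric input and counts outside 1..MAX_POINTS,
    // so the loops below never index past the arrays
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX_POINTS) {
        printf("Invalid number of data points.\n");
        return 1;
    }

    printf("Enter x and y coordinates:\n");
    for (i = 0; i < n; i++) {
        printf("Point %d (x y): ", i + 1);
        // scanf returns the number of values it converted
        if (scanf("%lf %lf", &x[i], &y[i]) != 2) {
            printf("Invalid input for point %d.\n", i + 1);
            return 1;
        }
    }

    // Echo the data back so the user can confirm it
    printf("\nData Points Entered:\n");
    for (i = 0; i < n; i++) {
        printf("Point %d: (%.2f, %.2f)\n", i + 1, x[i], y[i]);
    }

    return 0;
}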
Compile the program:
gcc -o correlation_input correlation_input.c
Run the program and enter some sample data:
./correlation_input
Example output:
Enter the number of data points (max 100): 5
Enter x and y coordinates:
Point 1 (x y): 1 2
Point 2 (x y): 2 4
Point 3 (x y): 3 5
Point 4 (x y): 4 4
Point 5 (x y): 5 5
Data Points Entered:
Point 1: (1.00, 2.00)
Point 2: (2.00, 4.00)
Point 3: (3.00, 5.00)
Point 4: (4.00, 4.00)
Point 5: (5.00, 5.00)
Let's break down the code:
- MAX_POINTS fixes the size of the x and y arrays. The number of points entered must not exceed it, otherwise the program would write past the end of the arrays.
- The program prompts the user to enter the number of data points.
- It then allows the user to input x and y coordinates for each point.
- Finally, it prints out the entered data points to confirm input.
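The input step depends on the user typing well-formed numbers. As a minimal sketch of a stricter, line-based approach, the helpers below parse a count and a point from a line of text and report failure instead of leaving variables unset. The names parse_count and parse_point are hypothetical and not part of the lab's program, and the sketch assumes the count and each point arrive on their own line.

```c
#include <stdio.h>

/* Hypothetical helper: parses a point count from a line of text and checks
   it against the array capacity.
   Returns the count, or -1 if it is missing or outside 1..max_points. */
int parse_count(const char *line, int max_points) {
    int n;

    /* sscanf returns the number of values it converted */
    if (sscanf(line, "%d", &n) != 1 || n < 1 || n > max_points) {
        return -1;
    }
    return n;
}

/* Hypothetical helper: parses one "x y" pair from a line of text.
   Returns 1 on success, 0 if the line does not start with two numbers. */
int parse_point(const char *line, double *x, double *y) {
    return sscanf(line, "%lf %lf", x, y) == 2;
}
```

In a program like the one above, each line could be read with fgets(buf, sizeof buf, stdin) and then passed to these helpers, re-prompting whenever a helper reports failure.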
Compute Sums and Use Formula for Correlation
In this step, we will extend our previous program to compute the necessary sums for calculating the Pearson correlation coefficient. We'll modify the correlation_input.c file to include calculations for the correlation formula.
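For reference, the sum-based form of the Pearson correlation coefficient that this step implements can be written as follows, where n is the number of points and each sum runs over all points:

```latex
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)
                \left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}}
```

The five sums in this expression are exactly the quantities the program accumulates in its loop.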
Open the previous file:
cd ~/project
nano correlation_input.c
Update the code with the following implementation:
#include <stdio.h>
#include <math.h>

#define MAX_POINTS 100

// Returns Pearson's r, or NAN if it is undefined
// (fewer than two points, or no variation in x or in y)
double calculatePearsonCorrelation(double x[], double y[], int n) {
    double sum_x = 0, sum_y = 0, sum_xy = 0;
    double sum_x_squared = 0, sum_y_squared = 0;

    // Compute necessary sums
    for (int i = 0; i < n; i++) {
        sum_x += x[i];
        sum_y += y[i];
        sum_xy += x[i] * y[i];
        sum_x_squared += x[i] * x[i];
        sum_y_squared += y[i] * y[i];
    }

    // Pearson correlation coefficient formula
    double numerator = n * sum_xy - sum_x * sum_y;
    double spread_x = n * sum_x_squared - sum_x * sum_x;
    double spread_y = n * sum_y_squared - sum_y * sum_y;

    // Avoid dividing by zero when the coefficient is undefined
    if (n < 2 || spread_x <= 0 || spread_y <= 0) {
        return NAN;
    }
    return numerator / sqrt(spread_x * spread_y);
}

int main(void) {
    double x[MAX_POINTS], y[MAX_POINTS];
    int n, i;

    printf("Enter the number of data points (max %d): ", MAX_POINTS);
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX_POINTS) {
        printf("Invalid number of data points.\n");
        return 1;
    }

    printf("Enter x and y coordinates:\n");
    for (i = 0; i < n; i++) {
        printf("Point %d (x y): ", i + 1);
        if (scanf("%lf %lf", &x[i], &y[i]) != 2) {
            printf("Invalid input for point %d.\n", i + 1);
            return 1;
        }
    }

    double correlation = calculatePearsonCorrelation(x, y, n);

    printf("\nData Points Entered:\n");
    for (i = 0; i < n; i++) {
        printf("Point %d: (%.2f, %.2f)\n", i + 1, x[i], y[i]);
    }

    if (isnan(correlation)) {
        printf("\nPearson Correlation Coefficient: undefined\n");
    } else {
        printf("\nPearson Correlation Coefficient: %.4f\n", correlation);
    }

    return 0;
}
Compile the program, linking the math library with -lm (needed for sqrt()):
gcc -o correlation_input correlation_input.c -lm
Run the program with sample data:
./correlation_input
Example output:
Enter the number of data points (max 100): 5
Enter x and y coordinates:
Point 1 (x y): 1 2
Point 2 (x y): 2 4
Point 3 (x y): 3 5
Point 4 (x y): 4 4
Point 5 (x y): 5 5
Data Points Entered:
Point 1: (1.00, 2.00)
Point 2: (2.00, 4.00)
Point 3: (3.00, 5.00)
Point 4: (4.00, 4.00)
Point 5: (5.00, 5.00)
Pearson Correlation Coefficient: 0.7746
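The coefficient for the five sample points (1,2), (2,4), (3,5), (4,4), (5,5) can be checked by hand by plugging their sums into the formula used in calculatePearsonCorrelation():

```latex
\begin{aligned}
n &= 5,\quad \sum x_i = 15,\quad \sum y_i = 20,\quad \sum x_i y_i = 66,\quad
     \sum x_i^2 = 55,\quad \sum y_i^2 = 86 \\
r &= \frac{5 \cdot 66 - 15 \cdot 20}
          {\sqrt{\left(5 \cdot 55 - 15^2\right)\left(5 \cdot 86 - 20^2\right)}}
   = \frac{30}{\sqrt{50 \cdot 30}}
   = \frac{30}{\sqrt{1500}} \approx 0.7746
\end{aligned}
```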
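Key points about the Pearson correlation calculation:
- We compute the necessary sums in a single loop: Σx, Σy, Σxy, Σx², Σy²
- We then apply the Pearson correlation coefficient formula to those sums
- sqrt() from math.h computes the denominator, which is why the program is linked with -lm
- The result lies between -1 and 1. It is undefined when all x values or all y values are identical, because the denominator is then zero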
Print the Correlation Coefficient
In this final step, we'll enhance our program to print a short interpretation of the Pearson correlation coefficient (its direction and strength) and create a more user-friendly output.
Open the previous file:
cd ~/project
nano correlation_input.c
Update the code with the following implementation:
#include <stdio.h>
#include <math.h>

#define MAX_POINTS 100

// Returns Pearson's r, or NAN if it is undefined
// (fewer than two points, or no variation in x or in y)
double calculatePearsonCorrelation(double x[], double y[], int n) {
    double sum_x = 0, sum_y = 0, sum_xy = 0;
    double sum_x_squared = 0, sum_y_squared = 0;

    for (int i = 0; i < n; i++) {
        sum_x += x[i];
        sum_y += y[i];
        sum_xy += x[i] * y[i];
        sum_x_squared += x[i] * x[i];
        sum_y_squared += y[i] * y[i];
    }

    double numerator = n * sum_xy - sum_x * sum_y;
    double spread_x = n * sum_x_squared - sum_x * sum_x;
    double spread_y = n * sum_y_squared - sum_y * sum_y;

    // Avoid dividing by zero when the coefficient is undefined
    if (n < 2 || spread_x <= 0 || spread_y <= 0) {
        return NAN;
    }
    return numerator / sqrt(spread_x * spread_y);
}

// Prints the coefficient and a label for its direction and strength
void interpretCorrelation(double correlation) {
    printf("\nCorrelation Coefficient Interpretation:\n");

    // NAN compares false against every threshold, so handle it first
    if (isnan(correlation)) {
        printf("Correlation Value: undefined\n");
        return;
    }

    printf("Correlation Value: %.4f\n", correlation);

    if (correlation > 0.8) {
        printf("Strong Positive Correlation\n");
    } else if (correlation > 0.5) {
        printf("Moderate Positive Correlation\n");
    } else if (correlation > 0.3) {
        printf("Weak Positive Correlation\n");
    } else if (correlation > -0.3) {
        printf("No Linear Correlation\n");
    } else if (correlation > -0.5) {
        printf("Weak Negative Correlation\n");
    } else if (correlation > -0.8) {
        printf("Moderate Negative Correlation\n");
    } else {
        printf("Strong Negative Correlation\n");
    }
}

int main(void) {
    double x[MAX_POINTS], y[MAX_POINTS];
    int n, i;

    printf("Pearson Correlation Coefficient Calculator\n");
    printf("----------------------------------------\n");

    printf("Enter the number of data points (max %d): ", MAX_POINTS);
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX_POINTS) {
        printf("Invalid number of data points.\n");
        return 1;
    }

    printf("Enter x and y coordinates:\n");
    for (i = 0; i < n; i++) {
        printf("Point %d (x y): ", i + 1);
        if (scanf("%lf %lf", &x[i], &y[i]) != 2) {
            printf("Invalid input for point %d.\n", i + 1);
            return 1;
        }
    }

    double correlation = calculatePearsonCorrelation(x, y, n);

    printf("\nData Points Entered:\n");
    for (i = 0; i < n; i++) {
        printf("Point %d: (%.2f, %.2f)\n", i + 1, x[i], y[i]);
    }

    interpretCorrelation(correlation);

    return 0;
}
Compile the program, this time naming the executable correlation_calculator:
gcc -o correlation_calculator correlation_input.c -lm
Run the program with sample data:
./correlation_calculator
Example output:
Pearson Correlation Coefficient Calculator
----------------------------------------
Enter the number of data points (max 100): 5
Enter x and y coordinates:
Point 1 (x y): 1 2
Point 2 (x y): 2 4
Point 3 (x y): 3 5
Point 4 (x y): 4 4
Point 5 (x y): 5 5
Data Points Entered:
Point 1: (1.00, 2.00)
Point 2: (2.00, 4.00)
Point 3: (3.00, 5.00)
Point 4: (4.00, 4.00)
Point 5: (5.00, 5.00)
Correlation Coefficient Interpretation:
Correlation Value: 0.7746
Moderate Positive Correlation
Key improvements:
- Added the interpretCorrelation() function
- Provides a short description of the correlation strength
- Categorizes correlation into different levels
- Enhanced user interface with a title and clear output
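Because interpretCorrelation() prints its result directly, its thresholds are awkward to check in isolation. One possible refactoring, sketched here under the assumption that the same 0.3 / 0.5 / 0.8 cut-offs are kept, returns the label as a string so the caller decides how to print it. The name correlationLabel and the "Undefined" label for NAN input are hypothetical additions, not part of the lab's program.

```c
#include <math.h>

/* Hypothetical variant of interpretCorrelation(): returns the label instead
   of printing it. The 0.3 / 0.5 / 0.8 cut-offs follow this lab's convention;
   they are not a universal standard. */
const char *correlationLabel(double r) {
    /* NAN compares false against every threshold, so handle it first */
    if (isnan(r)) {
        return "Undefined";
    }
    if (r > 0.8)  return "Strong Positive Correlation";
    if (r > 0.5)  return "Moderate Positive Correlation";
    if (r > 0.3)  return "Weak Positive Correlation";
    if (r > -0.3) return "No Linear Correlation";
    if (r > -0.5) return "Weak Negative Correlation";
    if (r > -0.8) return "Moderate Negative Correlation";
    return "Strong Negative Correlation";
}
```

A caller could then write printf("%s\n", correlationLabel(correlation)); and the boundary values can be exercised without capturing program output.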
Summary
In this lab, we learned how to read paired (x,y) data for calculating the Pearson correlation coefficient in C. We created a program that allows users to input paired numerical data and store it for further analysis. We then extended the program to compute the necessary sums, apply the Pearson correlation formula, and print the coefficient with an interpretation of its strength and direction.
The key steps covered in this lab include reading paired (x,y) data, computing the sums required for the correlation formula, and printing the final correlation coefficient. By following these steps, you can implement the Pearson correlation calculation in your own C programs.



