To check for linearity in a linear regression model, you can use the following methods:
Scatter Plot:
- Create a scatter plot of each independent variable against the dependent variable. If the points cluster around a straight line, a linear relationship is plausible; visible curvature suggests a nonlinear one.
```python
import matplotlib.pyplot as plt

# Assuming X is the independent variable and y is the dependent variable
plt.scatter(X, y)
plt.xlabel('Independent Variable')
plt.ylabel('Dependent Variable')
plt.title('Scatter Plot to Check Linearity')
plt.show()
```
Residual Plot:
- After fitting a linear regression model, plot the residuals (the differences between observed and predicted values) against the predicted values or the independent variable. If the residuals are scattered randomly around zero with no discernible pattern, the linearity assumption is reasonable. A curved or otherwise systematic pattern suggests the model is missing a nonlinear term.
```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# sm.OLS does not add an intercept on its own, so add a constant column first
X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()

residuals = model.resid
predicted = model.fittedvalues

plt.scatter(predicted, residuals)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot to Check Linearity')
plt.show()
```
Correlation Coefficient:
- Calculate the correlation coefficient (e.g., Pearson correlation) between the independent and dependent variables. A value close to +1 or -1 indicates a strong linear relationship.
```python
import numpy as np

correlation = np.corrcoef(X.flatten(), y)[0, 1]
print(f'Correlation Coefficient: {correlation}')
```
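The correlation coefficient is best read alongside a plot, because it only measures linear association. The sketch below uses made-up data (the arrays `x`, `y_quadratic`, and `y_exponential` are illustrative assumptions, not part of any real dataset) to show two ways it can mislead: a perfect but symmetric curved relationship can give a value near 0, and a clearly curved monotonic relationship can still give a value close to 1.

```python
import numpy as np

# Illustrative data only: x is symmetric around zero
x = np.linspace(-3, 3, 101)

# Perfectly dependent on x, but not linearly
y_quadratic = x ** 2
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

# Monotonic but strongly curved relationship on a positive range
x_pos = np.linspace(0, 3, 101)
y_exponential = np.exp(x_pos)
r_exponential = np.corrcoef(x_pos, y_exponential)[0, 1]

print(f'Quadratic relationship:   r = {r_quadratic:.3f}')
print(f'Exponential relationship: r = {r_exponential:.3f}')
```

In the first case the coefficient is essentially zero even though y is fully determined by x; in the second it is high even though a straight line is the wrong shape.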
Statistical Tests:
- Use statistical tests like the Ramsey RESET test to formally test for linearity. This test checks whether the model is correctly specified.
Used together, these methods let you assess whether a linear relationship between your variables is a reasonable assumption before relying on the results of a linear regression analysis.
