Pandas DataFrame Corr Method

Introduction

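In this lab, we will learn how to use the corr() method in the pandas library to calculate the pairwise correlation between the columns of a DataFrame. Correlation measures how strongly two variables move together: the Pearson coefficient captures linear relationships, while the Kendall and Spearman coefficients capture monotonic ones. Correlation alone does not show that a change in one variable causes a change in the other.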

Importing Required Libraries

First, we need to import the necessary libraries. To compute correlations we only need the pandas library; seaborn and matplotlib are imported later, when we draw the heatmaps.

import pandas as pd

Create a DataFrame

Next, let's create a DataFrame to work with. We will create a simple DataFrame with columns representing people's names, ages, heights, and weights.

chart = {
    'Name': ['Chetan', 'yashas', 'yuvraj'],
    'Age': [20, 25, 30],
    'Height': [155, 160, 175],
    'Weight': [55, 60, 75]
}

df = pd.DataFrame(chart)
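
Correlation is only defined for numeric data, so it can help to check which columns qualify before computing anything. The following is a minimal sketch, assuming the same DataFrame as above; the names numeric_df and numeric_columns are illustrative and not part of the lab:

```python
import pandas as pd

chart = {
    'Name': ['Chetan', 'yashas', 'yuvraj'],
    'Age': [20, 25, 30],
    'Height': [155, 160, 175],
    'Weight': [55, 60, 75],
}
df = pd.DataFrame(chart)

# Keep only the numeric columns; 'Name' holds strings and is dropped.
numeric_df = df.select_dtypes(include='number')
numeric_columns = list(numeric_df.columns)
print(numeric_columns)
```

Here only Age, Height, and Weight remain, which are the columns a correlation matrix can be built from.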

Calculate the Correlation

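Now, we can calculate the correlation between the columns of the DataFrame using the corr() method. The optional method parameter selects the correlation method (pearson, kendall, or spearman); if it is omitted, corr() uses the Pearson correlation. Because our DataFrame also contains the non-numeric Name column, we pass numeric_only=True (available since pandas 1.5) so that only Age, Height, and Weight are included. In pandas 2.0 and later, calling corr() on a DataFrame with string columns without this argument raises an error.

Let's calculate the Pearson correlation between the columns of our DataFrame:

# numeric_only=True skips the non-numeric 'Name' column
pearson_corr = df.corr(method='pearson', numeric_only=True)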
print("Pearson Correlation:")
print(pearson_corr)
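
To see what the Pearson coefficient measures, it can be recomputed by hand as the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below does this for the Age and Height columns from the lab and compares the result with pandas; the helper variable names (dx, dy, manual_r, pandas_r) are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [20, 25, 30], 'Height': [155, 160, 175]})

x = df['Age'].to_numpy(dtype=float)
y = df['Height'].to_numpy(dtype=float)

# Deviations of each value from its column mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson r: co-variation of x and y, normalised by their individual spreads.
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same coefficient as computed by pandas for a single pair of columns.
pandas_r = df['Age'].corr(df['Height'])

print(manual_r, pandas_r)
```

Both values agree (about 0.96), which is the Age/Height entry of the correlation matrix.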

Visualize the Correlation

We can visualize the correlation matrix using a heatmap. The seaborn library provides a convenient way to create heatmaps.

import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(pearson_corr, annot=True, cmap='coolwarm')
plt.title("Pearson Correlation Heatmap")
plt.show()
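
By default the heatmap's color scale adapts to the values in the matrix, which can make matrices hard to compare with each other. One possible refinement, sketched below, is to pin the scale to the full correlation range of -1 to 1 using the vmin and vmax arguments of sns.heatmap. The DataFrame repeats the numeric columns from the lab; the variable names are illustrative:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'Age': [20, 25, 30],
    'Height': [155, 160, 175],
    'Weight': [55, 60, 75],
})
corr = df.corr(method='pearson')

fig, ax = plt.subplots()
# vmin/vmax fix the color scale so that -1, 0 and 1 always map to the same colors.
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1, ax=ax)
ax.set_title("Pearson Correlation Heatmap (fixed scale)")
```

In a Jupyter Notebook the figure is displayed at the end of the cell; in a script, call plt.show() afterwards.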

Calculate Correlation with Other Methods

We can also calculate the correlation using the Kendall or Spearman methods. To do this, simply specify the method parameter accordingly. Let's calculate the Kendall correlation of our DataFrame:

# numeric_only=True skips the non-numeric 'Name' column
kendall_corr = df.corr(method='kendall', numeric_only=True)
print("Kendall Correlation:")
print(kendall_corr)
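
The Kendall coefficient is based on counting pairs of observations: a pair is concordant when both variables move in the same direction and discordant when they move in opposite directions. The sketch below computes it by hand on a small hypothetical data set (the lists x and y are not from the lab) and assumes there are no tied values. pandas computes a tie-adjusted variant (tau-b), which gives the same result when no ties are present:

```python
from itertools import combinations

# Hypothetical, tie-free data used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

concordant = 0
discordant = 0
for i, j in combinations(range(len(x)), 2):
    # The sign of the product tells whether x and y move in the same direction.
    direction = (x[i] - x[j]) * (y[i] - y[j])
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

n_pairs = len(x) * (len(x) - 1) // 2
tau = (concordant - discordant) / n_pairs
print(concordant, discordant, tau)
```

Of the 10 pairs, 8 are concordant and 2 are discordant, giving tau = 0.6.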

Visualize the Correlation Heatmap with Other Methods

Similarly, we can create a heatmap to visualize the Kendall correlation:

sns.heatmap(kendall_corr, annot=True, cmap='coolwarm')
plt.title("Kendall Correlation Heatmap")
plt.show()

Repeat the Process with Spearman Correlation

Lastly, let's calculate and visualize the Spearman correlation:

# numeric_only=True skips the non-numeric 'Name' column
spearman_corr = df.corr(method='spearman', numeric_only=True)
print("Spearman Correlation:")
print(spearman_corr)
sns.heatmap(spearman_corr, annot=True, cmap='coolwarm')
plt.title("Spearman Correlation Heatmap")
plt.show()
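
The Spearman coefficient can be understood as the Pearson coefficient applied to the ranks of the values instead of the values themselves. The sketch below illustrates this on hypothetical data (the columns x, y, and z are not from the lab), where y is a monotonic but non-linear function of x:

```python
import numpy as np
import pandas as pd

# Hypothetical data: y grows with x, but not along a straight line.
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'z': [5, 3, 4, 1, 2]})
data['y'] = data['x'] ** 3

spearman = data.corr(method='spearman')
# Replace each value by its rank within its column, then apply Pearson.
rank_pearson = data.rank().corr(method='pearson')
pearson = data.corr(method='pearson')

print(spearman.loc['x', 'y'], pearson.loc['x', 'y'])
print(np.allclose(spearman, rank_pearson))
```

The Spearman value for x and y is 1 because the ordering is perfectly preserved, whereas the Pearson value is lower (about 0.94) because the relationship is not linear.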

Summary

In this lab, we learned how to calculate and visualize the correlation between columns of a DataFrame using the corr() method in pandas. We explored different correlation methods, including Pearson, Kendall, and Spearman, and used heatmaps to visualize the correlation matrices. Correlation analysis helps us identify relationships between variables and is useful in many areas, such as data analysis, machine learning, and finance.
