Introduction
In this tutorial, we will learn how to use the DataFrame.cov() method in the pandas library to compute the covariance between the columns of a DataFrame. Covariance measures how two variables vary together. It is positive when they tend to increase together, negative when one tends to decrease as the other increases, and close to zero when they show no linear relationship.
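The definition can be made concrete with the sample covariance formula, cov(X, Y) = sum((x_i - mean(x)) * (y_i - mean(y))) / (n - 1). The sketch below is only an illustration, assuming two short hand-picked Series (the Height and Weight values used later in this tutorial). It computes that sum directly and compares it with the pandas Series.cov() method, which uses the same n - 1 normalization by default.

```python
import pandas as pd

# Two small sample Series (the Height and Weight values used later in this tutorial)
x = pd.Series([155, 170, 165])
y = pd.Series([59, 60, 75])

# Sample covariance: sum of the products of deviations from the mean, divided by n - 1
n = len(x)
manual_cov = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

# pandas applies the same n - 1 normalization by default
pandas_cov = x.cov(y)

print(manual_cov, pandas_cov)
```

Both values agree, and the positive result indicates that the two series tend to rise together.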
Create a DataFrame
First, let's create a DataFrame with some sample data using the pd.DataFrame constructor. It contains one text column, Name, and three numeric columns: Age, Height, and Weight.
import pandas as pd
# Sample data: one text column and three numeric columns
data = {'Name': ['Chetan', 'Yashas', 'Yuvraj'],
        'Age': [20, 25, 30],
        'Height': [155, 170, 165],
        'Weight': [59, 60, 75]}
df = pd.DataFrame(data)
print(df)
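Covariance is only defined for numeric data, and this DataFrame mixes a text column with numeric ones. A minimal sketch for checking which columns pandas treats as numeric uses DataFrame.select_dtypes(). It rebuilds the same sample DataFrame so that it runs on its own; the name numeric_df is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj'],
                   'Age': [20, 25, 30],
                   'Height': [155, 170, 165],
                   'Weight': [59, 60, 75]})

# Show the dtype of each column: 'Name' holds strings, the others hold integers
print(df.dtypes)

# Keep only the numeric columns, which are the ones a covariance can use
numeric_df = df.select_dtypes(include='number')
print(list(numeric_df.columns))
```

Calling numeric_df.cov() is an alternative to passing numeric_only=True to cov(), and it also works on pandas versions older than 1.5, where cov() has no numeric_only parameter.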
Compute Covariance Matrix
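Next, we use the DataFrame.cov() method to compute the covariance matrix of the columns. Each entry of this matrix is the covariance between a pair of columns, the diagonal entries are the variances of the individual columns, and the matrix is symmetric. By default, pandas computes the sample covariance, normalized by N - 1. Because covariance is only defined for numeric data, we pass numeric_only=True to exclude the text column Name; in pandas 2.0 and later, calling df.cov() without it raises an error on this DataFrame.
# numeric_only=True skips the non-numeric 'Name' column (parameter available since pandas 1.5)
covariance_matrix = df.cov(numeric_only=True)
print(covariance_matrix)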
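Two properties of the covariance matrix are easy to check in code: its diagonal holds each column's sample variance, and it is symmetric because cov(A, B) equals cov(B, A). The sketch below, assuming the same sample DataFrame, verifies both. It also uses the ddof parameter of DataFrame.cov() (available since pandas 1.1) to switch from the sample normalization (n - 1) to the population normalization (n).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj'],
                   'Age': [20, 25, 30],
                   'Height': [155, 170, 165],
                   'Weight': [59, 60, 75]})

# Sample covariance matrix of the numeric columns (divides by n - 1)
cov = df.cov(numeric_only=True)

# The diagonal entries equal each column's sample variance
variances = df[['Age', 'Height', 'Weight']].var()
print(np.allclose(np.diag(cov.to_numpy()), variances.to_numpy()))

# The matrix equals its own transpose
print(np.allclose(cov.to_numpy(), cov.T.to_numpy()))

# ddof=0 divides by n instead of n - 1, giving the population covariance
pop_cov = df.cov(numeric_only=True, ddof=0)
n = len(df)
print(np.allclose(pop_cov.to_numpy(), cov.to_numpy() * (n - 1) / n))
```

All three checks print True for this data.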
Compute Covariance of Two Columns
If we only need the covariance between two specific columns, we can call the Series.cov() method on one column and pass the other column as its argument.
# Sample covariance between the Height and Weight columns
covariance = df['Height'].cov(df['Weight'])
print(covariance)
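As a cross-check, the pairwise result should match the corresponding entry of the full covariance matrix and the value from NumPy's np.cov(), which also uses the n - 1 normalization by default. This sketch assumes the same sample DataFrame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj'],
                   'Age': [20, 25, 30],
                   'Height': [155, 170, 165],
                   'Weight': [59, 60, 75]})

# Series.cov: covariance of one column with another
pair_cov = df['Height'].cov(df['Weight'])

# The same value read from the full covariance matrix
matrix_entry = df.cov(numeric_only=True).loc['Height', 'Weight']

# np.cov returns a 2x2 matrix for two 1-D inputs; entry [0, 1] is the cross term
numpy_cov = np.cov(df['Height'], df['Weight'])[0, 1]

print(pair_cov, matrix_entry, numpy_cov)
```

All three values agree (about 16.67). The positive sign means that, in this small sample, height and weight tend to increase together.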
Summary
In this tutorial, we learned how to use the DataFrame.cov() method in pandas to compute the covariance matrix of all pairs of numeric columns in a DataFrame, and how to use Series.cov() to compute the covariance between two specific columns. The sign of a covariance shows whether two variables tend to move in the same or in opposite directions, which makes it a useful first look at how different measurements in a dataset relate to one another.