Introduction
In this project, you will learn how to analyze COVID-19 data using Python by aggregating per-country case counts from a CSV file into per-continent totals. The COVID-19 pandemic has had a significant impact on the world, and understanding the data is crucial for tracking the spread of the virus and making informed decisions.
👀 Preview
{
  "Confirmed": {
    "Africa": 1203094,
    "America": 6396173,
    "Asia": 6480321,
    "Europe": 3450299,
    "Oceania": 27346,
    "Others": 721,
    "Total": 17557954
  },
  "Deaths": {
    "Africa": 28289,
    "America": 254610,
    "Asia": 133186,
    "Europe": 206438,
    "Oceania": 576,
    "Others": 15,
    "Total": 623114
  },
  "Recovered": {
    "Africa": 930536,
    "America": 5087347,
    "Asia": 5163062,
    "Europe": 1927545,
    "Oceania": 21892,
    "Others": 651,
    "Total": 13131033
  },
  "Active": {
    "Africa": 244269,
    "America": 1054216,
    "Asia": 1184073,
    "Europe": 1316316,
    "Oceania": 4878,
    "Others": 55,
    "Total": 3803807
  }
}
🎯 Tasks
In this project, you will learn:
- How to set up the development environment and install the required Python libraries
- How to understand the structure and content of the COVID-19 data
- How to implement a function to convert country names to continent names
- How to process the COVID-19 data and calculate the summary statistics for each continent
- How to test the code and verify the output
🏆 Achievements
After completing this project, you will be able to:
- Understand how to work with CSV data in Python
- Implement functions to process and analyze data
- Convert data between different formats (e.g., CSV to JSON)
- Apply basic data analysis techniques, such as filtering, grouping, and summing, to real-world data
- Contribute to the understanding of the COVID-19 pandemic through data-driven insights
Prepare the Environment
In this step, you will learn how to set up the environment for the COVID-19 data analysis project.
- Open the terminal and navigate to the /home/labex/project directory.
- Install the required Python libraries by running the following command:
python3 -m pip install pandas country-converter
This will install the pandas and country-converter libraries, which are needed for the project.
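To confirm that the installation worked, a small check like the following may help. It is only a sketch: the helper name missing_packages is made up for illustration, and it relies on the fact that the country-converter package is imported under the name country_converter, as in the code later in this project.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# An empty list means both libraries are importable.
print(missing_packages(["pandas", "country_converter"]))
```

If the printed list is not empty, rerun the pip command above with the same python3 interpreter that you use to run the project.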
Understand the Data
The COVID-19 data is provided in a CSV file located in the /home/labex/project directory. The file contains the following columns:
- Country_Region: The name of the country or region.
- Confirmed: The total number of confirmed COVID-19 cases.
- Deaths: The total number of COVID-19 deaths.
- Recovered: The total number of COVID-19 recoveries.
- Active: The total number of active COVID-19 cases.
Your task is to process this data and calculate the summary statistics for each continent.
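Before processing the file, it can be useful to check that its header matches the columns listed above. The sketch below uses only the standard csv module. The sample text and the country names in it are made up for illustration and are not taken from the real data set, and the helper name missing_columns is hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = ["Country_Region", "Confirmed", "Deaths", "Recovered", "Active"]

# Made-up sample rows, used only to demonstrate the check.
SAMPLE_CSV = """Country_Region,Confirmed,Deaths,Recovered,Active
Exampleland,100,5,80,15
Sampleonia,40,2,30,8
"""


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [column for column in EXPECTED_COLUMNS if column not in header]


print(missing_columns(io.StringIO(SAMPLE_CSV)))  # prints []
```

To run the same check on the real file, pass an open file object, for example the result of open(path, newline=""), instead of the io.StringIO sample.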
Implement the country_to_continent Function
The first step is to create a function that can convert a country name to its corresponding continent name. Create a new file named covid.py in the /home/labex/project directory and add the following code:
import country_converter as coco


def country_to_continent(country_name):
    """This function takes a country name and returns the continent name."""
    try:
        # Cruise ships are not countries, so group them under 'Others'
        if country_name in ("Diamond Princess", "MS Zaandam"):
            return "Others"
        # Convert country name to continent name
        country_continent_name = coco.convert(names=country_name, to="continent")
        # If country name is not found, return 'Others'
        if country_continent_name == "not found":
            return "Others"
        return country_continent_name
    except Exception:
        # Treat any conversion failure as an unknown country
        return "Others"
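This function uses the country-converter library to convert a country name to its corresponding continent name. The cruise ships "Diamond Princess" and "MS Zaandam" are mapped to "Others" directly. If the country name is not found, or if the conversion raises an error, the function also returns "Others".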
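The control flow of this function can be tried out without the library installed. The sketch below swaps the coco.convert call for a tiny lookup table; the table SAMPLE_CONTINENTS and the function name to_continent are hypothetical stand-ins, not part of country-converter or of the project code.

```python
# Hypothetical stand-in for the library lookup: a tiny, incomplete table.
SAMPLE_CONTINENTS = {"Germany": "Europe", "Japan": "Asia", "Kenya": "Africa"}
SPECIAL_CASES = ("Diamond Princess", "MS Zaandam")


def to_continent(country_name, lookup=SAMPLE_CONTINENTS):
    """Mirror the special-case and fallback logic of country_to_continent."""
    # Cruise ships are handled before any lookup is attempted.
    if country_name in SPECIAL_CASES:
        return "Others"
    # Mimic the "not found" marker that the project code checks for.
    continent = lookup.get(country_name, "not found")
    if continent == "not found":
        return "Others"
    return continent


print(to_continent("Germany"))     # prints Europe
print(to_continent("MS Zaandam"))  # prints Others
print(to_continent("Atlantis"))    # prints Others
```

The point of the sketch is the order of the checks: special cases first, then the lookup, then the fallback for anything unrecognized.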
Implement the count Function
Next, you need to implement the count function, which will process the COVID-19 data and return the summary statistics for each continent. Add the following code to the covid.py file:
import json

import pandas as pd


def count(data):
    """This function takes the path to a CSV file and returns a JSON string
    with the total number of confirmed, deaths, recovered and active cases
    for each continent."""
    # Read the data from the file
    df = pd.read_csv(data)
    # Fill missing values with 0
    df = df.fillna(0)
    # Keep only consistent rows, where Confirmed = Deaths + Recovered + Active
    df = df[df["Confirmed"] == df["Deaths"] + df["Recovered"] + df["Active"]].copy()
    # Convert country name to continent name
    df["Continent"] = df["Country_Region"].apply(country_to_continent)
    # Convert data type to int
    columns = ["Confirmed", "Deaths", "Recovered", "Active"]
    df[columns] = df[columns].astype(int)
    # Select the columns of interest, sum them per continent,
    # and convert the result into a dictionary keyed by column name
    df = df[["Continent"] + columns]
    result = df.groupby("Continent").sum().to_dict()
    # Add a "Total" entry (the sum over all continents) to each column
    for key in result:
        result[key]["Total"] = sum(result[key].values())
    return json.dumps(result)
This function reads the COVID-19 data from the CSV file, processes the data, and returns the summary statistics for each continent in JSON format.
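To make the aggregation concrete, here is a dependency-free sketch of the same three steps (drop inconsistent rows, sum per continent, add a "Total" per column) written with plain dictionaries. It is not the pandas implementation above. The function name summarize is hypothetical, the rows are made-up numbers, and each row is assumed to already carry a "Continent" key.

```python
import json
from collections import defaultdict

METRICS = ["Confirmed", "Deaths", "Recovered", "Active"]

# Made-up rows for illustration; the last one is inconsistent (9 != 1 + 1 + 1).
SAMPLE_ROWS = [
    {"Continent": "Europe", "Confirmed": 10, "Deaths": 1, "Recovered": 6, "Active": 3},
    {"Continent": "Europe", "Confirmed": 5, "Deaths": 0, "Recovered": 5, "Active": 0},
    {"Continent": "Asia", "Confirmed": 8, "Deaths": 2, "Recovered": 4, "Active": 2},
    {"Continent": "Asia", "Confirmed": 9, "Deaths": 1, "Recovered": 1, "Active": 1},
]


def summarize(rows):
    """Sum each metric per continent and add a 'Total' across continents."""
    result = {metric: defaultdict(int) for metric in METRICS}
    for row in rows:
        # Skip rows where the counts do not add up.
        if row["Confirmed"] != row["Deaths"] + row["Recovered"] + row["Active"]:
            continue
        for metric in METRICS:
            result[metric][row["Continent"]] += row[metric]
    for metric in METRICS:
        # The sum is computed before the "Total" key is inserted.
        result[metric]["Total"] = sum(result[metric].values())
    return json.dumps(result)


print(summarize(SAMPLE_ROWS))
```

Note that the outer keys of the result are the columns (Confirmed, Deaths, and so on) and the inner keys are the continents, so each "Total" is a sum over continents for one column.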
Test the Code
To test the code, you can run the following command in the terminal:
python3 covid.py
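This prints the resulting JSON data to the console, provided that covid.py calls the count function with the path to the CSV file and prints its return value. The code shown so far only defines the two functions.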
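One possible way to make the script runnable from the terminal is a small entry point such as the sketch below. The name of the CSV file is not given in this project, so the sketch assumes the path is passed as a command-line argument; the helper main and its count_fn parameter are hypothetical names introduced here for illustration.

```python
import sys


def main(argv, count_fn):
    """Print the summary for the CSV path given as the first argument."""
    if len(argv) != 2:
        print("usage: python3 covid.py <path-to-csv>", file=sys.stderr)
        return 1
    print(count_fn(argv[1]))
    return 0


# At the bottom of covid.py, after count is defined, one could add:
# if __name__ == "__main__":
#     sys.exit(main(sys.argv, count))
```

With this entry point in place, the script would be run as python3 covid.py followed by the path to the CSV file.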
Verify the Output
The output of the count function should be a JSON string that looks similar to the following:
{
  "Confirmed": {
    "Africa": 1203094,
    "America": 6396173,
    "Asia": 6480321,
    "Europe": 3450299,
    "Oceania": 27346,
    "Others": 721,
    "Total": 17557954
  },
  "Deaths": {
    "Africa": 28289,
    "America": 254610,
    "Asia": 133186,
    "Europe": 206438,
    "Oceania": 576,
    "Others": 15,
    "Total": 623114
  },
  "Recovered": {
    "Africa": 930536,
    "America": 5087347,
    "Asia": 5163062,
    "Europe": 1927545,
    "Oceania": 21892,
    "Others": 651,
    "Total": 13131033
  },
  "Active": {
    "Africa": 244269,
    "America": 1054216,
    "Asia": 1184073,
    "Europe": 1316316,
    "Oceania": 4878,
    "Others": 55,
    "Total": 3803807
  }
}
This output represents the summary statistics for each continent, including the total number of confirmed cases, deaths, recoveries, and active cases.
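A quick way to sanity-check such output is to confirm that each "Total" equals the sum of the continent values next to it. The sketch below does this with the standard json module, using the "Deaths" figures from the expected output above as input; the function name totals_are_consistent is hypothetical.

```python
import json


def totals_are_consistent(output_json):
    """Return True if every 'Total' equals the sum of the other entries."""
    result = json.loads(output_json)
    for by_continent in result.values():
        expected = sum(
            value for name, value in by_continent.items() if name != "Total"
        )
        if by_continent["Total"] != expected:
            return False
    return True


# The "Deaths" block from the expected output.
sample = json.dumps(
    {
        "Deaths": {
            "Africa": 28289,
            "America": 254610,
            "Asia": 133186,
            "Europe": 206438,
            "Oceania": 576,
            "Others": 15,
            "Total": 623114,
        }
    }
)
print(totals_are_consistent(sample))  # prints True
```

The same function can be applied to the full string returned by count to check all four blocks at once.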
Summary
Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.



