COVID-19 Data Analysis with Python

Introduction

In this project, you will learn how to analyze COVID-19 data using Python. The COVID-19 pandemic has had a significant impact on the world, and understanding the data is crucial for tracking the spread of the virus and making informed decisions.

👀 Preview

{
  "Confirmed": {
    "Africa": 1203094,
    "America": 6396173,
    "Asia": 6480321,
    "Europe": 3450299,
    "Oceania": 27346,
    "Others": 721,
    "Total": 17557954
  },
  "Deaths": {
    "Africa": 28289,
    "America": 254610,
    "Asia": 133186,
    "Europe": 206438,
    "Oceania": 576,
    "Others": 15,
    "Total": 623114
  },
  "Recovered": {
    "Africa": 930536,
    "America": 5087347,
    "Asia": 5163062,
    "Europe": 1927545,
    "Oceania": 21892,
    "Others": 651,
    "Total": 13131033
  },
  "Active": {
    "Africa": 244269,
    "America": 1054216,
    "Asia": 1184073,
    "Europe": 1316316,
    "Oceania": 4878,
    "Others": 55,
    "Total": 3803807
  }
}

🎯 Tasks

In this project, you will learn:

  • How to set up the development environment and install the required Python libraries
  • How to understand the structure and content of the COVID-19 data
  • How to implement a function to convert country names to continent names
  • How to process the COVID-19 data and calculate the summary statistics for each continent
  • How to test the code and verify the output

🏆 Achievements

After completing this project, you will be able to:

  • Understand how to work with CSV data in Python
  • Implement functions to process and analyze data
  • Convert data between different formats (e.g., CSV to JSON)
  • Apply basic data analysis techniques such as filtering, grouping, and summing
  • Contribute to the understanding of the COVID-19 pandemic through data-driven insights

Prepare the Environment

In this step, you will learn how to set up the environment for the COVID-19 data analysis project.

  1. Open the terminal and navigate to the /home/labex/project directory.
  2. Install the required Python libraries by running the following command:
python3 -m pip install pandas country-converter

This will install the pandas and country-converter libraries, which are needed for the project.
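
If you want to confirm that the installation worked before moving on, a minimal sketch like the one below can help. It only checks whether the two modules can be found by the current interpreter. The helper name missing_modules is made up for this example, and country_converter is the import name that the project code uses for the country-converter package.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names used later in this project
REQUIRED = ["pandas", "country_converter"]

missing = missing_modules(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are importable.")
```

If a name is reported as not installed, rerun the pip command above with the same python3 interpreter.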

Understand the Data

The COVID-19 data is provided in a CSV file located in the /home/labex/project directory. The file contains the following columns:

  • Country_Region: The name of the country or region.
  • Confirmed: The total number of confirmed COVID-19 cases.
  • Deaths: The total number of COVID-19 deaths.
  • Recovered: The total number of COVID-19 recoveries.
  • Active: The total number of active COVID-19 cases.

Your task is to process this data and calculate the summary statistics for each continent.
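
Before processing the file, it can be useful to confirm that the header contains the five columns listed above. The following is a minimal sketch using only the standard library. The sample rows and country names are made up for illustration, and the helper name missing_columns is not part of the project.

```python
import csv
import io

EXPECTED_COLUMNS = ["Country_Region", "Confirmed", "Deaths", "Recovered", "Active"]

# Made-up sample with the same header as the project data (illustration only)
SAMPLE = """Country_Region,Confirmed,Deaths,Recovered,Active
Exampleland,100,5,80,15
Samplestan,50,1,40,9
"""


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


print(missing_columns(io.StringIO(SAMPLE)))  # prints []
```

To run the same check on the real data, open the CSV file in the project directory with open(path, newline="") and pass the file object to missing_columns.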

Implement the country_to_continent Function

The first step is to create a function that can convert a country name to its corresponding continent name. Create a new file named covid.py in the /home/labex/project directory and add the following code:

import country_converter as coco

def country_to_continent(country_name):
    """Return the continent for a country name, or "Others" if it cannot be resolved."""
    # Cruise ships appear in the data but belong to no continent
    if country_name in ("Diamond Princess", "MS Zaandam"):
        return "Others"
    try:
        continent = coco.convert(names=country_name, to="continent")
    except Exception:
        # Any conversion error is treated as an unknown country
        return "Others"
    # country-converter returns the string "not found" for unknown names
    if continent == "not found":
        return "Others"
    return continent

This function uses the country-converter library to convert a country name to its corresponding continent name. If the country name is not found, it returns "Others".

Implement the count Function

Next, you need to implement the count function, which will process the COVID-19 data and return the summary statistics for each continent. Add the following code to the covid.py file:

import json
import pandas as pd

def count(data):
    """Take the path to the CSV file and return a JSON string with the
    confirmed, deaths, recovered, and active totals for each continent."""
    # Read the data from the file
    df = pd.read_csv(data)
    # Fill missing values with 0
    df = df.fillna(0)
    # Keep only consistent rows, where Confirmed = Deaths + Recovered + Active;
    # .copy() avoids a SettingWithCopyWarning when the column is added below
    df = df[df["Confirmed"] == df["Deaths"] + df["Recovered"] + df["Active"]].copy()
    # Map each country name to its continent
    df["Continent"] = df["Country_Region"].apply(country_to_continent)
    # Convert the count columns to int
    columns = ["Confirmed", "Deaths", "Recovered", "Active"]
    df[columns] = df[columns].astype(int)
    # Sum per continent; to_dict() gives {column: {continent: sum}}
    result = df[["Continent"] + columns].groupby("Continent").sum().to_dict()
    # Add a "Total" across all continents for each column
    for key in result:
        result[key]["Total"] = sum(result[key].values())
    return json.dumps(result)

This function reads the COVID-19 data from the CSV file, processes the data, and returns the summary statistics for each continent in JSON format.
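
To see what the consistency filter, the per-continent sum, and the "Total" step compute, here is a plain-Python sketch that mirrors the same logic on a tiny sample using only the standard library. It is not the pandas implementation. The rows are made up for illustration, and the Continent column is filled in by hand so that the sketch does not depend on country-converter.

```python
import csv
import io
import json
from collections import defaultdict

METRICS = ["Confirmed", "Deaths", "Recovered", "Active"]

# Made-up rows (illustration only); the last row is inconsistent on purpose
SAMPLE = """Continent,Confirmed,Deaths,Recovered,Active
Asia,100,5,80,15
Asia,50,1,40,9
Europe,30,2,20,8
Europe,99,1,1,1
"""


def summarize(csv_text):
    """Sum each metric per continent, then add a Total for each metric."""
    totals = {metric: defaultdict(int) for metric in METRICS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = {metric: int(row[metric]) for metric in METRICS}
        # Skip rows where Confirmed != Deaths + Recovered + Active
        parts = values["Deaths"] + values["Recovered"] + values["Active"]
        if values["Confirmed"] != parts:
            continue
        for metric in METRICS:
            totals[metric][row["Continent"]] += values[metric]
    result = {metric: dict(by_continent) for metric, by_continent in totals.items()}
    # "Total" is the sum over all continents for that metric
    for metric in METRICS:
        result[metric]["Total"] = sum(result[metric].values())
    return result


summary = summarize(SAMPLE)
print(json.dumps(summary, indent=2))
```

The inconsistent Europe row (99 confirmed against 1 + 1 + 1) is dropped, so "Confirmed" comes out as Asia 150, Europe 30, Total 180. The result has the same nesting as the output of count: metric first, then continent.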

Test the Code

To test the code, you can run the following command in the terminal:

python3 covid.py

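This runs covid.py. The code added so far only defines the two functions, so for this command to print the JSON data, the script must also call count with the path to the CSV file and print the returned string, for example inside an if __name__ == "__main__": block at the end of the file.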

Verify the Output

The output of the count function should be a JSON string that looks similar to the following:

{
  "Confirmed": {
    "Africa": 1203094,
    "America": 6396173,
    "Asia": 6480321,
    "Europe": 3450299,
    "Oceania": 27346,
    "Others": 721,
    "Total": 17557954
  },
  "Deaths": {
    "Africa": 28289,
    "America": 254610,
    "Asia": 133186,
    "Europe": 206438,
    "Oceania": 576,
    "Others": 15,
    "Total": 623114
  },
  "Recovered": {
    "Africa": 930536,
    "America": 5087347,
    "Asia": 5163062,
    "Europe": 1927545,
    "Oceania": 21892,
    "Others": 651,
    "Total": 13131033
  },
  "Active": {
    "Africa": 244269,
    "America": 1054216,
    "Asia": 1184073,
    "Europe": 1316316,
    "Oceania": 4878,
    "Others": 55,
    "Total": 3803807
  }
}

The output is keyed first by metric (Confirmed, Deaths, Recovered, Active) and then by continent. Within each metric, "Total" is the sum over all continents, including "Others".
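
One way to verify the output programmatically is to check that each "Total" equals the sum of the other entries under the same metric. The sketch below assumes the JSON structure shown above. The helper name total_mismatches is made up for this example, and the input reuses the "Deaths" figures from the expected output.

```python
import json


def total_mismatches(summary_json):
    """Return the metrics whose "Total" differs from the sum of its continents."""
    summary = json.loads(summary_json)
    problems = []
    for metric, by_continent in summary.items():
        expected = sum(
            value for name, value in by_continent.items() if name != "Total"
        )
        if by_continent.get("Total") != expected:
            problems.append(metric)
    return problems


# "Deaths" figures copied from the expected output above
deaths_only = json.dumps({
    "Deaths": {
        "Africa": 28289,
        "America": 254610,
        "Asia": 133186,
        "Europe": 206438,
        "Oceania": 576,
        "Others": 15,
        "Total": 623114,
    }
})

print(total_mismatches(deaths_only))  # prints []
```

Passing the full string returned by count checks all four metrics in one call.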

Congratulations! You have completed the COVID-19 data analysis project. If you have any questions or issues, please feel free to ask.

Summary

In this project, you installed pandas and country-converter, mapped country names to continents with country_to_continent, and used count to aggregate the CSV data into per-continent totals returned as JSON.