COVID-19 Data Analysis with Python


Introduction

In this project, you will learn how to analyze COVID-19 data using Python. The COVID-19 pandemic has had a significant impact on the world, and understanding the data is crucial for tracking the spread of the virus and making informed decisions.

👀 Preview

{
  "Confirmed": {
    "Africa": 1203094,
    "America": 6396173,
    "Asia": 6480321,
    "Europe": 3450299,
    "Oceania": 27346,
    "Others": 721,
    "Total": 17557954
  },
  "Deaths": {
    "Africa": 28289,
    "America": 254610,
    "Asia": 133186,
    "Europe": 206438,
    "Oceania": 576,
    "Others": 15,
    "Total": 623114
  },
  "Recovered": {
    "Africa": 930536,
    "America": 5087347,
    "Asia": 5163062,
    "Europe": 1927545,
    "Oceania": 21892,
    "Others": 651,
    "Total": 13131033
  },
  "Active": {
    "Africa": 244269,
    "America": 1054216,
    "Asia": 1184073,
    "Europe": 1316316,
    "Oceania": 4878,
    "Others": 55,
    "Total": 3803807
  }
}

🎯 Tasks

In this project, you will learn:

  • How to set up the development environment and install the required Python libraries
  • How to understand the structure and content of the COVID-19 data
  • How to implement a function to convert country names to continent names
  • How to process the COVID-19 data and calculate the summary statistics for each continent
  • How to test the code and verify the output

🏆 Achievements

After completing this project, you will be able to:

  • Understand how to work with CSV data in Python
  • Implement functions to process and analyze data
  • Convert data between different formats (e.g., CSV to JSON)
  • Gain experience in data analysis
  • Contribute to the understanding of the COVID-19 pandemic through data-driven insights

Prepare the Environment

In this step, you will learn how to set up the environment for the COVID-19 data analysis project.

  1. Open the terminal and navigate to the /home/labex/project directory.
  2. Install the required Python libraries by running the following command:
python3 -m pip install pandas country-converter

This will install the pandas and country-converter libraries, which are needed for the project.
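After the installation finishes, a quick check that both libraries are importable can save debugging time later. The sketch below uses only the standard library's importlib; the helper name missing_modules is an illustration and not part of the project. Note that the country-converter package is imported under the module name country_converter.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pip package "country-converter" is imported as "country_converter".
missing = missing_modules(["pandas", "country_converter"])
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerun the pip command above with the same python3 interpreter that will run the project.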

Understand the Data

The COVID-19 data is provided in a CSV file located in the /home/labex/project directory. The file contains the following columns:

  • Country_Region: The name of the country or region.
  • Confirmed: The total number of confirmed COVID-19 cases.
  • Deaths: The total number of COVID-19 deaths.
  • Recovered: The total number of COVID-19 recoveries.
  • Active: The total number of active COVID-19 cases.

Your task is to process this data and calculate the summary statistics for each continent.
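Before aggregating, it can help to confirm that a file really has these five columns and to see which rows are internally consistent. The sketch below is one way to do that with the standard library's csv module rather than pandas. The sample rows and country names are made up for illustration, and the helper name inspect_csv is hypothetical.

```python
import csv
import io

# Made-up rows for illustration only; the real figures come from the project CSV.
SAMPLE_CSV = """Country_Region,Confirmed,Deaths,Recovered,Active
Exampleland,100,5,80,15
Samplestan,50,1,,49
Testonia,30,2,20,9
"""

EXPECTED = ["Country_Region", "Confirmed", "Deaths", "Recovered", "Active"]


def inspect_csv(text):
    """Return (missing_columns, names_of_inconsistent_rows) for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in EXPECTED if col not in (reader.fieldnames or [])]
    if missing:
        return missing, []
    inconsistent = []
    for row in reader:
        # Treat blank cells as 0, as the tutorial later does with fillna(0).
        confirmed, deaths, recovered, active = (
            int(float(row[col] or 0)) for col in EXPECTED[1:]
        )
        if confirmed != deaths + recovered + active:
            inconsistent.append(row["Country_Region"])
    return missing, inconsistent


print(inspect_csv(SAMPLE_CSV))
```

In this sample only the Testonia row is flagged, because 2 + 20 + 9 is 31 rather than 30.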

Implement the country_to_continent Function

The first step is to create a function that can convert a country name to its corresponding continent name. Create a new file named covid.py in the /home/labex/project directory and add the following code:

import country_converter as coco

def country_to_continent(country_name):
    """Return the continent name for a country name, or "Others" if unknown."""
    # Cruise ships appear in the data but are not countries
    if country_name in ("Diamond Princess", "MS Zaandam"):
        return "Others"
    try:
        continent = coco.convert(names=country_name, to="continent")
    except Exception:
        # Treat any conversion failure as an unknown country
        return "Others"
    # Names the library cannot match come back as "not found"
    if continent == "not found":
        return "Others"
    return continent

This function uses the country-converter library to convert a country name to its corresponding continent name. The two cruise ships in the data (Diamond Princess and MS Zaandam), any name the library cannot match, and any conversion error all map to "Others".
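The fallback logic can be tried out without calling the library by swapping in a stand-in lookup. In the sketch below, SAMPLE_CONTINENTS is a hypothetical hand-written table used in place of coco.convert, and to_continent is an illustrative name. Only the branching mirrors the function above.

```python
# Hypothetical stand-in for coco.convert: a tiny hand-written lookup table.
SAMPLE_CONTINENTS = {"Japan": "Asia", "Kenya": "Africa", "France": "Europe"}

# Entries in the data that are cruise ships, not countries.
NON_COUNTRIES = {"Diamond Princess", "MS Zaandam"}


def to_continent(country_name, lookup=SAMPLE_CONTINENTS):
    """Return the continent for country_name, or "Others" when it is unknown."""
    if country_name in NON_COUNTRIES:
        return "Others"
    # Unknown names yield "not found", the value the tutorial's code checks for.
    continent = lookup.get(country_name, "not found")
    return "Others" if continent == "not found" else continent


for name in ["Japan", "Diamond Princess", "Atlantis"]:
    print(name, "->", to_continent(name))
```

Japan resolves to Asia, while the cruise ship and the unknown name both fall back to "Others".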

Implement the count Function

Next, you need to implement the count function, which will process the COVID-19 data and return the summary statistics for each continent. Add the following code to the covid.py file:

import json
import pandas as pd

def count(data):
    """Take the path to the CSV file and return, as a JSON string, the confirmed,
    death, recovered and active totals per continent plus an overall total."""
    # Read the data from the file
    df = pd.read_csv(data)
    # Fill missing values with 0
    df = df.fillna(0)
    # Keep only consistent rows, where Confirmed = Deaths + Recovered + Active
    consistent = df["Confirmed"] == df["Deaths"] + df["Recovered"] + df["Active"]
    df = df[consistent].copy()
    # Convert country name to continent name
    df["Continent"] = df["Country_Region"].apply(country_to_continent)
    # Convert the numeric columns to int
    columns = ["Confirmed", "Deaths", "Recovered", "Active"]
    df[columns] = df[columns].astype(int)
    # Sum each column per continent; to_dict() gives {column: {continent: value}}
    result = df.groupby("Continent")[columns].sum().to_dict()
    # Add a "Total" across all continents to each column
    for key in result:
        result[key]["Total"] = sum(result[key].values())
    return json.dumps(result)

This function reads the COVID-19 data from the CSV file, processes the data, and returns the summary statistics for each continent in JSON format.
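The shape of the result follows from how groupby, sum, and to_dict combine. The sketch below walks through that step on a tiny DataFrame with made-up figures and only two of the four numeric columns, so the nesting of the final dictionary is easy to see.

```python
import pandas as pd

# Made-up figures for illustration only.
df = pd.DataFrame(
    {
        "Continent": ["Asia", "Asia", "Europe"],
        "Confirmed": [10, 20, 5],
        "Deaths": [1, 2, 0],
    }
)

# One row per continent, one column per metric.
grouped = df.groupby("Continent").sum()

# to_dict() is column-oriented by default: {metric: {continent: value}}.
result = grouped.to_dict()

# Add a "Total" across continents to each metric.
for metric in result:
    result[metric]["Total"] = sum(result[metric].values())

print(result)
```

Because the outer keys are the column names, the "Total" entry added in the loop is a sum over continents for one metric, not a sum over metrics for one continent.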

Test the Code

To test the code, you can run the following command in the terminal:

python3 covid.py

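This prints the resulting JSON data to the console, provided covid.py calls the count function on the CSV file and prints its return value, for example from an if __name__ == "__main__": block at the end of the file. The two functions shown above only define count; they do not call it.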

Verify the Output

The output of the count function should be a JSON string similar to the following. It is shown here indented for readability; json.dumps without an indent argument prints it on a single line:

{
  "Confirmed": {
    "Africa": 1203094,
    "America": 6396173,
    "Asia": 6480321,
    "Europe": 3450299,
    "Oceania": 27346,
    "Others": 721,
    "Total": 17557954
  },
  "Deaths": {
    "Africa": 28289,
    "America": 254610,
    "Asia": 133186,
    "Europe": 206438,
    "Oceania": 576,
    "Others": 15,
    "Total": 623114
  },
  "Recovered": {
    "Africa": 930536,
    "America": 5087347,
    "Asia": 5163062,
    "Europe": 1927545,
    "Oceania": 21892,
    "Others": 651,
    "Total": 13131033
  },
  "Active": {
    "Africa": 244269,
    "America": 1054216,
    "Asia": 1184073,
    "Europe": 1316316,
    "Oceania": 4878,
    "Others": 55,
    "Total": 3803807
  }
}

Each top-level key in this output is a metric (confirmed cases, deaths, recoveries, or active cases). Under it are the sums for each continent and a Total across all continents.
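Two properties of the output can be checked mechanically: each Total equals the sum of its continents, and for every continent Confirmed equals Deaths + Recovered + Active, because count keeps only rows where that holds. The sketch below runs both checks against the sample output above; the helper name check_summary is an illustration, not part of the project.

```python
import json

PARTS = ("Deaths", "Recovered", "Active")


def check_summary(json_text):
    """Return a list of problems found in a count-style JSON summary."""
    summary = json.loads(json_text)
    problems = []
    for metric, by_continent in summary.items():
        parts = sum(v for k, v in by_continent.items() if k != "Total")
        if parts != by_continent["Total"]:
            problems.append(f"{metric}: Total does not match the continent sum")
    for continent, confirmed in summary["Confirmed"].items():
        if confirmed != sum(summary[part][continent] for part in PARTS):
            problems.append(f"{continent}: Confirmed != Deaths + Recovered + Active")
    return problems


# The sample output shown above, written compactly.
SAMPLE = json.dumps({
    "Confirmed": {"Africa": 1203094, "America": 6396173, "Asia": 6480321,
                  "Europe": 3450299, "Oceania": 27346, "Others": 721, "Total": 17557954},
    "Deaths": {"Africa": 28289, "America": 254610, "Asia": 133186,
               "Europe": 206438, "Oceania": 576, "Others": 15, "Total": 623114},
    "Recovered": {"Africa": 930536, "America": 5087347, "Asia": 5163062,
                  "Europe": 1927545, "Oceania": 21892, "Others": 651, "Total": 13131033},
    "Active": {"Africa": 244269, "America": 1054216, "Asia": 1184073,
               "Europe": 1316316, "Oceania": 4878, "Others": 55, "Total": 3803807},
})

print(check_summary(SAMPLE))  # an empty list means both checks pass
```

The same function can be applied to the string returned by count to confirm that a run produced a self-consistent summary.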

Congratulations! You have completed the COVID-19 data analysis project. If you have any questions or issues, please feel free to ask.


Summary

In this project, you converted country names to continents with country-converter, aggregated the COVID-19 CSV data per continent with pandas, and returned the result as JSON.
