Chapter 2 Data sources

2.1 COVID-19 Data from European Center for Disease Prevention and Control

Dataset : /COVID-19data/dataset 2/data.csv

The first data source comes from the European Center for Disease Prevention and Control, which includes the number of COVID-19 cases per 100 000 population and the 14-day notification rate of reported deaths per million population by week and country from the 1st week of 2020 (Jan.1st 2021) till the 39th week of 2021 (Oct.7th 2021).

This data set has 37036 records in total. Each row contains the corresponding data for a certain day and per country. Specifically, variables included in this data set are country related information-country, country code, continent, time information-year-week, count and statistics-weekly count, rate_14_day, cumulative, and indicator variable that distinguishes whether it’s cases or deaths-indicator.

Table 2.1: Major Columns Used
ColumnNames Description
country Country name (chr)
country_code 3 letter code for country (chr)
continent Geographical continent (chr)
population Country population (num)
indicator Wether it’s ‘cases’ or ‘deaths’ (chr)
weekly_count Weekly count of cases or deaths (num)
year_week The week number in a Year (chr)
cumulative_count Cumulative count of cases or deaths (num)

2.2 International Trade Data from United States Census Bureau

Datasets : /ft900xlsx/exh1.xlsx, exh6.xlsx, exh7.xlsx, exh8.xlsx, exh12.xlsx, exh13.xlsx, exh14.xlsx, exh14a.xlsx, exh20.xlsx, exh20a.xlsx, exh20b.xlsx

The second data source comes from the United States Census Bureau. It is a full report of the international trade the US participated in from 2020 to 2021. This report not only includes the trading time and trading amount with each trading partner of the US but also records the categories of the trading product.

There are in total 31 excel files from this source, with each presented and themed around a particular focus (such as by region, by end-use, by trade type and etc.). We discussed and decided to select 10 files that are most relevant to our investigation topics. Data files from this source are mostly represented in a tabular format. The most common variables are balance of payments (BOP) related numerals such as import, export, and balance in millions of dollars. Depending on the division basis, other variables include countries, goods categories, months and etc.

Here are the 10 files that we selected from this source:

File # File Name
exh1 U.S. International Trade in Goods and Services
exh6 U.S. Trade in Goods by Principal End-Use Category
exh7 U.S. Exports of Goods by End-Use Category and Commodity
exh8 U.S. Imports by End-Use Category and Commodity
exh12 U.S. Trade in Goods
exh13 U.S. Trade in Goods by Principal End-Use Category
exh14a U.S. Trade in Goods by Selected Countries and Areas - Prior Year
exh14 U.S. Trade in Goods by Selected Countries and Areas - Current Year
exh20 U.S. Trade in Goods and Services by Selected Countries and Areas - BOP Basis
exh20a U.S. Trade in Goods by Selected Countries and Areas - BOP Basis
exh20b U.S. Trade in Services by Selected Countries and Areas

2.3 Comments and Issues

To ensure the reliability and quality of the data, we searched and collected both of our data from either government or public institution website. Chen collected the international trade related data and Chuyang gathered the Covid related data. In searching for possible options, we tried to find data that are not only trustworthy but also comprehensive. Since we hope to take a holistic view on the Covid impact, having information that span across longer time period would be beneficial to answering our questions, so this becomes a dominant criterion in choosing our data source.

We thought there might be several issues with our data: For the first data set on Covid, although we pulled it from the authoritative ECDC website, there might still be delays or inaccuracies in recording the information. Especially for underdeveloped regions where circulation of information might not be in a timely manner. For the second source on international trade, we only obtain a subset selection of countries and the time points in certain files are quite limited. This may somehow prevent us from drawing a comparison picture in full.