Chapter 2 Data sources
2.1 COVID-19 Data from European Center for Disease Prevention and Control
Dataset: /COVID-19data/dataset 2/data.csv
The first data source is the European Center for Disease Prevention and Control (ECDC). It reports weekly COVID-19 case and death counts by country, together with the 14-day notification rate of reported cases per 100 000 population and of reported deaths per million population, from the 1st week of 2020 (Jan. 1st 2020) to the 39th week of 2021 (Oct. 7th 2021).
This data set has 37036 records in total. Each row contains the data for one country in one week, for either cases or deaths. The variables are country information (country, country_code, continent and population), time information (year_week), counts and statistics (weekly_count, rate_14_day and cumulative_count), and an indicator variable (indicator) that distinguishes whether a row refers to cases or deaths.
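The rate_14_day variable combines two consecutive weekly counts with the population. The sketch below illustrates that calculation, assuming the rate for a week is the sum of that week's and the previous week's counts divided by population, scaled per 100 000 for cases and per million for deaths. The function name rate_14_day is hypothetical, and comparing its output with the rate_14_day column in data.csv would show whether the assumption holds.

```python
def rate_14_day(weekly_counts, population, indicator):
    """Hypothetical helper: 14-day notification rate for one country.

    Assumes weekly_counts is ordered by week and that the rate for a week
    is that week's count plus the previous week's count, divided by the
    population and scaled per 100 000 (cases) or per 1 000 000 (deaths).
    The first week has no preceding week, so its rate is None.
    """
    scale = {"cases": 100_000, "deaths": 1_000_000}[indicator]
    rates = []
    for i, count in enumerate(weekly_counts):
        if i == 0:
            rates.append(None)
        else:
            # Multiply before dividing so that only one rounding step occurs.
            rates.append((weekly_counts[i - 1] + count) * scale / population)
    return rates


print(rate_14_day([100, 300, 200], 1_000_000, "cases"))  # [None, 40.0, 50.0]
```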
Column name | Description |
---|---|
country | Country name (chr) |
country_code | Three-letter country code (chr) |
continent | Geographical continent (chr) |
population | Country population (num) |
indicator | Whether the row refers to ‘cases’ or ‘deaths’ (chr) |
weekly_count | Weekly count of cases or deaths (num) |
rate_14_day | 14-day notification rate: cases per 100 000 or deaths per million population (num) |
year_week | Year and week number (chr) |
cumulative_count | Cumulative count of cases or deaths (num) |
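Because each row holds only cases or only deaths, comparing the two for the same country and week first requires pivoting the long format into a wide one. The sketch below does this with Python's standard csv module. The function name to_wide, the column order in the sample, and the country "Examplestan" are hypothetical, and the numbers are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def to_wide(rows):
    """Pivot long-format rows (one per country, week and indicator) into one
    record per (country, year_week) holding cases and deaths side by side."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["country"], row["year_week"])
        count = row["weekly_count"]
        # Keep empty cells as None instead of silently treating them as 0.
        wide[key][row["indicator"]] = float(count) if count else None
    return dict(wide)


# Made-up rows in the column layout described above; not real ECDC figures.
SAMPLE = """country,country_code,continent,population,indicator,weekly_count,year_week,rate_14_day,cumulative_count
Examplestan,EXA,Europe,1000000,cases,120,2020-10,,120
Examplestan,EXA,Europe,1000000,deaths,3,2020-10,,3
"""

wide = to_wide(csv.DictReader(io.StringIO(SAMPLE)))
print(wide[("Examplestan", "2020-10")])  # {'cases': 120.0, 'deaths': 3.0}
```

With the real file, the same function could be given a csv.DictReader over an open handle to /COVID-19data/dataset 2/data.csv, assuming its header names match the table above. DictReader looks columns up by name, so their order in the file does not matter.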
2.2 International Trade Data from United States Census Bureau
Datasets: /ft900xlsx/exh1.xlsx, exh6.xlsx, exh7.xlsx, exh8.xlsx, exh12.xlsx, exh13.xlsx, exh14.xlsx, exh14a.xlsx, exh20.xlsx, exh20a.xlsx, exh20b.xlsx
The second data source is the United States Census Bureau, which publishes a full report on US international trade from 2020 to 2021. The report records not only when trade took place and its value with each US trading partner, but also the categories of the traded products.
There are 31 Excel files in total from this source, each organized around a particular breakdown (e.g., by region, by end-use, or by trade type). After discussion, we selected the 11 files most relevant to our research questions. Most files from this source are tables. The most common variables are balance of payments (BOP) figures such as imports, exports, and the trade balance, in millions of dollars. Depending on how a file is broken down, other variables include country, goods category, and month.
Here are the 11 files that we selected from this source:
File # | File Name |
---|---|
exh1 | U.S. International Trade in Goods and Services |
exh6 | U.S. Trade in Goods by Principal End-Use Category |
exh7 | U.S. Exports of Goods by End-Use Category and Commodity |
exh8 | U.S. Imports by End-Use Category and Commodity |
exh12 | U.S. Trade in Goods |
exh13 | U.S. Trade in Goods by Principal End-Use Category |
exh14a | U.S. Trade in Goods by Selected Countries and Areas - Prior Year |
exh14 | U.S. Trade in Goods by Selected Countries and Areas - Current Year |
exh20 | U.S. Trade in Goods and Services by Selected Countries and Areas - BOP Basis |
exh20a | U.S. Trade in Goods by Selected Countries and Areas - BOP Basis |
exh20b | U.S. Trade in Services by Selected Countries and Areas |
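In these files the trade balance is exports minus imports, so a negative balance is a trade deficit. The sketch below computes the balance per period and flags periods where a reported balance disagrees with the computed one. The function names, the month labels, and the 1.0 million dollar rounding tolerance are assumptions for illustration, and the figures are made up, not taken from the exh files.

```python
def trade_balance(exports, imports):
    """Trade balance per period: exports minus imports (millions of dollars).

    exports and imports map a period label (e.g. a month) to a value.
    A negative balance is a trade deficit.
    """
    if exports.keys() != imports.keys():
        raise ValueError("exports and imports must cover the same periods")
    return {period: exports[period] - imports[period] for period in exports}


def balance_mismatches(exports, imports, reported, tolerance=1.0):
    """Periods whose reported balance differs from exports - imports by more
    than `tolerance` (an assumed allowance for rounding, in millions)."""
    computed = trade_balance(exports, imports)
    return [p for p in computed if abs(computed[p] - reported[p]) > tolerance]


# Made-up figures for illustration only, not values from the exh files.
exports = {"Jan": 1000.0, "Feb": 1200.0}
imports = {"Jan": 1500.0, "Feb": 1100.0}
reported = {"Jan": -500.0, "Feb": 97.0}
print(trade_balance(exports, imports))                # {'Jan': -500.0, 'Feb': 100.0}
print(balance_mismatches(exports, imports, reported))  # ['Feb']
```

A tolerance is used instead of exact equality because the published figures are rounded, so small differences need not indicate an error.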
2.3 Comments and Issues
To ensure the reliability and quality of the data, we collected both data sets from government or public-institution websites. Chen collected the international trade data and Chuyang gathered the COVID-19 data. In searching for options, we looked for data that are not only trustworthy but also comprehensive. Since we want to take a holistic view of the impact of COVID-19, data spanning a longer time period help answer our questions, so temporal coverage became a dominant criterion in choosing our data sources.
We anticipated several potential issues with our data. For the first data set on COVID-19, although we pulled it from the authoritative ECDC website, there may still be delays or inaccuracies in how the information was recorded, especially for less developed regions where information may not circulate in a timely manner. For the second source on international trade, we only obtained data for a subset of countries, and some files cover only a few time points. This may prevent us from drawing a complete comparison.
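Reporting problems in the COVID-19 data can leave visible symptoms, such as weeks missing from a country's series or empty or negative weekly counts (negative values may reflect retrospective corrections). The sketch below flags both. It does not measure reporting delay itself. It assumes year_week is formatted as "YYYY-WW" using ISO 8601 week numbering, and the function names, the country "Examplestan" and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(year_week):
    """Monday of the week given as 'YYYY-WW', assuming ISO 8601 week numbering."""
    year, week = (int(part) for part in year_week.split("-"))
    return date.fromisocalendar(year, week, 1)


def quality_flags(rows):
    """Flag possible recording problems in long-format rows.

    Returns (missing_weeks, suspect_rows):
    - missing_weeks maps (country, indicator) to the weeks absent between
      that series' first and last reported week;
    - suspect_rows lists rows whose weekly_count is empty or negative.
    """
    series = defaultdict(set)
    suspect_rows = []
    for row in rows:
        series[(row["country"], row["indicator"])].add(week_start(row["year_week"]))
        count = row["weekly_count"]
        if count in ("", None) or float(count) < 0:
            suspect_rows.append(row)

    missing_weeks = {}
    for key, starts in series.items():
        day, last = min(starts), max(starts)
        gaps = []
        while day < last:
            day += timedelta(weeks=1)
            if day not in starts:
                iso_year, iso_week, _ = day.isocalendar()
                gaps.append(f"{iso_year}-{iso_week:02d}")
        if gaps:
            missing_weeks[key] = gaps
    return missing_weeks, suspect_rows


# Made-up rows for illustration only; 2020-53 is deliberately missing.
demo_rows = [
    {"country": "Examplestan", "indicator": "cases", "year_week": "2020-52", "weekly_count": "10"},
    {"country": "Examplestan", "indicator": "cases", "year_week": "2021-01", "weekly_count": "-2"},
]
missing, suspect = quality_flags(demo_rows)
print(missing)                            # {('Examplestan', 'cases'): ['2020-53']}
print([r["year_week"] for r in suspect])  # ['2021-01']
```

Working with the Monday of each ISO week, rather than the raw week number, handles year boundaries and 53-week years such as 2020 without special cases.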