2.4 COVID-19 Tracking Example
Next, let’s practice what we covered. The following endpoint provides data about COVID-19 cases, deaths, and testing: https://api.covidtracking.com/v2/us/daily.json. Write the code to complete the following four tasks:
- Request the endpoint to retrieve the data and print it in a clean JSON format.
- Extract the column names from the field_definitions list (nested under the meta key) and add them to a new DataFrame (without data yet).
- Create a reduced version of that DataFrame with only the date and change_from_prior_day column data. In the same code cell, fill in missing values with zero and sort the rows by the date field ascending.
- Create a bar chart from the column data.
Search Google and Stack Overflow for advice on any tasks you struggle with. Then, see our solution below:
Step 1: Request the endpoint to retrieve the data and print it in a clean JSON format.
import json, requests, pandas as pd
response = requests.get("https://api.covidtracking.com/v2/us/daily.json")
json_data = json.loads(response.text)
clean = json.dumps(json_data, indent=2)
print(clean)
# Output:
#
# {
# "links": {
# "self": "https://api.covidtracking.com/us/daily"
# },
# "meta": {
# "build_time": "2021-06-01T07:03:25.055Z",
# "license": "CC-BY-4.0",
# "version": "2.0-beta",
# "field_definitions": [
# {
# "name": "Total test results",
# "field": "tests.pcr.total",
# "deprecated": false,
# "prior_names": [
# "totalTestResults"
# ]
# },
# {
# "name": "Hospital discharges",
# "deprecated": false,
# "prior_names": []
# },
# ...
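The request above assumes the endpoint answers successfully. As a minimal sketch, the hypothetical helper fetch_json below adds a timeout and stops on HTTP errors, using the raise_for_status() and json() methods of the requests response object. The helper names and the small sample dictionary are assumptions for illustration; the sample stands in for the real response so the formatting step can be tried offline.

```python
import json


def fetch_json(url, timeout=10):
    """Hypothetical helper: GET a URL and return the parsed JSON body."""
    import requests  # imported here so the offline demo below runs without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.json()  # parses the body as JSON, like json.loads(response.text)


def pretty(payload):
    """Hypothetical helper: format parsed JSON with two-space indentation."""
    return json.dumps(payload, indent=2)


# Made-up stand-in for a small part of the response, for offline use
sample = {"meta": {"license": "CC-BY-4.0", "version": "2.0-beta"}}
print(pretty(sample))
```

With network access, pretty(fetch_json("https://api.covidtracking.com/v2/us/daily.json")) would produce the same kind of output as the solution above.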
Step 2: Extract the column names from the field_definitions list (nested under the meta key) and add them to a new DataFrame (without data yet).
response = requests.get("https://api.covidtracking.com/v2/us/daily.json")
json_data = json.loads(response.text)
for k, v in json_data.items():
    print(k)
print(json_data['meta']) # prints everything inside the 'meta' key
print(json_data['meta']['field_definitions']) # prints list of columns available in the dataset
# create a column list
columns = []
for col in json_data['meta']['field_definitions']:
    columns.append(col['name'])
df = pd.DataFrame(columns=columns)
df
# Output:
# links
# meta
# data
# {'build_time': '2021-06-01T07:03:25.055Z', 'license': 'CC-BY-4.0', 'version': '2.0-beta', 'field_definitions': [{'name': 'Total test resu...
# [{'name': 'Total test results', 'field': 'tests.pcr.total', 'deprecated': False, 'prior_names': ['totalTestResults']}, {'name': 'Hospital...
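The append loop in this step can also be written as a list comprehension. The sketch below is one possible variant, not part of the original solution. It uses only the two field definitions visible in the Step 1 output, copied as printed there, so it runs without calling the API; the real list is longer.

```python
import pandas as pd

# Two entries copied from the Step 1 output excerpt; the real list is longer
field_definitions = [
    {"name": "Total test results", "field": "tests.pcr.total",
     "deprecated": False, "prior_names": ["totalTestResults"]},
    {"name": "Hospital discharges", "deprecated": False, "prior_names": []},
]

# Collect the 'name' of every field definition in one expression
columns = [col["name"] for col in field_definitions]

# A DataFrame with column labels but no rows
df = pd.DataFrame(columns=columns)
print(df.columns.tolist())
print(df.shape)
```

Against the live response, field_definitions would be json_data['meta']['field_definitions'].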
Step 3: Create a reduced version of that DataFrame with only the date and change_from_prior_day column data. In the same code cell, fill in missing values with zero and sort the rows by the date field ascending.
pd.set_option('future.no_silent_downcasting', True) # Opt in to the future pandas behavior so fillna() below does not raise a downcasting FutureWarning (requires pandas 2.2 or later)
print(json_data['data']) # prints the data without the column names
df_reduced = pd.DataFrame(columns=['Date', 'Confirmed Cases'])
for record in json_data['data']:
    df_reduced.loc[len(df_reduced)] = [record['date'], record['cases']['total']['calculated']['change_from_prior_day']]
df_reduced.fillna(0, inplace=True)
df_reduced.sort_values(by=['Date'], inplace=True)
df_reduced
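Adding rows one at a time with .loc works, but it becomes slow on long datasets. A possible alternative, sketched here under the assumption that every record has the same nesting as in the solution, is to collect all rows first and build the DataFrame once. The three records and their values are made up so the sketch runs offline; the None value plays the role of a missing entry.

```python
import pandas as pd

# Made-up records that mimic the nesting of json_data['data']
sample_data = [
    {"date": "2021-03-07", "cases": {"total": {"calculated": {"change_from_prior_day": 300}}}},
    {"date": "2021-03-05", "cases": {"total": {"calculated": {"change_from_prior_day": None}}}},
    {"date": "2021-03-06", "cases": {"total": {"calculated": {"change_from_prior_day": 200}}}},
]

# Flatten each record into one row with the two columns we need
rows = [
    {
        "Date": record["date"],
        "Confirmed Cases": record["cases"]["total"]["calculated"]["change_from_prior_day"],
    }
    for record in sample_data
]

df_reduced = (
    pd.DataFrame(rows)
    .fillna({"Confirmed Cases": 0})  # replace missing values with zero
    .sort_values(by="Date")          # ISO-formatted date strings sort chronologically
    .reset_index(drop=True)
)
print(df_reduced)
```

Chaining the calls also avoids inplace=True, since each method returns a new DataFrame.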
Step 4: Create a bar chart from the column data.
import seaborn as sns
from matplotlib import pyplot as plt
sns.set(rc={'figure.figsize': (30, 7)})
sns.set_style("whitegrid")
# Plot only the 150 most recent days to keep the chart readable
recent = df_reduced.tail(150)
# Create the barplot
ax = sns.barplot(x=recent['Date'], y=recent['Confirmed Cases'])
# Rotate the x-axis labels so the dates do not overlap
ax.set_xticks(range(len(recent))) # Set the tick positions
ax.set_xticklabels(recent['Date'], rotation=90) # Set the tick labels with rotation
plt.show()
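The same kind of chart can be drawn with matplotlib alone. The sketch below swaps seaborn's barplot for matplotlib's Axes.bar and rotates the labels with tick_params. It is an alternative for comparison, not the original solution, and the df_sample values are made up so the sketch runs without the API data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for df_reduced
df_sample = pd.DataFrame({
    "Date": ["2021-03-04", "2021-03-05", "2021-03-06", "2021-03-07"],
    "Confirmed Cases": [400, 0, 200, 300],
})

# Keep only the most recent rows (the solution uses tail(150) on the full data)
recent = df_sample.tail(3)

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(recent["Date"], recent["Confirmed Cases"])  # one bar per date
ax.tick_params(axis="x", labelrotation=90)         # rotate the date labels
ax.set_xlabel("Date")
ax.set_ylabel("Confirmed Cases")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a script, call plt.show() afterwards.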