COVID-19 Tracking Example

Next, let’s practice what we covered. The following endpoint provides data about COVID-19 cases, deaths, and testing: https://api.covidtracking.com/v2/us/daily.json. Write the code to complete the following four tasks:

  1. Request the endpoint to retrieve the data and print it as cleanly indented JSON.

  2. Extract the column names from the field_definitions sub-dictionary and add them to a new DataFrame (without data yet).

  3. Create a reduced DataFrame containing only the date and the change_from_prior_day value for total cases. In the same code cell, fill missing values with zero and sort the rows by date in ascending order.

  4. Create a bar chart from the reduced data.

Search Google and Stack Overflow for advice on any tasks you struggle with. Then, see our solution below:

Step 1: Request the endpoint to retrieve the data and print it as cleanly indented JSON.

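      import json
      import requests
      import pandas as pd

      # Download the daily US data and stop early if the request failed
      response = requests.get("https://api.covidtracking.com/v2/us/daily.json")
      response.raise_for_status()

      json_data = response.json()              # parse the JSON body into Python dicts and lists
      clean = json.dumps(json_data, indent=2)  # re-serialize with 2-space indentation
      print(clean)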

      # Output:
      # 
      # {
      #   "links": {
      #     "self": "https://api.covidtracking.com/us/daily"
      #   },
      #   "meta": {
      #     "build_time": "2021-06-01T07:03:25.055Z",
      #     "license": "CC-BY-4.0",
      #     "version": "2.0-beta",
      #     "field_definitions": [
      #       {
      #         "name": "Total test results",
      #         "field": "tests.pcr.total",
      #         "deprecated": false,
      #         "prior_names": [
      #           "totalTestResults"
      #         ]
      #       },
      #       {
      #         "name": "Hospital discharges",
      #         "deprecated": false,
      #         "prior_names": []
      #       },
      #       ...       
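
To check the pretty-printing step without calling the live endpoint, you can run the same `json.loads` / `json.dumps(indent=2)` round trip on a small string. The sketch below uses only Python's standard `json` module on an excerpt copied from the output above; the helper name `pretty_json` is hypothetical. With `requests`, you would pass it `response.text`.

```python
import json

def pretty_json(text: str, indent: int = 2) -> str:
    """Parse a JSON string and return it re-serialized with indentation (hypothetical helper)."""
    data = json.loads(text)                 # str -> Python dicts/lists
    return json.dumps(data, indent=indent)  # Python objects -> indented str

# A fragment of the "meta" section shown above, squeezed onto one line
raw = '{"meta": {"license": "CC-BY-4.0", "version": "2.0-beta"}}'
print(pretty_json(raw))

# Output:
# {
#   "meta": {
#     "license": "CC-BY-4.0",
#     "version": "2.0-beta"
#   }
# }
```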
      

Step 2: Extract the column names from the field_definitions sub-dictionary and add them to a new DataFrame (without data yet).

      response = requests.get("https://api.covidtracking.com/v2/us/daily.json")
      json_data = response.json()

      # List the top-level keys of the response
      for k in json_data:
        print(k)

      print(json_data['meta'])  # prints everything inside the 'meta' key
      print(json_data['meta']['field_definitions'])  # prints the list of field definitions (one per column)

      # Collect the human-readable name of each field to use as a column name
      columns = [col['name'] for col in json_data['meta']['field_definitions']]

      df = pd.DataFrame(columns=columns)  # empty DataFrame: column headers only, no rows yet
      df

      # Output:
      # links
      # meta
      # data
      # {'build_time': '2021-06-01T07:03:25.055Z', 'license': 'CC-BY-4.0', 'version': '2.0-beta', 'field_definitions': [{'name': 'Total test resu...
      # [{'name': 'Total test results', 'field': 'tests.pcr.total', 'deprecated': False, 'prior_names': ['totalTestResults']}, {'name': 'Hospital...
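
The column-name extraction can be tried offline on the two field definitions visible in the Step 1 output. In that excerpt the "Hospital discharges" entry has no `field` key, so the stdlib-only sketch below reads it with `dict.get`, which returns `None` instead of raising `KeyError`. The name-to-path lookup `field_paths` is an illustrative extra, not part of the original solution.

```python
# Two field definitions copied from the Step 1 output excerpt
field_definitions = [
    {"name": "Total test results", "field": "tests.pcr.total",
     "deprecated": False, "prior_names": ["totalTestResults"]},
    {"name": "Hospital discharges", "deprecated": False, "prior_names": []},
]

# Column names for the empty DataFrame, in the order the API lists them
columns = [definition["name"] for definition in field_definitions]

# Map each display name to its dotted field path; .get() gives None when "field" is absent
field_paths = {definition["name"]: definition.get("field") for definition in field_definitions}

print(columns)      # ['Total test results', 'Hospital discharges']
print(field_paths)  # {'Total test results': 'tests.pcr.total', 'Hospital discharges': None}
```

Passing this `columns` list to `pd.DataFrame(columns=columns)` produces the same kind of empty, header-only DataFrame as in Step 2.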
      

Step 3: Create a reduced DataFrame containing only the date and the change_from_prior_day value for total cases. In the same code cell, fill missing values with zero and sort the rows by date in ascending order.

      pd.set_option('future.no_silent_downcasting', True)  # opt in to pandas' future fillna() behavior and avoid a FutureWarning (option exists in pandas 2.2+)

      print(json_data['data'])  # prints the raw daily records (without column names)

      # One row per day: the date and the day-over-day change in the total case count
      df_reduced = pd.DataFrame(columns=['Date', 'Confirmed Cases'])

      for record in json_data['data']:
        df_reduced.loc[len(df_reduced)] = [record['date'], record['cases']['total']['calculated']['change_from_prior_day']]

      df_reduced.fillna(0, inplace=True)                 # replace missing values with 0
      df_reduced.sort_values(by=['Date'], inplace=True)  # oldest date first
      df_reduced
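
Appending with `df_reduced.loc[len(df_reduced)] = ...` adds one row at a time, which gets slow as the data grows. A common alternative is to collect the rows in a plain list and build the DataFrame with a single call. The sketch below does that on a tiny made-up `records` list that only imitates the nesting used above (`date`, and `cases` → `total` → `calculated` → `change_from_prior_day`); the numbers are invented for illustration, and `nested_get` is a hypothetical helper that returns `None` when any key on the path is missing, so such a record ends up as a missing value that `fillna(0)` replaces.

```python
import pandas as pd

def nested_get(record, *keys):
    """Follow keys into nested dicts; return None if any level is missing (hypothetical helper)."""
    for key in keys:
        if not isinstance(record, dict):
            return None
        record = record.get(key)
    return record

# Made-up records that only imitate the structure of json_data['data']
records = [
    {"date": "2021-03-07", "cases": {"total": {"calculated": {"change_from_prior_day": 7}}}},
    {"date": "2021-03-05", "cases": {"total": {"calculated": {"change_from_prior_day": None}}}},
    {"date": "2021-03-06", "cases": {"total": {"calculated": {"change_from_prior_day": 5}}}},
]

# Build all rows first, then create the DataFrame in one call
rows = [
    (r["date"], nested_get(r, "cases", "total", "calculated", "change_from_prior_day"))
    for r in records
]
df_reduced = pd.DataFrame(rows, columns=["Date", "Confirmed Cases"])

# Replace missing values with 0 and store the counts as integers
df_reduced["Confirmed Cases"] = df_reduced["Confirmed Cases"].fillna(0).astype("int64")
# Sort oldest date first and renumber the index 0..n-1
df_reduced = df_reduced.sort_values(by="Date").reset_index(drop=True)
print(df_reduced)
```

To use it on the real response, replace `records` with `json_data['data']`.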
      

Step 4: Create a bar chart from the reduced data.

      import seaborn as sns
      from matplotlib import pyplot as plt

      # Wide figure with a white grid background
      sns.set_theme(style="whitegrid", rc={'figure.figsize': (30, 7)})

      # Plot only the most recent 150 days (rows are sorted by date, so tail() gives the latest)
      recent = df_reduced.tail(150)

      # Create the bar plot
      ax = sns.barplot(x=recent['Date'], y=recent['Confirmed Cases'])

      # Rotate the x-axis date labels 90 degrees so they don't overlap
      ax.tick_params(axis='x', labelrotation=90)

      plt.show()
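
The Seaborn chart above draws a date label under every one of the 150 bars. If the labels become hard to read, one option is to label only every Nth bar. The sketch below does this with plain Matplotlib (`ax.bar`, `set_xticks`, `set_xticklabels`) rather than Seaborn; the function name `plot_daily_bars` and the default of one label every 7 days are illustrative choices, not part of the original solution.

```python
import matplotlib.pyplot as plt

def plot_daily_bars(dates, values, label_every=7):
    """One bar per day, but only every `label_every`-th date is labelled (illustrative sketch)."""
    dates = list(dates)
    values = list(values)
    positions = list(range(len(dates)))

    fig, ax = plt.subplots(figsize=(30, 7))
    ax.bar(positions, values)

    # Put ticks (and rotated date labels) on a subset of bars only
    ax.set_xticks(positions[::label_every])
    ax.set_xticklabels(dates[::label_every], rotation=90)
    ax.set_xlabel("Date")
    ax.set_ylabel("Confirmed Cases")
    return fig, ax
```

On the real data you could call `plot_daily_bars(df_reduced['Date'].tail(150), df_reduced['Confirmed Cases'].tail(150))` followed by `plt.show()`.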