In Python

Basic APIs

Python has several good libraries for calling web APIs to collect data. We will start by using the requests package (the same one used for web scraping earlier).

Let’s begin with some free, basic APIs that do not require any authentication, headers, parameters, or request data. We’ll learn about all of those things as we go along. For now, select one of the free basic APIs below. Keep in mind that because these are free, public APIs, they aren't always reliable, and sometimes they go down for a while. You get what you pay for. If one of them doesn't work, simply try another one.

  • https://api.coindesk.com/v1/bpi/currentprice.json -> View the Bitcoin Price Index (BPI) in real-time.

  • https://cat-fact.herokuapp.com/facts -> Get a list of cat facts.

  • https://api.quotable.io -> Quotable is a free, open source quotations API.

  • https://dog.ceo/api/breeds/image/random -> Cheer yourself up with random dog images.

  • https://ipinfo.io/161.185.160.93/geo -> Get information about a specified IP address, such as geolocation info, company, and carrier name.

  • https://official-joke-api.appspot.com/random_joke -> Get random programming jokes.

  • https://randomuser.me/api/ -> Get information about a random fake user, including gender, name, email, address, etc.

  • https://api.zippopotam.us/us/33162 -> Get information about a specified ZIP code.

  • http://api.open-notify.org -> Get information about the International Space Station.

  • https://isro.vercel.app/api/spacecrafts -> Get a list of ISRO launched spacecrafts and rockets.

Let’s use the first one in our example below:

        import requests

        # requests.get() sends a GET request; GET and POST (requests.post()) are the two most common request types
        response = requests.get("https://api.coindesk.com/v1/bpi/currentprice.json")
        print(response.status_code)

        # Output:
        # 200
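Because these free APIs go down from time to time, it can help to try several URLs in turn and keep the first one that answers with a 200. The sketch below shows one way to do that. `first_working` is a hypothetical helper of our own (not part of requests), and its `get` parameter exists only so the function can be tested without a network connection.

```python
import requests


def first_working(urls, get=requests.get, timeout=10):
    """Return (url, response) for the first URL that answers with status 200.

    Hypothetical helper: `get` defaults to requests.get but can be swapped
    for a stand-in function when testing offline.
    """
    for url in urls:
        try:
            response = get(url, timeout=timeout)
        except requests.RequestException:
            # Connection errors, timeouts, etc.: move on to the next URL
            continue
        if response.status_code == 200:
            return url, response
    # Nothing worked
    return None, None


# Usage (requires network access):
# url, response = first_working([
#     "https://api.coindesk.com/v1/bpi/currentprice.json",
#     "https://dog.ceo/api/breeds/image/random",
# ])
```

The `timeout` argument matters here: without it, requests can wait indefinitely on a server that never answers.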
        

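Let’s break this down a bit. First, what do we mean by 'get'? Is there a reason for using that method in particular? Actually, yes. There are several types of HTTP requests, and each API's documentation specifies which method each of its endpoints expects. GET and POST are the most common; here are the eight methods defined in the core HTTP specification (a ninth, PATCH, which partially updates a resource, is defined separately and is also common in APIs):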

  • GET: An API method for retrieving information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data.

  • HEAD: An API method for retrieving information from the given server using a given URI (same as GET) but that transfers the status line and header section only.

  • POST: An API method that sends data (e.g., customer information or a file upload) to the server in the body of the request, for example from an HTML form or as JSON.

  • PUT: An API method that replaces all current representations of the target resource with the uploaded content.

  • DELETE: An API method that removes all current representations of the target resource given by a URI.

  • CONNECT: An API method that establishes a tunnel to the server identified by a given URI.

  • OPTIONS: An API method that describes the communication options for the target resource.

  • TRACE: An API method that performs a message loop-back test along the path to the target resource.
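In requests, the method is simply part of the request, and most methods have a matching function (requests.get, requests.post, requests.put, requests.delete, requests.head, requests.options, requests.patch). To see how a GET differs from a POST without sending anything over the network, the sketch below builds requests with `requests.Request` and prepares them. The URL `https://example.com/api/users` and the `page` and `name` fields are placeholders, not a real API.

```python
import json

import requests

URL = "https://example.com/api/users"  # placeholder endpoint, not a real API

# A GET asks for data: everything it needs lives in the URL, and there is no body
get_req = requests.Request("GET", URL, params={"page": 2}).prepare()
print(get_req.method, get_req.url, get_req.body)

# A POST sends data in the request body; json= serializes the dict and sets the header
post_req = requests.Request("POST", URL, json={"name": "Ada"}).prepare()
print(post_req.method, post_req.headers["Content-Type"], json.loads(post_req.body))

# HEAD targets the same URL as GET but asks the server for headers only
head_req = requests.Request("HEAD", URL).prepare()
print(head_req.method)
```

Calling `requests.Session().send(post_req)` would actually transmit a prepared request; in everyday code you would just call `requests.post(URL, json={...})`.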

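Next, what does the output '200' mean? It is an HTTP status code indicating whether the request succeeded and, if not, roughly why. The first digit gives the category: 2xx means success, 3xx a redirect, 4xx a problem with the request (client error), and 5xx a problem on the server. Here is a short list of common codes: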

  • 200: Everything went okay, and the result has been returned (if any).

  • 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names or an endpoint name is changed.

  • 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.

  • 401: The server thinks you’re not authenticated. Many APIs require login credentials, so this happens when you don’t send the right credentials to access an API.

  • 403: The resource you’re trying to access is forbidden: you don’t have the right permissions to see it.

  • 404: The resource you tried to access wasn’t found on the server.

  • 503: The server is not ready to handle the request.
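Python's standard library knows the official name of each status code through `http.HTTPStatus`. The sketch below uses a hypothetical `describe_status` helper of our own to turn a code into a readable label based on its first digit; with requests you would pass it `response.status_code`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found (client error)'."""
    # The first digit of a status code tells you which class it belongs to
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    category = categories.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. 'Not Found'
    except ValueError:
        phrase = "Unrecognized status"  # codes outside the standard list
    return f"{code} {phrase} ({category})"


for code in [200, 301, 400, 401, 403, 404, 503]:
    print(describe_status(code))
```

If you would rather stop on errors than inspect them, requests also provides `response.raise_for_status()`, which raises an `HTTPError` for any 4xx or 5xx response.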

With that background, what did that API actually return? To see the contents of the data returned, just use the .text property of the response:

        print(response.text)

        # Output:
        # {"time":{"updated":"May 10, 2024 16:39:43 UTC","updatedISO":"2024-05-10T16:39:43+00:00","updateduk":"May 10, 2024 at 17:39 BST"},"disclaimer":"This data was produced from the CoinDesk Bitcoin Price Index (USD). Non-USD currency data converted using hourly conversion rate from openexchangerates.org","chartName":"Bitcoin","bpi":{"USD":{"code":"USD","symbol":"$","rate":"60,944.13","description":"United States Dollar","rate_float":60944.1297},"GBP":{"code":"GBP","symbol":"£","rate":"48,653.466","description":"British Pound Sterling","rate_float":48653.4661},"EUR":{"code":"EUR","symbol":"€","rate":"56,575.472","description":"Euro","rate_float":56575.4716}}}
        

That’s quite a jumble of data. You may recognize the format by now as JSON (JavaScript Object Notation). If not, JSON is a text format for structured data. When Python reads it, a JSON object becomes a dictionary whose values can be simple values (strings, numbers, booleans, or null, which becomes None) or collections (lists or more dictionaries), nested as deeply as needed. The structure can also be jagged, meaning that each key/value pair can be different. For example, one value may be a list, and the next value could be a dictionary. One value may go several lists or dictionaries deep, while the next does not. Python has a built-in json module that can help us organize it more clearly:

        import json
        json_data = json.loads(response.text)
        json_formatted = json.dumps(json_data, indent=2)
        print(json_formatted)

        # Output:
        # {
        #   "time": {
        #     "updated": "May 10, 2024 16:39:43 UTC",
        #     "updatedISO": "2024-05-10T16:39:43+00:00",
        #     "updateduk": "May 10, 2024 at 17:39 BST"
        #   },
        #   "disclaimer": "This data was produced from the CoinDesk Bitcoin Price Index (USD). Non-USD currency data converted using hourly conversion rate from openexchangerates.org",
        #   "chartName": "Bitcoin",
        #   "bpi": {
        #     "USD": {
        #       "code": "USD",
        #       "symbol": "$",
        #       "rate": "60,944.13",
        #       "description": "United States Dollar",
        #       "rate_float": 60944.1297
        #     },
        #     "GBP": {
        #       "code": "GBP",
        #       "symbol": "£",
        #       "rate": "48,653.466",
        #       "description": "British Pound Sterling",
        #       "rate_float": 48653.4661
        #     },
        #     "EUR": {
        #       "code": "EUR",
        #       "symbol": "€",
        #       "rate": "56,575.472",
        #       "description": "Euro",
        #       "rate_float": 56575.4716
        #     }
        #   }
        # }
        

Okay, that helps. Once parsed, the JSON becomes an ordinary Python dictionary whose values can be nested lists and dictionaries, all organized as key/value pairs. For example, the first label, "time", is a key, and the dictionary of date/time strings on the other side of the ":" symbol is its value. Next, "disclaimer", "chartName", and "bpi" are three more keys in the top-level dictionary. Their values differ: the first two are plain strings, while "bpi" has another dictionary as its value. Within "bpi", that dictionary has three key/value pairs (USD, GBP, and EUR), each with its own sub-dictionary as the value. In summary, the returned JSON object is three dictionary levels deep.
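To check that description yourself, the sketch below parses a trimmed copy of the response (only two currencies, to keep it short), prints the Python type each kind of JSON value becomes, and counts the nesting with a small recursive `depth` helper. The helper is our own illustration, not part of the json module.

```python
import json

# A trimmed copy of the CoinDesk response shown above (two currencies only)
raw = '''{"time": {"updated": "May 10, 2024 16:39:43 UTC"},
 "chartName": "Bitcoin",
 "bpi": {"USD": {"code": "USD", "rate_float": 60944.1297},
         "EUR": {"code": "EUR", "rate_float": 56575.4716}}}'''

data = json.loads(raw)

# JSON objects become dicts, strings become str, numbers become int or float
print(type(data).__name__)
print(type(data["chartName"]).__name__)
print(type(data["bpi"]["USD"]["rate_float"]).__name__)

# true/false/null become True/False/None; arrays become lists
print(json.loads('[true, false, null, 2, 1.5]'))


def depth(value):
    """Count how many levels of dicts/lists are nested inside `value`."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0  # strings, numbers, booleans and None add no depth


print(depth(data))

# Chain keys to drill down one level at a time
print(data["bpi"]["EUR"]["rate_float"])
```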

Our goal is to extract the data from this JSON object that is relevant to us. In this case, we want to take the code, symbol, rate, description, and rate_float for each currency's bitcoin exchange rate and put them into a Pandas DataFrame that can be stored and exported as a .CSV file. Let's break this process down one step at a time so that it makes sense, beginning by drilling down to the "bpi" key/value pair:

        json_data['bpi']
        
        # Output:
        # {'USD': {'code': 'USD',
        #   'symbol': '$',
        #   'rate': '60,944.13',
        #   'description': 'United States Dollar',
        #   'rate_float': 60944.1297},
        #  'GBP': {'code': 'GBP',
        #   'symbol': '£',
        #   'rate': '48,653.466',
        #   'description': 'British Pound Sterling',
        #   'rate_float': 48653.4661},
        #  'EUR': {'code': 'EUR',
        #   'symbol': '€',
        #   'rate': '56,575.472',
        #   'description': 'Euro',
        #   'rate_float': 56575.4716}}
        

Do you see how the results above are the dictionary that comes after the ":" to the right of the "bpi" key in the original JSON? Next, let's iterate through each of the three rates and print out the five attributes of each record. We'll do that in two steps. First, let's print out the key of each of the three key/value pairs:

        for entry in json_data['bpi']:
          print(entry)
        
        # Output:
        # USD
        # GBP
        # EUR
        

Next, to print out the attributes (code, symbol, rate, description, and rate_float), we use the loop variable "entry" as a key to look up each currency's sub-dictionary inside json_data['bpi']:

        for entry in json_data['bpi']:
          print(json_data['bpi'][entry])
        
        # Output:
        # {'code': 'USD', 'symbol': '$', 'rate': '60,944.13', 'description': 'United States Dollar', 'rate_float': 60944.1297}
        # {'code': 'GBP', 'symbol': '£', 'rate': '48,653.466', 'description': 'British Pound Sterling', 'rate_float': 48653.4661}
        # {'code': 'EUR', 'symbol': '€', 'rate': '56,575.472', 'description': 'Euro', 'rate_float': 56575.4716}
        

Now that we know how to get to the sub-dictionaries we want, we can extend our code a bit further to get only the values within each dictionary:

        for entry in json_data['bpi']:
          print(json_data['bpi'][entry]['code'], 
                json_data['bpi'][entry]['symbol'], 
                json_data['bpi'][entry]['rate'], 
                json_data['bpi'][entry]['description'],
                json_data['bpi'][entry]['rate_float'])
        
        # Output:
        # USD $ 60,944.13 United States Dollar 60944.1297
        # GBP £ 48,653.466 British Pound Sterling 48653.4661
        # EUR € 56,575.472 Euro 56575.4716
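Repeating `json_data['bpi'][entry]` on every line works, but Python's dictionary `.items()` method hands you each key and its sub-dictionary together, which is shorter and easier to read. The sketch below runs on a trimmed copy of `json_data['bpi']` so it stands on its own; the `volume` key is hypothetical and used only to show how `.get()` handles a missing key in jagged JSON.

```python
# Trimmed copy of json_data['bpi'] from above, so this runs on its own
bpi = {
    "USD": {"code": "USD", "symbol": "$", "rate": "60,944.13",
            "description": "United States Dollar", "rate_float": 60944.1297},
    "GBP": {"code": "GBP", "symbol": "£", "rate": "48,653.466",
            "description": "British Pound Sterling", "rate_float": 48653.4661},
}

rows = []
# .items() yields each key (e.g. 'USD') together with its sub-dictionary
for currency, info in bpi.items():
    print(currency, info["symbol"], info["rate"], info["description"], info["rate_float"])
    rows.append((currency, info["rate_float"]))

# .get() returns None instead of raising KeyError when a key is missing
print(bpi["USD"].get("volume"))
```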
        

Now that we know how to reach each value we need for a row, let's put it all together into a Pandas DataFrame:

        import pandas as pd
        df = pd.DataFrame(columns=['code', 'symbol', 'rate', 'description', 'rate_float'])
        df.set_index('code', inplace=True)
        
        # Add each bitcoin exchange rate to the new DataFrame one-at-a-time
        for entry in json_data['bpi']:
          df.loc[json_data['bpi'][entry]['code']] = [json_data['bpi'][entry]['symbol'], 
                                                     json_data['bpi'][entry]['rate'], 
                                                     json_data['bpi'][entry]['description'], 
                                                     json_data['bpi'][entry]['rate_float']]
        
        # Save the results to a .CSV file and print the DataFrame
        df.to_csv('bitcoin_price.csv')
        df.head()
        

Now we have a dataset that we can work with. It only has three rows, but the code is dynamic and will work whether there are 3 or 3000+ rows.
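Adding rows one at a time with `df.loc` works, but pandas can also build the whole table in one step from a list of dictionaries, which is usually shorter and faster for large responses. The sketch below does that on a copy of the three rates shown earlier. It also converts the `rate` column, which the API returns as text such as "60,944.13", into numbers; that cleanup step is our own addition, not something the code above does.

```python
import io

import pandas as pd

# Copy of json_data['bpi'] from above, so this runs on its own
bpi = {
    "USD": {"code": "USD", "symbol": "$", "rate": "60,944.13",
            "description": "United States Dollar", "rate_float": 60944.1297},
    "GBP": {"code": "GBP", "symbol": "£", "rate": "48,653.466",
            "description": "British Pound Sterling", "rate_float": 48653.4661},
    "EUR": {"code": "EUR", "symbol": "€", "rate": "56,575.472",
            "description": "Euro", "rate_float": 56575.4716},
}

# Each sub-dictionary becomes one row; its keys become the column names
df = pd.DataFrame(list(bpi.values())).set_index("code")

# 'rate' arrives as text with thousands separators; convert it to a number
df["rate"] = df["rate"].str.replace(",", "", regex=False).astype(float)

print(df)

# Write to an in-memory buffer instead of a file, just to show the CSV text
buffer = io.StringIO()
df.to_csv(buffer)
print(buffer.getvalue())
```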

Endpoints

You may have noticed a term used above: endpoint. The APIs we are using here are a specific type called REST (or RESTful) web services. A web service has a base URL (e.g., https://www.domainname.com) and one or more endpoints (e.g., https://www.domainname.com/api/endpointname). An endpoint is a specific function offered by a RESTful web service. Each endpoint has its own name, which is appended to the web service URL, giving each endpoint its own location.
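Attaching an endpoint name to a base URL is just string joining, but stray or missing slashes are a common source of 404 errors. The sketch below uses a hypothetical `endpoint_url` helper that always produces exactly one "/" between the two parts, and shows a pitfall of the standard library's `urljoin`, which follows browser rules and replaces the last path segment when the base URL lacks a trailing slash.

```python
from urllib.parse import urljoin

BASE_URL = "https://isro.vercel.app/api"


def endpoint_url(base_url, endpoint):
    """Attach an endpoint name to a web service URL with exactly one '/'."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


print(endpoint_url(BASE_URL, "spacecrafts"))
print(endpoint_url(BASE_URL + "/", "/centres"))

# Caution: without a trailing slash, urljoin drops 'api' instead of appending
print(urljoin(BASE_URL, "spacecrafts"))
print(urljoin(BASE_URL + "/", "spacecrafts"))
```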

Every web service API will have its own set of documentation that details each available endpoint, the method for calling each endpoint (e.g., GET or POST), inputs required, and outputs delivered. You will see these details in action through the rest of this topic. Let's try another of the web services from the initial list at the top of this page: https://isro.vercel.app/api. This API has four endpoints that provide data about spacecrafts, spacecraft launchers, customer satellites, and ISRO Centres. Let's use each endpoint below:

        # Returns a list of ISRO launched spacecrafts and rockets
        response = requests.get("https://isro.vercel.app/api/spacecrafts")
        print(response.json())
        
        # Returns a list of ISRO launchers
        response = requests.get("https://isro.vercel.app/api/launchers")
        print(response.json())
        
        # Returns a list of ISRO customer satellites
        response = requests.get("https://isro.vercel.app/api/customer_satellites")
        print(response.json())
        
        # Returns a list of ISRO Centres
        response = requests.get("https://isro.vercel.app/api/centres")
        print(response.json())
        
        # Output:
        # {'spacecrafts': [{'id': 1, 'name': 'Aryabhata'}, {'id': 2, 'name': 'Bhaskara-I'}, {'id': 3, 'name': 'Rohini Technology Payload (RTP)'} ...
        # {'launchers': [{'id': 'SLV-3E1'}, {'id': 'SLV-3E2'}, {'id': 'SLV-3D1'}, {'id': 'SLV-3'}, {'id': 'ASLV-D1'}, {'id': 'ASLV-D2'}, {'id':  ...
        # {'customer_satellites': [{'id': 'DLR-TUBSAT', 'country': 'Germany', 'launch_date': '26-05-1999', 'mass': '45', 'launcher': 'PSLV-C2'}, ...
        # {'centres': [{'id': 1, 'name': 'Semi-Conductor Laboratory (SCL)', 'Place': 'Chandigarh', 'State': 'Punjab/Haryana'}, {'id': 2, 'name': ...
        

Before moving on, I want to draw your attention to something we did differently in that last code block. Notice that we didn't call json.loads() to parse the string returned in the response into a Python dictionary. That's because the requests response object has a .json() method that does this for us. I see both techniques used in practice, so I want you to be familiar with them.
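The four ISRO calls above repeat the same three lines, so a loop over the endpoint names is a natural refactor. The sketch below is one possible version. `fetch_all` is a hypothetical helper of our own, and its `get` parameter exists only so it can be tested offline; it stops on HTTP errors with `raise_for_status()` before parsing each body with `.json()`.

```python
import requests

BASE_URL = "https://isro.vercel.app/api"
ENDPOINTS = ["spacecrafts", "launchers", "customer_satellites", "centres"]


def fetch_all(base_url, endpoints, get=requests.get, timeout=10):
    """Call each endpoint and return {endpoint_name: parsed JSON}.

    Hypothetical helper; `get` can be replaced with a stand-in for offline tests.
    """
    results = {}
    for name in endpoints:
        response = get(f"{base_url}/{name}", timeout=timeout)
        response.raise_for_status()      # stop on 4xx/5xx instead of parsing an error page
        results[name] = response.json()  # parses the body as JSON, like json.loads(response.text)
    return results


# Usage (requires network access):
# data = fetch_all(BASE_URL, ENDPOINTS)
# print(data["centres"])
```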

Parameters (Querystring)

Web services can be customized in a few ways, and querystring parameters are one of them. Parameters are placed in the URL of the web service call by adding a '?' symbol after the URL, followed by one or more key/value pairs separated by '&': key=value&key=value&key=value. Let’s look at some more free web services that accept parameters:

        # Predict the age of a person based on their name
        response = requests.get("https://api.agify.io?name=homer")
        print(response.text)

        # Predict the gender of a person based on their name
        response = requests.get("https://api.genderize.io?name=homer")
        print(response.text)

        # Predict the nationality of a person based on their name
        response = requests.get("https://api.nationalize.io?name=homer")
        print(response.text)

        # Get the U.S. population by year from Data USA
        response = requests.get("https://datausa.io/api/data?drilldowns=Nation&measures=Population")
        print(response.text)

        # Get your own public IP address, formatted as JSON
        response = requests.get("https://api.ipify.org?format=json")
        print(response.text)

        # Search for universities in the United States ('+' encodes the space)
        response = requests.get("http://universities.hipolabs.com/search?country=United+States")
        print(response.text)

        # Output:
        # {"name":"homer","age":71,"count":2043}
        # {"name":"homer","gender":"male","probability":0.97,"count":2075}
        # {"name":"homer","country":[{"country_id":"PH","probability":0.423783697420402},{"country_id":"US","probability":0.15056758473883325},{"country_id":"IT","probability":0.1322296682437725}]}
        # {"data":[{"ID Nation":"01000US","Nation":"United States","ID Year":2018,"Year":"2018","Population":327167439,"Slug Nation":"united-states"},...,"name":"acs_yg_total_population_1","substitutions":[]}]}
        # {"ip":"34.125.95.36"}
        # [{"web_pages": ["http://www.marywood.edu"], "domains": ["marywood.edu"], "state-province": null, "country": "United States", "alpha_two_code": "US", "name": "Marywood University"}, ... ]
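Typing query strings by hand is error-prone: a stray character or an unencoded space can break the URL. requests accepts a dictionary through its `params` argument and builds the query string for you. The sketch below prepares requests for two of the URLs above without sending them, so you can see the exact URL that would go out, and then shows the standard library's `urlencode` and `parse_qs` doing the same encoding and decoding by hand.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

import requests

# requests builds the query string from a dict; .prepare() shows the final URL
data_req = requests.Request(
    "GET",
    "https://datausa.io/api/data",
    params={"drilldowns": "Nation", "measures": "Population"},
).prepare()
print(data_req.url)

# Values are URL-encoded for you: the space becomes '+'
uni_req = requests.Request(
    "GET",
    "http://universities.hipolabs.com/search",
    params={"country": "United States"},
).prepare()
print(uni_req.url)

# The standard library does the same encoding and can parse it back
query = urlencode({"name": "homer"})
print(query)
print(parse_qs(urlsplit("https://api.agify.io?name=homer").query))

# When actually calling the API you would write, for example:
# response = requests.get("https://api.agify.io", params={"name": "homer"}, timeout=10)
```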