1.3 Interesting Public Databases
One of the benefits of powerful database technology coupled with widespread connectivity is that some organizations collect large amounts of data and make that data available to the general public free of charge. Simple queries can often be run through a web form. More complex queries, which investigate more specialized questions, generally require customized queries submitted through an API (application programming interface). These databases usually contain large amounts of data, which contributes both to their value and to the complexity of extracting information from them.
The two major elements in the development of these very large databases were (1) a robust, comprehensive, integrated, complex design of the data structures and data items, and (2) sophisticated methods to populate the database with large amounts of data. As you learn more about database design and database queries, you will appreciate the skills required to develop and populate these databases. Below are just a few examples of such databases:
- Annotated Human Genome Data: This is a large database (310 GB) of genome information for humans and about 50 other species. The database has several different methods to access the data, including a simple web interface, export via FTP, MySQL server queries, a Perl API, and even a data-mining tool. Access this data on the Ensembl website.
- Federal Reserve Economic Data: This database contains economic data for the US, organized into 61,000 different time series. Access to the data is by website or customized API. The entire database can also be downloaded from the FRED (Federal Reserve Economic Data) website, which is hosted by the Federal Reserve Bank of St. Louis.
- US Labor, Economic, and Census Data: Various departments of the US government provide large datasets containing information about the US economy, plus labor, commerce, housing, and census data. These databases can be accessed through a browser as well as through customized APIs and downloads. Sources include the U.S. Bureau of Labor Statistics website and the United States Census Bureau website.
- Historical Weather Datasets: The National Oceanic and Atmospheric Administration (NOAA) provides time series data and climatological data in several different databases, each covering a different series of data. The data can be accessed via a browser interface as well as via specific tools that the administration provides. Access this information on the NOAA website, or go directly to the various databases through Quick Links to Databases.
- The Library of Congress: This massive database contains 133 terabytes of compressed data. A single search could take up to 24 hours to complete. The Library of Congress also has numerous interesting databases containing information on a variety of subjects. Although the information is extensive, it is available through web interfaces on the Library of Congress website rather than through specialized tools or APIs.
- The CIA World Factbook: This massive database provides information on history, people, societies, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. It includes a variety of world, regional, country, oceanic, and time zone maps; flags of the world; and a country comparison function that ranks country information and data in more than 75 Factbook fields. Access this information on The World Factbook website.
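To make the idea of a customized API query more concrete, the following is a minimal sketch of how a program might request one time series from the FRED database and read the reply. It is an illustration under stated assumptions rather than a definitive client: the endpoint address and the parameter names (`series_id`, `api_key`, `file_type`, `observation_start`) are assumed to match the current FRED API documentation, the helper names `build_fred_url` and `parse_observations` are hypothetical, and the sample reply is hand-written to show the expected shape, not real economic data. No network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed address of the FRED "series observations" endpoint;
# check the FRED API documentation before relying on it.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id, api_key, start=None):
    """Build the query URL for one time series (hypothetical helper)."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start is not None:
        params["observation_start"] = start  # assumed format: YYYY-MM-DD
    return FRED_OBSERVATIONS_URL + "?" + urlencode(params)


def parse_observations(payload):
    """Turn a JSON reply into a list of (date, value) pairs.

    Assumes each observation has "date" and "value" string fields and
    that a missing value is reported as ".".
    """
    data = json.loads(payload)
    result = []
    for obs in data.get("observations", []):
        if obs["value"] == ".":
            continue  # skip missing values
        result.append((obs["date"], float(obs["value"])))
    return result


# Hand-written sample reply for illustration only (not real data).
sample = (
    '{"observations": ['
    '{"date": "2020-01-01", "value": "3.5"}, '
    '{"date": "2020-02-01", "value": "."}]}'
)

print(build_fred_url("UNRATE", "YOUR_API_KEY", start="2020-01-01"))
print(parse_observations(sample))
```

In practice, the URL would be fetched with an HTTP client and the response body passed to `parse_observations`. The web form hides exactly this kind of request; writing it by hand is what allows the more specialized queries described at the start of this section.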