Information Structuring in Databases

Every database has a schema, which defines the structure and organization of the database and the information that it contains. Hence, in order for information to be captured within a database, it must be analyzed and made to fit within an organized structure that is consistent with the purposes of the database.

For example, let's assume that we want to store information about certain people in a database. The complete set of facts about any person is far too complex to capture, so our first step is to define the purpose of the database. The next step is to determine which pieces of information about a person are relevant to that purpose. Those pieces of information are called attributes. Finally, one of the most difficult steps is to determine the relationships that connect a person to other data in the database. The following list identifies a few different types of databases and the types of attributes and relationships that might be captured in each. Notice that every item in the list describes a "person," yet each list of attributes is distinct.

  • Customer database: Attributes include name, address, telephone number, account balance, credit rating, credit card number, and so on. Relationships may include details of purchases or details of payments.

  • Movie star database: Attributes include stage name, real name, birthday, age, spouse name, and so on. Relationships may include details of movie or TV roles, previous marriages, or a gallery of photos.

  • University professor database: Attributes include name, address, telephone number, office number, office phone number, years of work, title, department, college, salary, and so on. Relationships may include details of publications, details of university committees, awards, current classes, previous classes, or student ratings.

  • Student database: Attributes include name, address, telephone number, total hours completed, current registered hours, total GPA, overall standing, and so on. Relationships may include details of courses and grades, current registered courses, library materials, or campus credit balance.
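To make the distinction between attributes and relationships concrete, the customer database from the list could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are assumptions made for this sketch and are not a prescribed design. Attributes become columns of the customer table, while the "details of purchases" relationship becomes a second table that refers back to a customer.

```python
import sqlite3

# Hypothetical schema for the customer database example.
# All table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE customer (
    customer_id     INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,      -- attribute
    address         TEXT,               -- attribute
    telephone       TEXT,               -- attribute
    account_balance REAL DEFAULT 0      -- attribute
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- relationship
    item        TEXT NOT NULL,
    amount      REAL NOT NULL
);
"""


def build_customer_db():
    """Create an in-memory database that follows the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def purchases_for(conn, name):
    """Follow the customer-to-purchase relationship for one customer name."""
    rows = conn.execute(
        "SELECT p.item, p.amount FROM purchase AS p "
        "JOIN customer AS c ON c.customer_id = p.customer_id "
        "WHERE c.name = ? ORDER BY p.purchase_id",
        (name,),
    )
    return rows.fetchall()


conn = build_customer_db()
# Made-up sample data, used only to exercise the schema.
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Sample Customer')")
conn.execute("INSERT INTO purchase (customer_id, item, amount) VALUES (1, 'book', 12.5)")
conn.execute("INSERT INTO purchase (customer_id, item, amount) VALUES (1, 'pen', 1.25)")
print(purchases_for(conn, "Sample Customer"))
```

A student or professor database would describe the same kind of entity, a person, but with different columns and different related tables, which is exactly why the purpose of the database has to be settled first.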

The above examples of diverse groups of people are rather simple. In many databases, attributes and relationships are easy to define, primarily due to the focused nature of the database. However, as we try to capture and provide information about diverse and complex elements of the real world, it can become difficult to determine how best to structure the information. Normally, it is not just one entity that needs to be described but an entire set of diverse entities, along with the relationships among them. For example, looking at the public databases in the previous section, how does one accurately describe the world's economic data, or detail the genomes and DNA sequences of all invertebrates? Without question, designing those databases must have required extensive understanding of real-world contexts.

Even though designing and structuring complex data into meaningful information can be challenging, it also has tremendous benefits. The mere process of trying to impose structure onto diverse sets of data helps us to understand the nature of that data. As data is structured and codified, databases add to our understanding of the real world. Figure 1.1 illustrates a sample database structure to maintain information about an incident such as an automobile accident or a workplace-related occurrence. Notice how many different pieces of information must be captured for a seemingly simple concept such as an "incident."

Figure 1.1: Sample "Incident" Database Structure

Let's briefly look at some of the processes required to codify data and turn it into meaningful information. Interesting questions about information include the following:

  • What is information?

  • How does it exist in the world?

  • How are databases used to organize, structure, record, and provide information, and to identify relationships that were not seen previously?

There are many different definitions and philosophical concepts of information, and our short discussion will barely scratch the surface. However, we can note that information is more than just data or facts. For our purposes, let us define information as a set of data facts that (1) can be labeled, (2) are identified within a larger context, and (3) have identified relationships with other facts. For example, if we have a piece of data such as "2875039275," then we clearly do not have much information. However, if we label the data as a phone number, then the data begins to contain meaningful information. If we add that this phone number belongs to John Appleby, then we have added both a context and a relationship, and even more information is available to us.

We gain additional information if we identify the number as a cell phone number; now we know that we can send a text message to it. We can also learn more about a phone number by adding further context and relationships. This might include whether a particular phone number is a business telephone or a residential telephone, and whether the phone is a landline (with a physical location) or cellular (with a current cell tower location).
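The progression from a bare data fact to meaningful information can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class names, field names, and the can_receive_text helper are hypothetical and chosen only to mirror the three parts of our working definition (a label, a context, and relationships to other facts).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str


@dataclass
class PhoneNumber:
    # Using this class at all supplies the label: "this is a phone number."
    digits: str                      # the raw data fact
    line_type: Optional[str] = None  # context: "cell" or "landline"
    usage: Optional[str] = None      # context: "business" or "residential"
    owner: Optional[Person] = None   # relationship to another fact


def can_receive_text(phone: PhoneNumber) -> bool:
    """Only answerable once the line type is known (hypothetical helper)."""
    return phone.line_type == "cell"


raw = "2875039275"                   # data only: very little information
labeled = PhoneNumber(raw)           # labeled as a phone number
owned = PhoneNumber(raw, owner=Person("John Appleby"))  # adds a relationship
cell = PhoneNumber(raw, line_type="cell", owner=Person("John Appleby"))

print(can_receive_text(labeled))     # line type unknown, so we cannot assume texting works
print(can_receive_text(cell))        # line type known to be "cell"
```

The digits never change across the four values; only the label, context, and relationship attached to them do, and each addition lets us answer a question that the raw string alone could not.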

As you can see, meaningful information is more than just a set of facts. Databases are well suited to storing a large set of data facts, attaching labels to them, maintaining data about their context, and establishing important relationships among them.