Information Structuring in Databases

Let's briefly look at some of the processes required to codify data and turn it into meaningful information. Interesting questions about information include, "What is information?" "How does it exist in the world?" and "How are databases used to help organize, structure, record, and provide information by identifying new relationships that were previously not seen?"

There are many different definitions and philosophical concepts about information. Our short discussion will barely scratch the surface of defining information. However, we can note that information is more than just data or facts. For our purposes, let us define information as a set of data facts that (1) can be labeled, (2) are identified within a larger context, and (3) have identified relationships with other facts. For example, if we have a piece of data such as "2875039275," we do not have much information. However, if I label it as a phone number, then it begins to contain meaningful information. If I add that this phone number belongs to John Appleby, then I have added both a context and a relationship, and now we have even more available information.

Additional information is added if I identify that the number is a cell phone number; now I know that I can send a text message using that number. We also gain additional information on phone numbers by adding more contextual and relationship information. This might include whether a particular phone number is a business telephone or a residential telephone and whether it is landline (with a physical location) or cellular (with a current cell tower location).

As you can see, meaningful information is more than just a set of facts. Databases are uniquely qualified to store a large set of data facts, attach labels to them, maintain data about their context, and establish important relationships.

Entities and Attributes

Every database will contain information about many different entities. An entity is a specific type of object or type of thing. It can also be considered a classification of the specific type of items. (Note: We will use the terms entity and entity set interchangeably to refer to the set of items that are the same type.) A database schema will describe both the details about each entity and the relationships between entities. Entities usually come from physical world items such as customers or vehicles. However, abstract items, such as purchases or sales, are also entities. Detailed information about the properties of each entity are called attributes. For example, attributes of Customers in a database will be things like Name, Address, and Phone Number. The database schema will identify all the entities to be included in the database and the attributes for each entity. The schema describes the structure of the data in the database.

Figure 1.2 is an example of a visual data model of some of the information that is maintained in a database as part of the schema. This diagram is a partial database model for a shoe retail store that maintains detailed information about its customers. Information about each sale is kept in the database along with the individual pairs of shoes that are sold (SaleItems) as part of the sale. Other information includes product information (shoes) as well as the manufacturers of the shoes. As you can see in the figure, there are six different entities as identified by the boxes.

Figure 1.2: Data Entities for a Shoe Retail Store

Within each box is a list of the attributes for each entity. The attributes define the detailed information that is kept about each entity. The entities and their list of attributes are part of the database schema. In Figure 1.2, we see that attributes for Employee are EmployeeID, FirstName, LastName, Address, and so forth.

Each entity has one or more attributes that is defined as a key attribute. In the figure above, we have identified those attributes with a small key icon in the attribute rectangle. For example, for a Customer entity, the key attribute is CustomerID. A key is defined as an attribute whose value is unique or distinct for each record in the entity. Notice that a SaleItem requires three fields as the key—SaleID, ProductID, and ItemSize. It takes all three attributes to uniquely identify each SaleItem. Examples of key values are given below in the explanation of data.

Relationships

The database also needs to capture and describe information about the relationships between the entities. In this example, we want to know which sales were done by which customers. Sale Items are associated with each sale. In other words, a sale may include several pairs of shoes. Each sale item, which is a pair of shoes, is described in detail by product information. Finally, each product is manufactured by a particular manufacturer. The visual data model in Figure 1.2 is not sufficient to provide this information. Figure 1.3 expands the model by defining the important relationships. These relationships are identified by lines connecting the boxes of related entities.

Figure 1.3: Data Entities with Relationships for a Shoe Retail Store

One good way to capture relationship information is by using an entity’s key. Because keys uniquely identify a record, if we put the same key value in the data of a related record, then the two records are related to each other. In Figure 1.3 the Sale entity includes the CustomerID attribute, which is another entity that gives information about the customer. Thus, we know which customer purchased items on a particular sale. The Product entity includes the ManufacturerID attribute, which, again, is its own entity; we know who the manufacturer is for any given product. When a key attribute for one entity is placed in a different entity, it is called a foreign key in that different entity. In other words, a foreign key is a key attribute that belongs in one entity, but is listed in a different entity. In SaleItem, SaleID is a foreign key because it is the primary key for Sale. SaleItem's primary key is made up of three, and exactly three, fields: SaleID, ProductID, and ItemSize.

There is also a relationship that connects an entity with itself. This is called a recursive relationship. In the Employee entity, the manager is also an employee, so the ManagerID value must refer to an EmployeeID.

The visual data model described in this section is an easy, and powerful, technique that is used to understand the schema for a given database. Obviously, the DBMS must contain this information within the database schema so that it can organize and maintain the correct data as shown in Figure 1.3.

Data

As indicated above, the database schema describes the structure of the database. The schema is not the same as the actual data. Each one of the entities, along with its attributes, can be thought of as a database table. A database table is like a spreadsheet with the name of the spreadsheet being the entity name, the attributes being the columns of the spreadsheet, and the data being the rows. In database terminology, the rows called rows or records. Figure 1.4 illustrates this concept for the Employee entity. (Note: Click on the three dots on either side of the page to expand the page and make the table bigger.)

Figure 1.4: Employee Data

Remember, it is important not to confuse the database schema with the actual data. The schema defines the structure and the data is the actual information about each individual Employee or Establishment or Inspection.

Earlier, we introduced the idea of a key attribute. We can see from the data how the key attribute serves as the mechanism to uniquely identify each row or each record in the table. This is important because we could have two employees who have the same name or live in the same house. The key attribute solves this problem by guaranteeing that each record will have a unique value. We are familiar with this concept. Your bank account has a unique number. Your medical records are identified with a key, which may be your Social Security number. In the example, we have used a somewhat standard notation by naming the key with "ID". At times, key fields are also identified by a “_no” or “_id” as part of the name. The DBMS enforces this requirement so that no two records in the same table have the same key value.