Working with Date Fields in a WHERE Clause

We often use date information in WHERE clause restrictions to help us find specific rows in our data. Date information can be a little tricky to work with in a WHERE clause because it is generally stored as both date and time information. Take, for example, the event_date field on the event table in the student organization database (see Figure 13.19).

Figure 13.19: Date of the March Meeting.

Notice the date information for the March meeting. It is stored as 2020-03-10T12:00:00. This data is a combination of the day of the March meeting, 2020-03-10 (March 10, 2020), and the time of the meeting, 12:00:00 (noon). The 'T' character in the data delineates the end of the day part of the date and the beginning of the time part of the date.

Even the formatting of the day and time part of the data is somewhat cumbersome. The day part is formatted with a four-digit year, a two-digit month, and a two-digit day, all separated by a '-' character. The time element of the data is formatted with two digits for the hour, two for the minute, and two for the second, all separated with a ':' character. We could even add a decimal after the seconds to precisely store a time that includes fractions of a second.

While this format for date and time is very precise and follows an exact standard, it can be difficult to write WHERE clause restrictions that effectively select rows based on date information. Consider that we may wish to see all events that happened on March 10th of 2020 (see Figure 13.20). The logical WHERE clause SQL statement to retrieve these events would look something like: WHERE event_date = '2020-03-10' (notice that I even formatted 3/10/2020 as it appears in the event_date field on the table).

Figure 13.20: Incorrect Syntax for Date Field in WHERE Clause.

The reason for this is that in order to return a specific date in a query result, we also need to account for the particular point in time that is associated with the row we wish to see. When we omit the time part of the date from our WHERE clause condition, it is as if we are saying that we are looking for events that occurred at midnight on that date. To see an event that occurred at noon on 3/10/2020, we would need to include the time, 12:00:00, as part of the where condition. The query depicted in Figure 13.21 demonstrates how this might work.

Figure 13.21: Events that Occurred at Noon on 3/10/2020.

Of course, this approach would not return all of the events that happened on 3/10/2020. It will only return those events that happened at noon on 3/10/2020. If we wish to see all of the events that happened at any time during the day on 3/10/2020, we would need to use a WHERE clause that searches a range of values. The WHERE clause that would return all of the events that occurred at any time during the day on 3/10/2020 would be WHERE event_date >= '2020-03-10' AND event_date < '2020-03-11' (see Figure 13.22).

Figure 13.22: Events that Occurred at Any Time on 3/10/2020.

Notice the elements of this WHERE clause. Recall that when the time part of a date is omitted from a WHERE condition, it is as if we are searching for midnight on that day. The first part of the WHERE clause, event_date >= '2020-03-10', means that we are looking for events that happen at midnight or after on 3/10/2020. The last part, event_date < '2020-03-11', means that we are looking for events that happened before midnight on 3/11/2020. Therefore, the WHERE clause literally means that we want to see all events that happened at midnight on 3/10/2020 or after and before midnight on 3/11/2020.

If we wished to return all events that happened in the month of February of 2020, we might use this WHERE clause: WHERE event_date >= '2020-02-01' AND event_date < '2020-03-01' (see Figure 13.23). This statement means that we wish to see all events that happened at midnight or later on 2/1/2020 and before midnight on 3/1/2020.

Figure 13.23: Events that Occurred during February of 2020.

Using Date Functions to Simplify Working with Dates

An alternative approach to working with dates involves using predefined functions. Database management systems, including the data.world website, have a number of functions that perform calculations in queries. Some functions perform calculations on every row of the query result, and others perform summary calculations across the rows of a query result. While most of these functions are beyond the scope of this learning resource, we will discuss functions that perform calculations on dates: year, month, and day.

The syntax for using a function as part of a SQL statement is much like the syntax for using a function in Excel. We invoke a function using the name of the function and then list the parameters for the function in parentheses directly following the function name: function_name(function_arguments). A function can be used in the SELECT clause of a query or in the WHERE clause, though we will focus our instruction on using the date functions as part of the where clause.

The YEAR function is used to extract the 'year' element from a date field for each row of a query result. The year function requires a date as the only argument. The result of the YEAR function is a four-digit number that represents the year. For example, we may wish to select all events that occurred during the year 2019 (see Figure 13.24). The syntax for the WHERE clause for this SQL statement would be WHERE YEAR(event_date) = 2019. Notice that the only events returned occurred during 2019.

Figure 13.24: Events that Occurred in 2019.

The MONTH function extracts the 'month' part of a date field for each row of a query result. The MONTH function also requires a date as the argument. The result of the MONTH function is a two-digit number that represents the month. For example, we may wish to select all events that occurred during the month of November (see Figure 13.25). The syntax for the WHERE clause for this SQL statement would be: WHERE MONTH(event_date) = 11. In this case, all of the events returned will have occurred in November (month = 11).

Figure 13.25: Events that Occurred in November.

The 'day' part of a date is extracted using the DAY function. The DAY function also requires a date as the argument. The result of the DAY function is a two-digit number that represents the day. Consider that we may wish to select all events that occurred on the 12th day of the month (see Figure 13.26). The syntax for the WHERE clause for this SQL statement would be WHERE DAY(event_date) = 12. All of the events returned by this SQL statement occurred on the 12th day of the month. Notice that while each of these events occurred on the 12th of the month, they occurred during different months and even different years.

Figure 13.26: Events that Occurred on the 12th.

If we wanted to limit our results to those events that occurred on a specific date, we must use a compound WHERE clause. If we wanted to return those events that happened on March 10, 2020, we would use the following syntax in the SQL statement: WHERE MONTH(event_date) = 3 AND DAY(event_date) = 10 AND YEAR(event_date) = 2020 (see Figure 13.27). Notice that this will provide the same result as the following WHERE clause: WHERE event_date >= '2020-03-10' and event_date < '2020-03-11'.

Figure 13.27: Events that Occurred on 3/10/2020.

We can use the date functions (year, month, and day) in any combination to return results that contain any dates we wish. If we wanted to see those events that occurred during the month of February of 2020, the SQL statement would be WHERE MONTH(event_date) = 2 AND YEAR(event_date) = 2020 (see Figure 13.28).

Figure 13.28: Events that Occurred during February of 2020.

Working with dates values in a WHERE clause condition can be a bit tedious. In this section, we described two approaches to creating conditions with dates. As you write WHERE clause conditions that involve date values, you will likely find that one of these approaches is more comfortable than the other. Either approach should prove to be equally effective.