Wildcards

You may sometimes need to write a SQL statement to restrict rows, but you may not know which values to return. Notice the unique values for expense_description returned from the query depicted in Figure 13.29. Suppose we want to write a query to produce the expenses that include water.

Figure 13.29: Unique Values for the expense_description Field.

Unfortunately, expenses for water appear in the same row as other items, such as cookies or supplies. The WHERE clause "WHERE expense_description = 'Water'" will not produce any results because none of the rows of the table have the exact value 'Water' (see Figure 13.30).

Figure 13.30: No Records Contain an expense_description of Exactly 'Water'.

The good news is that we can write a query that will return any rows of the table where the expense_description contains the word 'Water', even if there are other letters or words in the same value. To do this, we use a WHERE clause with wildcards.

A WHERE clause with a wildcard criterion uses the keyword LIKE and a combination of normal and wildcard characters. There are two types of wildcard characters. The first uses the '%' (percent) character, which represents any combination of characters. The second is the '_' (underscore) character, which represents a single character. The syntax for a WHERE clause with a wildcard value is WHERE fieldname LIKE 'set of normal and wildcard characters'.

We use a WHERE clause with a wildcard to write a SQL statement that will return the rows with an expense_description that contains the word 'Water'. The syntax for the WHERE clause of this query would be WHERE expense_description LIKE '%Water%' (see Figure 13.31). Notice that all of the rows in the results contain the word 'Water' somewhere in the expense_description. All other rows are excluded from the query results.

Figure 13.31: WHERE Clause with a Wildcard Criteria.

The positioning of the '%' characters is significant. Placing the '%' characters both before and after the word 'Water' means that we are looking for the characters 'Water' (in that order) anywhere in the expense_description for the rows that are included in the query result. If we were to only put the '%' before 'Water', we would be looking for the word 'Water' at the end of the expense_description (any set of characters before 'Water' and nothing after). If we were to put the '%' after 'Water', we would be looking for the word 'Water' as the first word of the expense_description (nothing before 'Water' and any set of characters after).

Consider some additional examples for using the '%' wildcard.

  • '%we' would return 'awe', but not 'awesome' or 'welcome'
  • 'we%' would return 'welcome', but not 'awe' or 'awesome'
  • '%we%' would return 'awe', 'welcome', or 'awesome'

The underscore '_' works in a similar way. The '_' character will substitute for any single character. For example, we might know that there are two members in our student organization database whose first names start with the letters 'Adel' but whose names end with different characters. We may want to construct a query to display both members. The syntax for the WHERE clause of this query would be: WHERE first_name LIKE 'Adel_' (see Figure 13.32). Notice that only those members whose first name starts with the letters 'Adel' and contains exactly one additional character are returned. If there were a member with the first name 'Adella', that person would not be in the query results because the '_' character can only substitute for a single character.

Figure 13.32: WHERE Clause with the Underscore Wildcard Character.

Examine a few more examples of using the '_' wildcard character:

  • 't_p' would return 'tip', 'tap', or 'top; but not 'stop'
  • 'p_st' would return 'past', but not 'paste'
  • 'h_ _s_n' would return 'hanson', 'hensen', or 'hansen'; but not 'harrison' (note that 'h%s_n' would return 'harrison')

Additional information and examples about using wildcard characters in the WHERE clause can be found on data.world at Basic Concepts: LIKE and NOT LIKE.