1.3. Expressions

An expression in SQL is a thing that can be evaluated - anything that results in a value. Some examples include literal values, operator expressions, and function call expressions. Expressions are used in most clauses of a SQL query; for example, in the SELECT clause, expressions result in the values we see returned from the query, while in the WHERE clause, expressions determine whether or not a row is returned from the query. You can also ORDER BY expressions, and later we will see other uses for expressions. This chapter will explore some of the most common expression types; additional expressions will be introduced in later chapters.

1.3.1. Tables used in this chapter

We will again be working with the simple_books and simple_authors tables for this chapter. Reminder: you can read a full explanation of these tables in Appendix A.

1.3.2. Column expressions

The use of a column name in a SQL statement produces a special expression which evaluates to the value stored in that column for the current row being processed. So, when we run

the query execution examines each row of the table simple_books in turn to evaluate the expression author = 'J.R.R. Tolkien'. This expression compares the value of the author column to the literal value 'J.R.R. Tolkien' using the = operator. If the two are the same, the overall expression evaluates to True, and the row is included in the output; otherwise, the row is excluded.

1.3.3. Literals

Literals are simple values expressed in a form that the database recognizes and understands. There are only a few basic types of literals in SQL, although these can be converted to many different types within a database. We will discuss SQL data types further in Chapter 1.6. The main literals you will encounter are:

  • Numbers: these are expressed in the usual fashion, for example, -1, 3.14159, 0.0008. Depending on the database, you may also be able to use numeric literals in scientific notation or other formats, for instance, 6.02e23 (which stands for \(6.02 \times 10^{23}\)).

  • Character strings: these are strings of characters enclosed in single quotes, for example, 'apple'. If you need to express a literal character string which contains a single quote, you simply write the single quote twice; this is tricky to read, but produces the desired result. This is shown in the following query:

  • Boolean values: True or False. Note, however, that not all SQL implementations support Boolean literals.

  • Date and time values. The accepted notations for dates and times vary widely among different SQL implementations.

  • The special value NULL. We will talk more about NULL below.

You can ask for literal expressions in the SELECT clause - this is sometimes useful. In this case, the literal is evaluated as itself for each row in the table you are querying. For example:

SELECT 42, 'hello', author FROM simple_books;

If you try this query in the interactive tool above, note that the output provides column names based on the literal expressions selected. Later we will see how to change the names of columns in the output if we want to make them more meaningful.

1.3.4. Operators and functions

SQL defines a number of useful operations on its various types. Some of these use simple operators, as in mathematical expressions, while others take the form of functions. Appendix B provides extensive lists of the operators and functions defined by the SQL standard, but we will discuss some of the most commonly used ones here, along with examples of their use.

1.3.4.1. Comparison operators

We’ve already seen the equality operator (=) used to test if some column is equal to a literal value in the WHERE clause of queries. We could instead test for inequality using the (<>) operator:

Though it is non-standard, most databases also recognize != as an inequality operator. (Note that SQL does not use ==, which is used to test for equality in many programming languages. While SQLite does recognize it as an equality comparison operator, do not use it, as it will be a difficult habit to break.)

We can also test to see if a value is less than (<), greater than (>), less than or equal to (<=), or greater than or equal to (>=) some other value. There is also a ternary operator, BETWEEN, that tests if a value is between two other values (see Appendix B - Comparison operators for details).

1.3.4.2. Mathematics

You can expect the basic arithmetic operators to work with any numeric values: addition (+), subtraction (-), multiplication (*), and division (/) are standard. Your database may implement others, but make sure you read the documentation for your database to ensure other operators do what you think they do. You can actually use your database as a simple calculator! Try running these:

(Note for Oracle users: Oracle requires all SELECT queries to have a FROM clause, so the special table dual is provided for queries that use no columns and return one row. Thus, use SELECT 4 + 7 FROM dual; in Oracle.)

The SQL standard additionally provides functions for many useful mathematical operations, such as logarithms (log, ln, log10), exponentials (exp), square root (sqrt), modulus (mod), floor and ceiling (floor, ceiling or ceil), trigonometric functions (sin, cos, etc.), and more. Some examples:

SELECT sqrt(3);
SELECT log10(1e5);
SELECT cos(0);

You will most likely find yourself using mathematical operators in SQL if you are working with numerical data such as financial or scientific records. In Chapter 1.6 we will discuss some of the different data types available for storing numbers: integers, decimal numbers, and floating point values. Each has applications to various problems.

As a somewhat contrived example of applying mathematical operators to an actual table, consider the problem of finding out which century a book was published in. In the English language, the 1st century is traditionally considered to be the years numbered 1 - 100. Each subsequent 100 years adds 1 to the century, so the 20th century included the years 1901 - 2000.

With a little math, we can extract the century in which each book in our database was published:

SELECT
  title,
  floor((publication_year + 99) / 100) AS century
FROM simple_books;

Note the use of parentheses to enforce an order of operations: the addition operation occurs before the division, and then the result of the division is provided to the floor() function. We have also introduced something new - a renaming operation to give our result column a more informative name. The AS keyword lets us rename a column in the output of our query. We will learn more about using AS in Chapter 1.4.

See Appendix B - Mathematical operators and functions for a complete list of standard operators and functions.

1.3.4.3. Character string operators and functions

SQL provides two very useful string operators. The operator || (two vertical bars) is used for string concatenation. There are many instances in which we want to append one string to another. For example, if we do not like the multi-column output from our simple_books table, we could use string concatenation to produce a more familiar representation of the data:

(Note about implementations: in SQL Server, you will need to use + instead of ||; in MySQL, you will need to use the MySQL concat function, e.g. SELECT concat(title, ', by ', author) FROM simple_books;.)

The LIKE operator is a Boolean operator that is used almost exclusively in the WHERE clause. LIKE provides very simple pattern matching capabilities in SQL. A pattern is just a string that can contain regular text and special wildcard characters, which can match one or many unspecified characters. The two wildcards are %, which can match any string of zero or more characters, and _, which can match exactly one of any character. Normal text matches itself exactly. (If you are familiar with standard regular expression syntax, the % wildcard corresponds to “.*” as used in a regular expression, and the _ wildcard corresponds to “.”.)

Consider the case in which we recall the first name of an author, but not the full name, and wish to look up authors with that first name. The % wildcard can be used here to stand in for the unknown part of the name:

SELECT name FROM simple_authors WHERE name LIKE 'Isabel %';

Since the % can match any string, the pattern 'Isabel %' would match “Isabel Allende”, “Isabel Granada”, or “Isabel del Puerto” for example (only one of these is in our simple_authors table, though).

Similarly, if we remember the last part of the name, but not the start, we can use the % wildcard again:

SELECT name FROM simple_authors WHERE name LIKE '% Ginsberg';

We can even use the wildcard more than once:

SELECT title FROM simple_books WHERE title LIKE '%Earth%';

Now, suppose we are interested in authors who use an initial instead of their full first name. An initial looks like some single character followed by a period - both are required. Here’s what the query would look like, using both the % and _ wildcards:

SELECT name FROM simple_authors WHERE name LIKE '_.%';

In addition to these operators, SQL provides a number of useful functions that act on character strings. The functions upper and lower convert strings to all uppercase or lowercase characters, respectively. Not all languages distinguish between uppercase and lowercase, of course, so these functions may not be applicable in certain locales. You can use upper or lower whenever you want to get back strings in all uppercase or lowercase:

SELECT upper(title), author FROM simple_books;

You can also use them when pattern matching if you aren’t sure of the capitalization of the strings in your database:

SELECT * FROM simple_books WHERE lower(title) LIKE '%love%';

SQL also provides functions for tasks such as substring extraction or replacement, finding the location of a substring, trimming whitespace (or other characters) from the front and/or back of a string, and many more. See Appendix B - Character string operators and functions for these.

1.3.4.4. Boolean operators

As discussed in Chapter 1.2, the WHERE clause of a SELECT query expects a Boolean expression after the WHERE keyword. Some expressions that are Boolean in SQL include expressions using comparison operators, or an expression using the LIKE operator. Many functions also result in a Boolean value.

SQL provides logical operators that operate on Boolean values. These operators are AND, OR, and NOT, which perform the logical operations that their names imply. For example, if we have an expression of the form expr1 AND expr2, the result is True if and only if both expr1 and expr2 evaluate to True. Similarly, expr1 OR expr2 evaluates to True if at least one of expr1 and expr2 are True. Finally, NOT inverts the truth value: NOT True results in False, and NOT False results in True.

These logical operators allow us to build up complex Boolean expressions from simpler Boolean expressions to express the particular logical conditions we want for our WHERE clause. So, for example, we might be interested in fantasy books published since the year 2000:

Or, we might be interested in books in either the fantasy or science fiction genres:

SELECT * FROM simple_books
WHERE genre = 'fantasy' OR genre = 'science fiction';

If we simply hate science fiction, we might write

SELECT * FROM simple_books WHERE NOT genre = 'science fiction';

which gives the same result as

SELECT * FROM simple_books WHERE genre <> 'science fiction';

For more complex expressions involving combinations of AND, OR, and NOT, we may need to use parentheses to make our meaning clear. In SQL, NOT is applied before AND, and AND is applied before OR. For example, perhaps we are interested in any books other than fantasy books published after the year 2000. We might be tempted to write

SELECT * FROM simple_books
WHERE NOT genre = 'fantasy' AND publication_year > 2000;

However, this isn’t quite right (try it!). Since the NOT is applied first, this query returns books that a) are not fantasy and b) were published since the year 2000. The expression NOT genre = 'fantasy' AND publication_year > 2000 is equivalent to (NOT genre = 'fantasy') AND (publication_year > 2000). To get what we originally wanted, we need to use parentheses explicitly:

SELECT * FROM simple_books
WHERE NOT (genre = 'fantasy' AND publication_year > 2000);

You can see that the above query only excludes books in the list:

SELECT * FROM simple_books
WHERE genre = 'fantasy' AND publication_year > 2000;

Similarly, we might be interested in either science fiction or fantasy books, but only if they were published after 2000. Compare the two queries below:

SELECT *
FROM simple_books
WHERE genre = 'science fiction' OR genre = 'fantasy'
AND publication_year > 2000;

SELECT *
FROM simple_books
WHERE
    (genre = 'science fiction' OR genre = 'fantasy')
    AND publication_year > 2000;

The first of these queries returns any science fiction books, along with fantasy books published after 2000. The second returns the desired result: books published after 2000 in either the fantasy or science fiction genres.

For a fuller discussion of Boolean operators, we need to know more about NULL values, which will be discussed below. See Appendix B - Boolean operators for complete documentation on the SQL Boolean operators.

1.3.4.5. Date and time operators and functions

Date and time data are extremely important in many database applications, such as those supporting governmental or financial institutions. SQL provides extensive functionality for managing dates and times. Unfortunately, this is an area where different SQL implementations vary widely in their conformance to the SQL standard. See Appendix B - Date and time operators and functions for a fuller discussion, and consult your database implementation’s documentation to see what capabilities it offers with respect to date and time handling.

One useful SQL function that most databases implement is the CURRENT_DATE function (also try CURRENT_TIME and CURRENT_TIMESTAMP):

We will see in Chapter 1.6 how this function can be used to automatically record the date in a newly created row.

1.3.5. NULL

In many database applications, it is sometimes necessary to record the absence of information on some aspect of a piece of data. For example, in querying our authors table, we can see that some entries in the death column are blank. This probably means that the author for that row had not yet died at the time the data was entered, and thus the column was simply not applicable for that author; there is no death date. Additionally, some birth dates are blank; in this case, the column certainly applies to the author - they were clearly born at some point! However, that information was unknown to the person entering the data into the table, so nothing was entered.

These notions of data entries that are not applicable or unknown are captured with a special value in SQL: NULL. [1] NULL values represent the absence of information. When we query the authors table, the blanks in our result do not indicate that empty strings are in the database. Instead, NULL values stand in for the missing information. Unfortunately, NULL does not tell us the reason the data is missing - whether it is not applicable or simply unknown. If this distinction is important for your database, you will need to use extra columns to indicate the meaning of the NULL, or use some value other than NULL.

Because NULL is truly an absence of information, NULL values used in expressions usually result in NULL when the expression is evaluated. For example, what is the result of 2 + NULL? We simply cannot know - the NULL is not telling us anything, so the result is unknown, or NULL.

A very important consequence of this behavior is that NULL values cannot be usefully compared with anything, even other NULL values! That is, an expression like x = NULL is never True even if x itself contains NULL. This might seem counterintuitive, but if you think of the expression NULL = NULL as asking the question, “Is this unknown thing the same as this other unknown thing?”, you can see that the answer should be “unknown”, or NULL. [2]

To find out if a value is NULL or not NULL requires special operators: IS NULL and IS NOT NULL. For example, if we want to discover authors for whom we have no death date, we would execute the query:

You can discover authors for whom we do have death dates by replacing IS NULL with IS NOT NULL in the above query.

What happens if we instead write the following query?

SELECT * FROM simple_authors WHERE death = NULL;

In this case, the expression death = NULL will evaluate to NULL for every row in the table. The WHERE clause will filter these out, because it only accepts expressions that evaluate to True, and NULL is not the same as True.

NULL values can sometimes lead us astray. Consider the question of finding all authors who were alive in the year 2000 or later. It might be tempting to write a query such as

SELECT * FROM simple_authors
WHERE birth <= '2000-12-31'
AND death >= '2000-01-01';

This is a perfectly valid query - dates in this standard format can be compared in this fashion in our database. However, if you run the query, you will see that not all of our living authors are in the result. This happened, again, because the death column in those rows contained NULL values: comparing these to '2000-01-01' also yielded NULL, and the WHERE clause therefore filtered them out.

In this case, we need to use more logic, and query the database thus:

SELECT * FROM simple_authors
WHERE birth <= '2000-12-31' AND
    (death >= '2000-01-01' OR death IS NULL);

This works correctly, but you might be wondering why. We said that NULL used in expressions usually results in NULL, but here we have a compound Boolean expression using the operators AND and OR. So why are we not again losing all living authors? Well, it turns out that Boolean operators are an exception. This is because, when used in Boolean expressions, NULL means that we simply cannot know if the value is True or False; the value is unknown. However, the OR expression only requires one operand to evaluate to True in order to return True: True OR True is True, and so is True OR False in Boolean logic. Either way, we get True, so not knowing which it might be doesn’t matter. Therefore the expression in the parentheses is True if either one of the two conditions within it is true.

On the other hand, False OR NULL will give us NULL. In this case, whether the NULL is standing in for True or False actually matters, because each gives a different outcome. Since we do not know the outcome, the result is NULL.

Because Boolean expressions can result in True, False, or NULL, we say that SQL has three-valued logic (not truly Boolean logic). Appendix B - Boolean operators provides truth tables for this three-valued logic, but as shown above, you can usually work out the answer by simply thinking of NULL as meaning “unknown”.

1.3.5.1. Ordering and NULLs

Given that you cannot meaningfully compare NULL with other values, what happens when we ORDER BY a column containing NULL values? Unfortunately, it depends on which database implementation you are working with. You will need to consult your database documentation (or simply try an experiment) to see what its default behavior is. The standard does provide a way to specify whether NULL values should sort to the top or bottom. Compare these two queries:

SELECT * FROM simple_authors ORDER BY death NULLS FIRST;

SELECT * FROM simple_authors ORDER BY death NULLS LAST;

(Note: the NULLS FIRST and NULLS LAST modifiers are not supported in MySQL or SQL Server.)

1.3.6. Conditional expressions

SQL provides expressions for doing simple conditional logic. The basic conditional expression in SQL is the CASE expression, which comes in two forms. In the most general form, CASE lets you specify what the expression should evaluate to depending on a list of conditions. The effect is similar to using if/else or switch/case statements in some programming languages.

The basic form of the CASE expression is

CASE WHEN condition1 THEN result1
     [WHEN condition2 THEN result2]
     ...
     [ELSE result]
END

The CASE keyword comes first, followed by one or more WHEN clauses giving a condition and the desired result if the condition is true. The first true condition determines the result that will be returned. If none of the conditions evaluate to True, then the ELSE result is used, if provided, or NULL if there is no ELSE clause. The expression is finished with the END keyword.

For example, we could put our books into different categories, maybe for different sections in a library, using CASE:

Here we have included tests for some genres not present in our current dataset. A library application might have many categories, each encompassing multiple genres. Using a CASE expression would be one way to output books with their categories, although it depends on knowledge of all the possible genres in our database. A more data-driven way would be to look up categories in another database table using a join, a technique we will discuss in Chapter 1.4.

Another form of CASE matches an expression to possible values. The above query can be rewritten using this form:

SELECT
    author, title,
    CASE genre
        WHEN 'fantasy' THEN 'speculative fiction'
        WHEN 'science fiction' THEN 'speculative fiction'
        WHEN 'poetry' THEN 'poetry'
        WHEN 'history' THEN 'non-fiction'
        ELSE 'general fiction'
    END
    AS category
FROM simple_books;

Additionally, there are two functions that perform specialized conditional logic. The COALESCE function takes a variable number of arguments. The result of the function is the first non-NULL expression in the argument list, or NULL if all arguments are NULL. This can be useful for replacing NULL values with more descriptive values:

SELECT name, COALESCE('died: ' || death, 'living')
FROM simple_authors;

Finally, the NULLIF function takes two arguments: if the arguments are equal, the function results in NULL, otherwise it results in the first argument. This can be used to replace specific values with NULL. For example,

SELECT title, author, NULLIF(genre, 'science fiction')
FROM simple_books;

1.3.7. Self-check exercises

This section contains some exercises using the same books and authors database used in the text above. If you get stuck, click on the “Show answer” button below the exercise to see a correct answer.

Write a query to find all books published from the year 1980 through the year 2000, in order by publication year.

Write a query to find the authors whose name starts with the letter “J”.

Write a query to find books written between 1950 and 1999, excluding poetry.

Write a query to find books written before 1950 or after 1999, excluding science fiction.

Write a query to find books with a title beginning with the letters “T” or “I”, in the fiction, fantasy, or poetry genres.

Write a query to find authors for whom we have no birth date.

Write a query to find deceased authors born after 1915.

Write a query giving book titles and authors together with the century in which they were written, spelled out like ‘Twentieth Century’ (you only need to worry about the 20th - 21st centuries).


Notes

You have attempted of activities on this page