Ifs and Cases¶
In SQL, just as in many languages, one of the most basic and useful concepts is
the if/else syntax. Just as in Sheets, you
can use IF
to return different values based on whether a condition is true.
For example, going back to the bikeshare data, suppose you are interested in counting the number of trips that are shorter than one hour and longer than one hour. The duration column contains the duration of the trips in seconds, so you can use this to create a column that tells you whether the duration is over one hour (3600 seconds).
The syntax of the IF
statement is
IF(condition, value if condition is true, value if condition is false)
. This
should look familar, as it is exactly the same as the syntax for the IF
function in Sheets.
Note the use of the TRUE
and FALSE
keywords, which are also similar in
Sheets. You do not need to use these specific keywords, you can use any
indicator you like. For example, you could write
IF(duration > 3600, "over_one_hour", "not_over_one_hour")
or
IF(duration > 3600, 1, 0)
. In this case, the column name
is_over_one_hour
indicates what this field represents, so it is sufficient
to use the inbuilt keywords TRUE
and FALSE
.
SQL also provides an even more versatile set of keywords that allow you to have
as many “ifs” and “elses” as you like. This is done using the CASE
structure. This structure comprises several keywords.
The structure is opened with the
CASE
keyword. This tells SQL to start looking at each statement and evaluating if it is true or not.Each new case is identified with the
WHEN
keyword, followed by a statement. For example,WHEN duration > 3600
.Each case must have a
THEN
keyword after theWHEN
statement. If the statement after theWHEN
keyword is true, then the field evaluates to the value after theTHEN
keyword.If none of the statements after the
WHEN
keyword are met, the structure will look for anELSE
keyword. In this case, the field evaluates to whatever follows theELSE
keyword.The structure is closed with the
END
keyword. This indicates there are no moreWHEN
statements.
This is perhaps best illustrated with an example. The following query counts the
number of trips for different durations. The durations are grouped manually
using the CASE
structure.
The CASE
structure in the field duration_grouped
checks the statements
after each WHEN
keyword. For the first statement that is met, the field
evaluates to whatever follows the corresponding THEN
keyword. For example,
suppose the duration of a trip is 2468 seconds.
The first statement is false. It is not true that
duration < 10 * 60
.The second statement is false. It is not true that
duration < 30 * 60
.The third statement is true, since
duration < 60 * 60
.
As soon as a condition is met, the field is evaluated, so the column
duration_grouped
would be "30-60 minutes"
for this trip.