Ifs and Cases¶
In SQL, just as in many languages, one of the most basic and useful concepts is
the if/else syntax. Just as in Sheets, you
can use IF to return different values based on whether a condition is true.
For example, going back to the bikeshare data, suppose you are interested in counting the number of trips that are shorter than one hour and longer than one hour. The duration column contains the duration of the trips in seconds, so you can use this to create a column that tells you whether the duration is over one hour (3600 seconds).
The syntax of the IF statement is
IF(condition, value if condition is true, value if condition is false). This
should look familar, as it is exactly the same as the syntax for the IF
function in Sheets.
Note the use of the TRUE and FALSE keywords, which are also similar in
Sheets. You do not need to use these specific keywords, you can use any
indicator you like. For example, you could write
IF(duration > 3600, "over_one_hour", "not_over_one_hour") or
IF(duration > 3600, 1, 0). In this case, the column name
is_over_one_hour indicates what this field represents, so it is sufficient
to use the inbuilt keywords TRUE and FALSE.
SQL also provides an even more versatile set of keywords that allow you to have
as many “ifs” and “elses” as you like. This is done using the CASE
structure. This structure comprises several keywords.
The structure is opened with the
CASEkeyword. This tells SQL to start looking at each statement and evaluating if it is true or not.Each new case is identified with the
WHENkeyword, followed by a statement. For example,WHEN duration > 3600.Each case must have a
THENkeyword after theWHENstatement. If the statement after theWHENkeyword is true, then the field evaluates to the value after theTHENkeyword.If none of the statements after the
WHENkeyword are met, the structure will look for anELSEkeyword. In this case, the field evaluates to whatever follows theELSEkeyword.The structure is closed with the
ENDkeyword. This indicates there are no moreWHENstatements.
This is perhaps best illustrated with an example. The following query counts the
number of trips for different durations. The durations are grouped manually
using the CASE structure.
The CASE structure in the field duration_grouped checks the statements
after each WHEN keyword. For the first statement that is met, the field
evaluates to whatever follows the corresponding THEN keyword. For example,
suppose the duration of a trip is 2468 seconds.
The first statement is false. It is not true that
duration < 10 * 60.The second statement is false. It is not true that
duration < 30 * 60.The third statement is true, since
duration < 60 * 60.
As soon as a condition is met, the field is evaluated, so the column
duration_grouped would be "30-60 minutes" for this trip.