.. Copyright (C) Google, Runestone Interactive LLC
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/4.0/.
Ifs and Cases
=============
In SQL, just as in many languages, one of the most basic and useful concepts is
the if/else syntax. :ref:`Just as in Sheets`, you
can use ``IF`` to return different values based on whether a condition is true.
For example, going back to the bikeshare data, suppose you are interested in
counting the number of trips that are shorter than one hour and longer than one
hour. The duration column contains the duration of the trips in seconds, so you
can use this to create a column that tells you whether the duration is over one
hour (3600 seconds).
.. activecode:: bikeshare_count_trips_above_below_one_hour
:language: sql
:dburl: /runestone/books/published/ac1/_static/bikeshare.db
SELECT
IF(duration > 3600, TRUE, FALSE) AS is_over_one_hour,
COUNT(*) AS n_trips
FROM
bikeshare_stations
GROUP BY
is_over_one_hour
The syntax of the ``IF`` statement is
``IF(condition, value if condition is true, value if condition is false)``. This
should look familar, as it is exactly the same as the syntax for the ``IF``
function in Sheets.
Note the use of the ``TRUE`` and ``FALSE`` keywords, which are also similar in
Sheets. You do not need to use these specific keywords, you can use any
indicator you like. For example, you could write
``IF(duration > 3600, "over_one_hour", "not_over_one_hour")`` or
``IF(duration > 3600, 1, 0)``. In this case, the column name
``is_over_one_hour`` indicates what this field represents, so it is sufficient
to use the inbuilt keywords ``TRUE`` and ``FALSE``.
.. shortanswer:: bikeshare_name_and_explain_if_query
Give meaningful names to the columns in the following query and explain what
question th query is trying to answer.
.. code-block:: sql
SELECT
IF(duration > 0.5 * 3600, TRUE, FALSE) AS column_1,
SUM(IF(start_station = 31111, 1, 0)) AS column_2
FROM
trip_data
GROUP BY
column_1
.. shortanswer:: bikeshare_if_start_end_same_station
Write a query, using ``IF``, to count the number of trips of member type
*Casual* for trips that start and end at the same station compared to trips
that start and end at different stations. Hint: Your query should have only
2 columns in the ``SELECT`` clause and only one ``WHERE`` clause.
SQL also provides an even more versatile set of keywords that allow you to have
as many “ifs” and “elses” as you like. This is done using the ``CASE``
structure. This structure comprises several keywords.
1. The structure is opened with the ``CASE`` keyword. This tells SQL to start
looking at each statement and evaluating if it is true or not.
2. Each new case is identified with the ``WHEN`` keyword, followed by a
statement. For example, ``WHEN duration > 3600``.
3. Each case must have a ``THEN`` keyword after the ``WHEN`` statement. If the
statement after the ``WHEN`` keyword is true, then the field evaluates to
the value after the ``THEN`` keyword.
4. If none of the statements after the ``WHEN`` keyword are met, the structure
will look for an ``ELSE`` keyword. In this case, the field evaluates to
whatever follows the ``ELSE`` keyword.
5. The structure is closed with the ``END`` keyword. This indicates there are
no more ``WHEN`` statements.
This is perhaps best illustrated with an example. The following query counts the
number of trips for different durations. The durations are grouped manually
using the ``CASE`` structure.
.. activecode:: bikeshare_count_trips_by_duration
:language: sql
:dburl: /runestone/books/published/ac1/_static/bikeshare.db
SELECT
CASE
WHEN duration < 10 * 60 THEN "under 10 minutes"
WHEN duration < 30 * 60 THEN "10-30 minutes"
WHEN duration < 60 * 60 THEN "30-60 minutes"
WHEN duration < 2 * 60 * 60 THEN "1-2 hours"
WHEN duration < 4 * 60 * 60 THEN "2-4 hours"
ELSE "over 4 hours"
END AS duration_grouped,
COUNT(*) AS n_trips
FROM
bikeshare_stations
GROUP BY
duration_grouped
The ``CASE`` structure in the field ``duration_grouped`` checks the statements
after each ``WHEN`` keyword. For the first statement that is met, the field
evaluates to whatever follows the corresponding ``THEN`` keyword. For example,
suppose the duration of a trip is 2468 seconds.
- The first statement is false. It is not true that ``duration < 10 * 60``.
- The second statement is false. It is not true that ``duration < 30 * 60``.
- The third statement is true, since ``duration < 60 * 60``.
As soon as a condition is met, the field is evaluated, so the column
``duration_grouped`` would be ``"30-60 minutes"`` for this trip.
.. shortanswer:: bikeshare_explain_case
Explain what the following query is returning.
.. code-block:: sql
SELECT
CASE
WHEN member_type = 'Casual' AND start_station = end_station THEN "casual_same_station"
WHEN start_station = end_station THEN "member_same_station"
WHEN member_type = 'Casual' THEN "casual_different_station"
ELSE "member_different_station"
END AS ride_classification,
AVG(IF(duration > 3600, 1, 0)) AS proportion_trips_over_one_hour
FROM
trip_data
WHERE
member_type IN ('Casual', 'Member')
GROUP BY
ride_classification