Section 29.4 Filtering
Filtering data is the process of restricting a query to a subset of rows based on some column condition. Maybe we only want to consider trips that start from a particular location. Or rides that have a certain duration.
To filter data with SQL, we use the
WHERE
keyword. You can append
WHERE column_condition
to any SQL query, and the result will be filtered only to rows that satisfy the column condition. For example, you might want to look only at bike trips that are at least one hour (3600 seconds):
Try running this and examining the duration column… the records that are returned by the
SELECT
should now all be ones with a duration of at least 3600.
When writing
WHERE
clauses, we use the same kinds of comparison operators that we do in Python:
==
,
!=
,
>
,
>=
, … Just like in Python, we write numbers without commas like
3600
; SQL also has some math capabilities, so instead of figuring out for yourself that 60 minutes is 3600 seconds, you could write
duration >= 60 * 60
. Also like in Python, if we want to write something that is a string, we have to put quotes around it (either single
'
or double
"
quotes work). To select the records where
member_type
was
Member we would write:
...
WHERE
member_type == "Member"
Finally, notice that the
WHERE
clause is part of the query as the
;
does not appear until after the
WHERE
. If there was a
;
at the end of the
FROM trip_data
, the statement would end there and the
WHERE
would not be considered to be part of the rest of the query. Try putting a
;
after
trip_data
and running the program. In the modified program, the
SELECT
is still valid and returns the 3 desired columns, but does it for trips of all durations. Then, SQL tries to interpret
WHERE duration >= 3600
on its own, which makes no sense (what does it apply to???) and we get an error about the
WHERE
.
It is also possible to filter by multiple criteria by using
AND
or
OR
. For example, to look at bike trips that are 60 minutes or more and the
member_type
is
'Member'
, the query would be as below.
Checkpoint 29.4.2.
Write a query to find the ending station and duration (in that order) of all trips by bike number
'W00153'
that lasted over 8 hours (remember that duration is measured in seconds, you will have to figure out how many seconds are in 8 hours or have SQL do the math for you).
The bike “number” - ’W00153’ - is not really a number, it is a string, so make sure to use quotes around it in your query.
You do not have to select a column to be in the output in order to use it in a
WHERE
clause. All of the columns are available for filtering, and then only the ones you selected to see are produced in the output. For example, in the exercise above, although you needed to filter by
duration
, it should not appear in the output.
You have attempted
1 of
4 activities on this page.