.. Copyright (C) Google, Runestone Interactive LLC
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/4.0/.
Selecting
=========
Selecting in SQL is how you can choose a portion of data from a database. This
is helpful when you are given a large dataset, and want to focus in on a smaller
portion of the data. Say you are conducting research about bike sharing for an
internship in Washington D.C. You are given the Capital Bike Sharing dataset,
which is hosted on a SQLite database and contains information on Washington
D.C.’s bike share program for 2011.
You may recall that in Sheets, you could select certain sections of data by
specifying a cell range or by simply clicking and dragging. In SQL, you can do
something similar, but rather than specifying a cell range, you'll need to write
some code to accomplish the same behavior. Don't worry if you've never written
code before — SQL code actually looks a lot like English!
SQL **keywords** are generally written using all **upper case**, while **column names
and table names** use **lower case**. You can see this in the following example.
Suppose that you have a table called ``trip_data``. The following SQL query is
an example of how to view the top 10 rows of the table. This query is using SQL
to ask the database to ``SELECT`` all columns (``*`` is a shorthand denoting
“all columns”) ``FROM`` the table named ``trip_data``, but to ``LIMIT`` the
output to the first 10 rows.
.. activecode:: bikeshare_select_top_10
:language: sql
:dburl: /_static/bikeshare.db
SELECT
*
FROM
trip_data
LIMIT
10
Note that SQL does not notice line breaks, so SQL queries are usually spread
across multiple lines for ease of readability. The above query allows you to
read the column names and 10 example rows of data. From this query, you can see
all of the columns in the table which are listed below.
.. image:: figures/bike_dataset_columns.png
:align: center
:scale: 50%
:alt: Columns and descriptions from the bike share dataset.
However, in general, it might not make sense to display all the columns in a
table. This is especially true if your table has a large number of columns. SQL
allows you to select whichever columns you want. For example, if you just want
to see the ``member_type``, ``start_date``, and ``duration`` columns, the query
would look like the following.
.. activecode:: bikeshare_select_columns
:language: sql
:dburl: /runestone/books/published/ac1/_static/bikeshare.db
SELECT
member_type,
start_date,
duration
FROM
trip_data
LIMIT
10
However, maybe you just want to see the number of rows in your dataset rather
than individual data points. Similar to the ``COUNT`` function in Sheets,
you can use ``COUNT(*)``. Recall that ``*`` means all, so this function counts
the total number of rows. This is show in the code block below.
.. activecode:: bikeshare_count_star
:language: sql
:dburl: /runestone/books/published/ac1/_static/bikeshare.db
SELECT
COUNT(*) AS n_rows
FROM
trip_data
Notice that the ``AS`` keyword is used after ``COUNT(*)``. This names the column
that counts the number of rows as ``n_rows``. This is similar to creating a new
column in Sheets and naming it n_rows. In general, whenever you select a column
that is not one of the existing columns of the table, you should use the ``AS``
keyword to name it something informative.
Now that you have learned a bit about selecting in SQL, try out the following
example.
.. activecode:: bikeshare_select_start_and_end_stations
:language: sql
:dburl: /runestone/books/published/ac1/_static/bikeshare.db
Write a query to select the start and end stations for all trips. (Hint:
You will want to use the ``SELECT`` and ``FROM`` keywords).
~~~~
====
assert 0,0 == 31104
assert 0,1 == 31200
assert 1,0 == 31230
assert 1,1 == 31620
assert 99,0 == 31224
assert 99,1 == 31221