Aggregating Data

Section 30.1 Aggregating Data

Say we want data about how many rides lasted longer than an hour. At this point we don’t care about where they started, or which bike they were on, we just want to know how many such rides there were. To ask for the total number of records a query finds, we can select COUNT(*).

🔗

The AS keyword is used after COUNT(*) to give the data that is returned a name. If we leave it off, the “column” of data will just be labeled COUNT(*). Calling it number_rides makes it much clearer what we are looking at.

🔗

Checkpoint 30.1.1.

How many trips started at station 31111?

🔗

Hint: Build your way to the final query. Start by selecting all the data (*) for all the trips. Then write a WHERE to only select ones that start from 31111. Finally, instead of selecting all of the columns, select just the count of the number of records.

🔗

COUNT is one of the aggregation functions provided by SQL. Aggregation is the process of combining data and COUNT combines all the records and tells us how many there are. But there are other ways we can aggregate data with SQL:

🔗

COUNT counts the rows
🔗

🔗
SUM adds the values of a numeric column
🔗

🔗
MIN calculates the minimum of a numeric column
🔗

🔗
MAX calculates the maximum of a numeric column
🔗

🔗
AVG calculates the mean of a numeric column
🔗

🔗
MEDIAN calculates the median (middle value) of a numeric column
🔗

🔗
MODE calculates the mode (most common value) of a numeric column
🔗

🔗

🔗

For example, the query below calculates the minimum and maximum trip duration:

🔗

Checkpoint 30.1.2.

Write a query to calculate the mean duration (average) of all trips.

🔗

You use WHERE filtering with aggregating functions. The query below calculates the longest trip duration just for trips made by Casual member type users.

🔗

Checkpoint 30.1.3.

Write a query to get the total duration (sum) of all trips taken on bike_number W00174.

🔗

Subsection 30.1.1 Counting Unique Values

It is often helpful to not only count the number of rows but also to count the number of unique values of a column. You can do this using the DISTINCT keyword.

🔗

To count the distinct values of a column, you can simply use COUNT along with DISTINCT. For example, the query below counts how many bike numbers are used.

🔗

Checkpoint 30.1.4.

Write a query to get the number of unique start_station values there are in the table.

🔗

You have attempted of activities on this page.

🔗

Prev Top Next