Warning 30.2.1.
All columns that are not in the
GROUP BY
clause must be selected using an aggregating function.COUNT
counts the rows. We can also do COUNT(DISTINCT column_name) to get the number of different values that appear in a column.SUM
adds the values of a numeric columnMIN
calculates the minimum of a numeric columnMAX
calculates the maximum of a numeric columnAVG
calculates the mean of a numeric columnMEDIAN
calculates the median (middle value) of a numeric columnMODE
calculates the mode (most common value) of a numeric columnGROUP BY
statement allows you to group data as you aggregate it. The following query counts the number of trips (COUNT(*)
) for each member type (GROUP BY member_type
).member_type
, and the column named n_trips
counts the number of rows for each type. Recall that AS _______
gives a name to a calculated value - any time we do an aggregation, it will be helpful to name the result so we know what we are seeing in the output. By giving the aggregated value a name, we can use it in the ORDER BY
to sort by the total number of trips we calculated.member_type
and try to select the start_date
. There is no one start date for all of those records. You would instead have to ask about the MIN(start_station)
or MODE(start_station)
or COUNT(DISTINCT start_station)
.GROUP BY
clause must be selected using an aggregating function.start_station
, we SELECT
start_station
. All the other columns we are selecting data from (duration
which is selected from twice) have an aggregation function (MAX
or MIN
in this case).WHERE
before the GROUP BY
. For example, you can calculate the number of trips started from each station for just the bike ’W00174’ with this query: