.. Copyright (C) Google, Runestone Interactive LLC
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/4.0/.
.. _interpreting_slope:
Interpreting Slope
==================
Now that you have reviewed linear equations, this section outlines how that
relates to data analysis. As mentioned in the previous section, the slope gives
the steepness of a line, but what does this mean?
You can start to answer this question by first understanding what slope is. The
`slope of a straight line `__ tells you
how much the y value changes when the x value is increased by 1. This is also
written as the ratio
.. math::
\frac{rise}{run} = \frac{\text{change in y}}{\text{change in x}}
You've already learned about the correlation coefficient, *r*, which tells you
how strong the relationship between two variables is. Slope tells you something
different. The slope tells you the **magnitude** of that relationship. In other
words, the correlation helps you answer the question “If variable A changes, how
confident are you that variable B will change?”, while the slope helps you
answer the question “If variable A changes, *by how much* will variable B
change?”. Analyzing slope becomes useful in scatter plots when looking at the
line of best fit.
You can practice this through an example. Below is a scatter plot with median
SAT Math score as the explanatory variable, and median earnings after graduation
as the explained variable. A line of best fit has already been added to the
graph. You can follow along with this example by using the
`SAT data spreadsheet `_
which contains data from `College Scorecard. `_
.. image:: figures/SAT_Math_and_Earnings.png
:align: center
:alt: Scatter plot of SAT math median plotted against median earnings after 10 years.
Notice that the data exhibits a general upward trend: in general, schools with
higher median SAT Math scores are more likely to graduate students who earn a
higher median income after 10 years. This is confirmed by the line of best fit,
which has a **positive** slope. To be more precise, the equation of the line of
best fit is :math:`y = 118.95 x - 20347.55`. Or, to use the names of the x and y
variables:
.. math::
Predicted Median Earnings = 118.95 \times SAT Math Median - 20347.55.
The number multiplied by the x-variable is the slope, in this case 118.95. The
constant at the end is the y-intercept, in this case -20347.55. Further breaking
down slope gives you the following:
.. math::
slope = \frac{\text{change in y}}{\text{change in x}} = \frac{\text{change in
median earnings}}{\text{change in median SAT math score}} = \frac{118.95}{1} =
118.95
These values quantify how differences in one variable result in differences in
the predicted values of other variables. From the equation above, because the
slope is 118.95, when the median SAT math score increases by 1, the median
earnings increases by 118.95. In other words, for each point increase in median
SAT math score, the median earnings increase by $118.95. You can use this
information when working through the following example about two different
colleges.
Suppose two schools are pretty similar in most regards, except that College X
has an average SAT math score that is 100 points higher than University Y.
Knowing the slope of the line of best fit in this case is useful to predict how
SAT score impacts other variables. Visually, this relationship looks like the
following:
.. image:: figures/median_sat_earnings_annotated.png
:align: center
:alt: Annotated scatterplot showing the change in y is 11895 while the change in x is 100.
You can incorporate this change into the equation for slope:
.. math::
\frac{\text{change in median earnings}}{\text{change in median SAT math
score}} = \frac{118.95}{1} = \frac{11895}{100}
In words, this means that every 100 point increase in median SAT math score
corresponds with a predicted median earnings increase of $11,895. Another way to
think about this is that if the change in median earnings is $118.95 for a 1
point increase in the median SAT Math score, and the trend is linear, the change
in median earnings is :math:`100 \times $118.95` (or $11,895) for a 100 point
increase in the median SAT Math score. You can see this depicted in the
following chart:
.. image:: figures/Slope_Changes_for_SAT_math_colored.jpg
:align: center
:alt: A chart illustrating the relationship between SAT point increase and median earnings.
If College X had median SAT Math scores 100 points higher than University Y,
their graduates will make on average $11,895 more per year, after 10 years.
Example: SAT Scores and Loans
-----------------------------
This dataset has many other variables, and many other relationships to examine.
For example, does the percentage of students with federal loans at a college
impact its completion rate? Suppose your friend has been accepted to College A
and College B, and is very determined to graduate within four years. However,
she has to take some student loans, and is trying to decide where to go. You
don’t know much about either school, but College A has 10% more students with
federal loans than College B. This scatter plot shows the percentage of students
receiving federal loans as the explanatory variable, and completion rate as the
explained variable.
.. image:: figures/completion_rate_loans.png
:align: center
:alt: Scatter plot depicting the relationship between federal loans and completion rate.
.. mchoice:: completion_rate
Using the scatter plot above, which school will have a lower completion rate?
(Specifically, which school will have fewer students graduating within
6 years?)
- School A
+ Correct: Because the direction of the scatter plot is negative, the
school with the higher percentage of students with federal loans will
have a lower completion rate. So College A will have a lower percentage
of students graduating within 6 years.
- School B
- Incorrect: Try again! Because the direction of the scatter plot is
negative, the school with the higher percentage of students with federal
loans will have a lower completion rate. So, College A will have a lower
percentage of students graduating within 6 years.
The slope of the line of best fit this time is negative. The slope of this line
is negative because the line moves down from left to right, just like the
scatter plot which has a negative direction. This means that in general, the
higher the federal loans taken out to go to a college, the lower the completion
rate.
The equation of the line of best fit is:
.. math::
Predicted Completion Rate = -0.371 \times
\text{Percentage with Federal Loans} + 0.775
You can interpret the slope as you have done before by breaking it into the
change in y over the change in x.
.. math::
slope = \frac{\text{change in y}}{\text{change in x}} = \frac{\text{change in
completion rate}}{\text{change in percentage with federal loans}} = \frac{
-0.371}{1}
Therefore for every 1% increase in the percentage of students with federal
loans, the predicted completion rate drops by 0.37%. College A and B have a
difference of 10% in their federal loans percentage. To determine how much that
impacts the predicted completion rate, you can multiply the slope by 10.
.. math::
slope = \frac{\text{change in y}}{\text{change in x}} = \frac{\text{change in
completion rate}}{\text{change in percentage with federal loans}} = \frac{
-0.371 \times 10}{1 \times 10} = \frac{-3.71}{10}
Another way to think about this is that any change to x has to change y
proportionally. Therefore, if the change in x is multiplied by 10, the change in
y must also be multiplied by 10.
.. image:: figures/Slope_Changes_colored.jpg
:align: center
:alt: A chart illustrating the relationship between loans and completion rate.
So College A and College B should differ in their completion rate by 3.71%. The
negative value indicates that as the x value increases by 10%, the y value
*decreases* by 3.71%.
However, that doesn’t mean that students who have federal loans graduate less
often than students who don’t! One issue is that this dataset is about schools,
not students. There are also many other factors at play. For example, a school’s
financial resources is certainly a lurking variable. Schools where students
don’t need federal loans often have large endowments and give loans or
scholarships directly to their students. These same schools may also have other
resources that contribute to increased graduation rates.
.. mchoice:: slope_line_of_best_fit
Which of the following is the correct interpretation of the slope
of the line of best fit?
:math:`(Predicted Median Debt of Graduates) = 0.209 \times (Average Net
Tuition) + 19043`
- For every dollar that median debt increases by, average net tuition
increases by .209 dollars.
- Incorrect
- For every dollar that average net tuition increases by, median debt
increases by 20.9%.
+ Correct
- For every dollar that median debt increases by, average net tuition
increases by 20.9%.
- Incorrect
- For every dollar that average net tuition increases by, median debt
increases by .209 dollars.
+ Correct