
Section 6.5 Orthogonal least squares

Suppose we collect some data when performing an experiment and plot it as shown on the left of Figure 6.5.1. Notice that there is no line on which all the points lie; in fact, it would be surprising if there were since we can expect some uncertainty in the measurements recorded. There does, however, appear to be a line, as shown on the right, on which the points almost lie.
Figure 6.5.1. A collection of points and a line approximating the linear relationship implied by them.
In this section, we'll explore how the techniques developed in this chapter enable us to find the line that best approximates the data. More specifically, we'll see how the search for a line passing through the data points leads to an inconsistent system $A\mathbf{x}=\mathbf{b}$. Since we are unable to find a solution, we instead seek the vector $\mathbf{x}$ where $A\mathbf{x}$ is as close as possible to $\mathbf{b}$. Orthogonal projection gives us just the right tool for doing this.

Preview Activity 6.5.1.

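  1. Is there a solution to the equation $A\mathbf{x}=\mathbf{b}$ where $A$ and $\mathbf{b}$ are such that
    $$\begin{bmatrix} 1 & 2 \\ 2 & 5 \\ -1 & 0 \end{bmatrix}\mathbf{x} = \begin{bmatrix} 5 \\ -3 \\ -1 \end{bmatrix}?$$
  2. We know that $\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 5 \\ 0 \end{bmatrix}$ form a basis for $\operatorname{Col}(A)$. Find an orthogonal basis for $\operatorname{Col}(A)$.
  3. Find the orthogonal projection $\widehat{\mathbf{b}}$ of $\mathbf{b}$ onto $\operatorname{Col}(A)$.
  4. Explain why the equation $A\mathbf{x}=\widehat{\mathbf{b}}$ must be consistent and then find its solution.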

Subsection 6.5.1 A first example

When we've encountered inconsistent systems in the past, we've simply said there is no solution and moved on. The preview activity, however, shows how we can find approximate solutions to an inconsistent system: if there are no solutions to $A\mathbf{x}=\mathbf{b}$, we instead solve the consistent system $A\mathbf{x}=\widehat{\mathbf{b}}$, where $\widehat{\mathbf{b}}$ is the orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$. As we'll see, this solution is, in a specific sense, the best possible.
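The strategy from the preview activity can be sketched in a few lines of plain Python with exact fractions. The helper names `dot` and `project_onto_columns` are illustrative assumptions, not commands supplied by this text, and the sketch only handles a matrix with two linearly independent columns.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_columns(v1, v2, b):
    """Project b onto span{v1, v2}.

    Returns (b_hat, x) where b_hat is the orthogonal projection and
    x = [x1, x2] satisfies x1*v1 + x2*v2 = b_hat.
    """
    # Gram-Schmidt: w1 = v1 and w2 = v2 - c*w1 is orthogonal to w1.
    w1 = v1
    c = dot(v2, w1) / dot(w1, w1)
    w2 = [a - c * w for a, w in zip(v2, w1)]
    # Projection formula for an orthogonal basis w1, w2.
    k1 = dot(b, w1) / dot(w1, w1)
    k2 = dot(b, w2) / dot(w2, w2)
    b_hat = [k1 * p + k2 * q for p, q in zip(w1, w2)]
    # b_hat = k1*w1 + k2*w2 = (k1 - c*k2)*v1 + k2*v2
    x = [k1 - c * k2, k2]
    return b_hat, x

# Columns of A and the vector b from the preview activity.
v1 = [Fraction(1), Fraction(2), Fraction(-1)]
v2 = [Fraction(2), Fraction(5), Fraction(0)]
b = [Fraction(5), Fraction(-3), Fraction(-1)]

b_hat, x = project_onto_columns(v1, v2, b)
print(b_hat, x)
```

Under these assumptions, the sketch gives the projection $(0,-1,-2)$ and the solution $(2,-1)$ for the system in the preview activity.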

Activity 6.5.2.

Suppose we have three data points (1,1), (2,1), and (3,3) and that we would like to find a line passing through them.
  1. Plot these three points in Figure 6.5.2. Are you able to draw a line that passes through all three points?
    Figure 6.5.2. Plot the three data points here.
  2. Remember that the equation of a line can be written as b+mx=y where m is the slope and b is the y-intercept. We will try to find b and m so that the three points lie on the line.
    The first data point (1,1) gives an equation for b and m. In particular, we know that when x=1, then y=1 so we have b+m(1)=1 or b+m=1. Use the other two data points to create a linear system describing m and b.
  3. We have obtained a linear system having three equations, one from each data point, for the two unknowns $b$ and $m$. Identify a matrix $A$ and vector $\mathbf{b}$ so that the system has the form $A\mathbf{x}=\mathbf{b}$, where $\mathbf{x}=\begin{bmatrix} b \\ m \end{bmatrix}$.
    Notice that the unknown vector $\mathbf{x}=\begin{bmatrix} b \\ m \end{bmatrix}$ describes the line that we seek.
  4. Is there a solution to this linear system? How does this question relate to your attempt to draw a line through the three points above?
  5. Since this system is inconsistent, we know that $\mathbf{b}$ is not in the column space $\operatorname{Col}(A)$. Find an orthogonal basis for $\operatorname{Col}(A)$ and use it to find the orthogonal projection $\widehat{\mathbf{b}}$ of $\mathbf{b}$ onto $\operatorname{Col}(A)$.
  6. Since $\widehat{\mathbf{b}}$ is in $\operatorname{Col}(A)$, the equation $A\mathbf{x}=\widehat{\mathbf{b}}$ is consistent. Find its solution $\mathbf{x}=\begin{bmatrix} b \\ m \end{bmatrix}$ and sketch the line $y=b+mx$ in Figure 6.5.2. We say that this is the line of best fit.
This activity illustrates the idea behind a technique known as orthogonal least squares, which we have been working toward throughout this chapter. If the data points are denoted as $(x_i,y_i)$, we construct the matrix $A$ and vector $\mathbf{b}$ as
$$A=\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \end{bmatrix}, \qquad \mathbf{b}=\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}.$$
With the vector $\mathbf{x}=\begin{bmatrix} b \\ m \end{bmatrix}$ representing the line $b+mx=y$, we see that the equation $A\mathbf{x}=\mathbf{b}$ describes a line passing through all the data points. In our activity, it is visually apparent that there is no such line, which agrees with the fact that the equation $A\mathbf{x}=\mathbf{b}$ is inconsistent.
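As a concrete instance, the three data points (1,1), (2,1), and (3,3) from Activity 6.5.2 lead to the system below, written out as a sketch of the construction:

```latex
\begin{aligned}
b + m(1) &= 1 \\
b + m(2) &= 1 \\
b + m(3) &= 3
\end{aligned}
\qquad\Longleftrightarrow\qquad
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} b \\ m \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}
```

Subtracting the first equation from the second gives $m=0$, while subtracting the second from the third gives $m=2$, so no single line satisfies all three equations.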
Remember that $\widehat{\mathbf{b}}$, the orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$, is the closest vector in $\operatorname{Col}(A)$ to $\mathbf{b}$. Therefore, when we solve the equation $A\mathbf{x}=\widehat{\mathbf{b}}$, we are finding the vector $\mathbf{x}$ so that $A\mathbf{x}=\begin{bmatrix} b+mx_1 \\ b+mx_2 \\ b+mx_3 \end{bmatrix}$ is as close to $\mathbf{b}=\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$ as possible. Let's think about what this means within the context of this problem.
The difference is $\mathbf{b}-A\mathbf{x}=\begin{bmatrix} y_1-(b+mx_1) \\ y_2-(b+mx_2) \\ y_3-(b+mx_3) \end{bmatrix}$, so the square of the distance between $A\mathbf{x}$ and $\mathbf{b}$ is
$$|\mathbf{b}-A\mathbf{x}|^2=\left(y_1-(b+mx_1)\right)^2+\left(y_2-(b+mx_2)\right)^2+\left(y_3-(b+mx_3)\right)^2.$$
Our approach finds the values for $b$ and $m$ that make this sum of squares as small as possible, which is why we call this a least-squares problem.
Drawing the line defined by the vector $\mathbf{x}=\begin{bmatrix} b \\ m \end{bmatrix}$, the quantity $y_i-(b+mx_i)$ reflects the vertical distance between the line and the data point $(x_i,y_i)$, as shown in Figure 6.5.5. Seen in this way, the square of the distance $|\mathbf{b}-A\mathbf{x}|^2$ is a measure of how much the line defined by the vector $\mathbf{x}$ misses the data points. The solution to the least-squares problem is the line that misses the data points by the smallest amount possible.
Figure 6.5.5. The solution of the least-squares problem and the vertical distances between the line and the data points.
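To make the phrase "sum of squares as small as possible" concrete, the following plain Python sketch evaluates the sum of squared vertical distances for the three points (1,1), (2,1), and (3,3). It assumes the least-squares line for these points has intercept $-1/3$ and slope $1$, and compares it with a few other candidate lines chosen only for illustration. The function name `sum_of_squares` is an illustrative assumption.

```python
from fractions import Fraction

# The three data points from Activity 6.5.2.
points = [(1, 1), (2, 1), (3, 3)]

def sum_of_squares(b, m):
    """Sum of the squared vertical distances y_i - (b + m*x_i)."""
    return sum((y - (b + m * x)) ** 2 for x, y in points)

# Candidate least-squares line: intercept -1/3, slope 1.
best = sum_of_squares(Fraction(-1, 3), Fraction(1))

# A few other lines, chosen arbitrarily for comparison.
others = [
    sum_of_squares(Fraction(0), Fraction(1)),
    sum_of_squares(Fraction(-1, 3), Fraction(9, 10)),
    sum_of_squares(Fraction(-1), Fraction(4, 3)),
]

print(best, others)
```

Each of the comparison lines misses the data by a larger sum of squares than the candidate line does, which is consistent with the candidate being the least-squares solution, although checking a handful of lines is of course not a proof.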

Subsection 6.5.2 Solving least-squares problems

Now that we've seen an example of what we're trying to accomplish, let's put this technique into a more general framework.
Given an inconsistent system $A\mathbf{x}=\mathbf{b}$, we seek the vector $\mathbf{x}$ that minimizes the distance from $A\mathbf{x}$ to $\mathbf{b}$. In other words, $\mathbf{x}$ satisfies $A\mathbf{x}=\widehat{\mathbf{b}}$, where $\widehat{\mathbf{b}}$ is the orthogonal projection of $\mathbf{b}$ onto the column space $\operatorname{Col}(A)$. We know the equation $A\mathbf{x}=\widehat{\mathbf{b}}$ is consistent since $\widehat{\mathbf{b}}$ is in $\operatorname{Col}(A)$, and we know there is only one solution if we assume that the columns of $A$ are linearly independent.
We will usually denote the solution of $A\mathbf{x}=\widehat{\mathbf{b}}$ by $\widehat{\mathbf{x}}$ and call this vector the least-squares approximate solution of $A\mathbf{x}=\mathbf{b}$ to distinguish it from a (possibly non-existent) solution of $A\mathbf{x}=\mathbf{b}$.
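There is an alternative method for finding $\widehat{\mathbf{x}}$ that does not involve first finding the orthogonal projection $\widehat{\mathbf{b}}$. Remember that $\widehat{\mathbf{b}}$ is defined by the fact that $\widehat{\mathbf{b}}-\mathbf{b}$ is orthogonal to $\operatorname{Col}(A)$. In other words, $\widehat{\mathbf{b}}-\mathbf{b}$ is in the orthogonal complement $\operatorname{Col}(A)^\perp$, which Proposition 6.2.10 tells us is the same as $\operatorname{Nul}(A^T)$. Since $\widehat{\mathbf{b}}-\mathbf{b}$ is in $\operatorname{Nul}(A^T)$, it follows that
$$A^T(\widehat{\mathbf{b}}-\mathbf{b})=\mathbf{0}.$$
Because the least-squares approximate solution is the vector $\widehat{\mathbf{x}}$ such that $A\widehat{\mathbf{x}}=\widehat{\mathbf{b}}$, we can substitute and rearrange this equation to see that
$$\begin{aligned} A^T(A\widehat{\mathbf{x}}-\mathbf{b}) &= \mathbf{0} \\ A^TA\widehat{\mathbf{x}}-A^T\mathbf{b} &= \mathbf{0} \\ A^TA\widehat{\mathbf{x}} &= A^T\mathbf{b}. \end{aligned}$$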
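This equation is called the normal equation, and we have the following proposition: if the columns of $A$ are linearly independent, then there is a unique least-squares approximate solution $\widehat{\mathbf{x}}$ to the equation $A\mathbf{x}=\mathbf{b}$, and it is the solution of the normal equation $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$.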

Example 6.5.7.

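Consider the equation
$$\begin{bmatrix} 2 & 1 \\ 2 & 0 \\ -1 & 3 \end{bmatrix}\mathbf{x}=\begin{bmatrix} 16 \\ -1 \\ 7 \end{bmatrix}$$
with matrix $A$ and vector $\mathbf{b}$. Since this equation is inconsistent, we will find the least-squares approximate solution $\widehat{\mathbf{x}}$ by solving the normal equation $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$, which has the form
$$A^TA\widehat{\mathbf{x}}=\begin{bmatrix} 9 & -1 \\ -1 & 10 \end{bmatrix}\widehat{\mathbf{x}}=\begin{bmatrix} 23 \\ 37 \end{bmatrix}=A^T\mathbf{b}$$
and the solution $\widehat{\mathbf{x}}=\begin{bmatrix} 3 \\ 4 \end{bmatrix}$.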
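The computation in Example 6.5.7 can be checked with a short plain Python sketch using exact fractions. The helpers `transpose`, `matmul`, and `solve_2x2` are illustrative names introduced here, not commands provided by this text, and `solve_2x2` assumes the matrix is invertible.

```python
from fractions import Fraction

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

def solve_2x2(M, rhs):
    """Solve a 2x2 system by Cramer's rule (assumes a nonzero determinant)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det,
            (a * rhs[1] - c * rhs[0]) / det]

# The matrix A and vector b from Example 6.5.7 (b stored as a column).
A = [[Fraction(2), Fraction(1)],
     [Fraction(2), Fraction(0)],
     [Fraction(-1), Fraction(3)]]
b = [[Fraction(16)], [Fraction(-1)], [Fraction(7)]]

At = transpose(A)
AtA = matmul(At, A)                        # left side of the normal equation
Atb = [row[0] for row in matmul(At, b)]    # right side of the normal equation
xhat = solve_2x2(AtA, Atb)
print(AtA, Atb, xhat)
```

The sketch reproduces the matrix and vector shown in the example and the solution with entries 3 and 4.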

Activity 6.5.3.

The rate at which a cricket chirps is related to the outdoor temperature, as reflected in some experimental data that we'll study in this activity. The chirp rate C is expressed in chirps per second while the temperature T is in degrees Fahrenheit. Evaluate the following cell to load the data:
Evaluating this cell also provides:
  • the vectors chirps and temps formed from the columns of the dataset.
  • the command onesvec(n), which creates an n-dimensional vector whose entries are all one.
  • Remember that you can form a matrix whose columns are the vectors v1 and v2 with matrix([v1, v2]).T.
We would like to represent this relationship by a linear function
$$\beta_0+\beta_1C=T.$$
  1. Use the first data point $(C_1,T_1)=(20.0,88.6)$ to write an equation involving $\beta_0$ and $\beta_1$.
  2. Suppose that we represent the unknowns using a vector $\mathbf{x}=\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$. Use the 15 data points to create the matrix $A$ and vector $\mathbf{b}$ so that the linear system $A\mathbf{x}=\mathbf{b}$ describes the unknown vector $\mathbf{x}$.
  3. Write the normal equations $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$; that is, find the matrix $A^TA$ and the vector $A^T\mathbf{b}$.
  4. Solve the normal equations to find $\widehat{\mathbf{x}}$, the least-squares approximate solution to the equation $A\mathbf{x}=\mathbf{b}$. Call your solution xhat since x has another meaning in Sage.
    What are the values of $\beta_0$ and $\beta_1$ that you found?
  5. If the chirp rate is 22 chirps per second, what is your prediction for the temperature?
    You can plot the data and your line, assuming you called the solution xhat, using the cell below.
This example demonstrates an approach, called linear regression, in which a collection of data is modeled using a linear function found by solving a least-squares problem. Once we have the linear function that best fits the data, we can make predictions about situations that we haven't encountered in the data.
If we're going to use our function to make predictions, it's natural to ask how much confidence we have in these predictions. This is a statistical question that leads to a rich and well-developed theory, which we won't explore in much detail here (for example, see Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. Springer, 2013). However, there is one simple measure of how well our linear function fits the data that is known as the coefficient of determination and denoted by $R^2$.
We have seen that the square of the distance $|\mathbf{b}-A\mathbf{x}|^2$ measures the amount by which the line fails to pass through the data points. When the line is close to the data points, we expect this number to be small. However, the size of this measure depends on the scale of the data. For instance, the two lines shown in Figure 6.5.8 seem to fit the data equally well, but $|\mathbf{b}-A\widehat{\mathbf{x}}|^2$ is 100 times larger on the right.
Figure 6.5.8. The lines appear to fit equally well in spite of the fact that $|\mathbf{b}-A\widehat{\mathbf{x}}|^2$ differs by a factor of 100.
The coefficient of determination $R^2$ is defined by normalizing $|\mathbf{b}-A\widehat{\mathbf{x}}|^2$ so that it is independent of the scale. Recall that we described how to demean a vector in Section 6.1: given a vector $\mathbf{v}$, we obtain $\widetilde{\mathbf{v}}$ by subtracting the average of the components from each component.

Definition 6.5.9. Coefficient of determination.

The coefficient of determination is
$$R^2=1-\frac{|\mathbf{b}-A\widehat{\mathbf{x}}|^2}{|\widetilde{\mathbf{b}}|^2},$$
where $\widetilde{\mathbf{b}}$ is the vector obtained by demeaning $\mathbf{b}$.
A more complete explanation of this definition relies on the concept of variance, which we explore in Exercise 6.5.6.12 and the next chapter. For the time being, it's enough to know that $0\leq R^2\leq 1$ and that the closer $R^2$ is to 1, the better the line fits the data. In our original example, illustrated in Figure 6.5.8, we find that $R^2=0.75$, and in our study of cricket chirp rates, we have $R^2=0.69$. However, assessing the confidence we have in predictions made by solving a least-squares problem can require considerable thought, and it would be naive to rely only on the value of $R^2$.
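The definition can be tried out on the three points (1,1), (2,1), and (3,3) with a short plain Python sketch. It assumes the least-squares line for these points has intercept $-1/3$ and slope $1$; the names `demean_vector` and `r_squared` are illustrative stand-ins and are not the `demean` command provided by the cells in this section.

```python
from fractions import Fraction

def demean_vector(v):
    """Subtract the average of the components from each component."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def r_squared(b, fitted):
    """R^2 = 1 - |b - fitted|^2 / |demeaned b|^2."""
    residual = sum((bi - fi) ** 2 for bi, fi in zip(b, fitted))
    total = sum(a ** 2 for a in demean_vector(b))
    return 1 - residual / total

xs = [1, 2, 3]
ys = [Fraction(1), Fraction(1), Fraction(3)]

# Assumed least-squares line for these points: y = -1/3 + x.
b0, m = Fraction(-1, 3), Fraction(1)
fitted = [b0 + m * x for x in xs]

r2 = r_squared(ys, fitted)

# Rescaling the y-values by 10 multiplies both squared lengths by 100,
# so the ratio, and hence R^2, is unchanged.
r2_scaled = r_squared([10 * y for y in ys], [10 * f for f in fitted])
print(r2, r2_scaled)
```

The value agrees with $R^2=0.75$ quoted for the original example, and the rescaled data illustrate the scale independence that motivates the definition.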

Subsection 6.5.3 Using QR factorizations

As we've seen, the least-squares approximate solution $\widehat{\mathbf{x}}$ to $A\mathbf{x}=\mathbf{b}$ may be found by solving the normal equation $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$, and this can be a practical strategy for some problems. However, this approach can be problematic as small rounding errors can accumulate and lead to inaccurate final results.
As the next activity demonstrates, there is an alternate method for finding the least-squares approximate solution $\widehat{\mathbf{x}}$ using a $QR$ factorization of the matrix $A$, and this method is preferable as it is numerically more reliable.

Activity 6.5.4.

  1. Suppose we are interested in finding the least-squares approximate solution to the equation $A\mathbf{x}=\mathbf{b}$ and that we have the $QR$ factorization $A=QR$. Explain why the least-squares approximate solution is given by solving
    $$\begin{aligned} A\widehat{\mathbf{x}} &= QQ^T\mathbf{b} \\ QR\widehat{\mathbf{x}} &= QQ^T\mathbf{b}. \end{aligned}$$
  2. Multiply both sides of the second expression by $Q^T$ and explain why
    $$R\widehat{\mathbf{x}}=Q^T\mathbf{b}.$$
    Since $R$ is upper triangular, this is a relatively simple equation to solve using back substitution, as we saw in Section 5.1. We will therefore write the least-squares approximate solution as
    $$\widehat{\mathbf{x}}=R^{-1}Q^T\mathbf{b},$$
    and put this to use in the following context.
  3. Brozak's formula, which is used to calculate a person's body fat index BFI, is
    $$\text{BFI}=100\left(\frac{4.57}{\rho}-4.142\right)$$
    where $\rho$ denotes a person's body density in grams per cubic centimeter. Obtaining an accurate measure of $\rho$ is difficult, however, because it requires submerging the person in water and measuring the volume of water displaced. Instead, we will gather several other body measurements, which are more easily obtained, and use them to predict BFI.
    For instance, suppose we take 10 patients and measure their weight w in pounds, height h in inches, abdomen a in centimeters, wrist circumference r in centimeters, neck circumference n in centimeters, and BFI. Evaluating the following cell loads and displays the data.
    In addition, that cell provides:
    1. vectors weight, height, abdomen, wrist, neck, and BFI formed from the columns of the dataset.
    2. the command onesvec(n), which returns an n-dimensional vector whose entries are all one.
    3. the command QR(A) that returns the QR factorization of A as Q, R = QR(A).
    4. the command demean(v), which returns the demeaned vector $\widetilde{\mathbf{v}}$.
    We would like to find the linear function
    $$\beta_0+\beta_1w+\beta_2h+\beta_3a+\beta_4r+\beta_5n=\text{BFI}$$
    that best fits the data.
    Use the first data point to write an equation for the parameters $\beta_0,\beta_1,\ldots,\beta_5$.
  4. Describe the linear system Ax=b for these parameters. More specifically, describe how the matrix A and the vector b are formed.
  5. Construct the matrix A and find its QR factorization in the cell below.
  6. Find the least-squares approximate solution $\widehat{\mathbf{x}}$ by solving the equation $R\widehat{\mathbf{x}}=Q^T\mathbf{b}$. You may want to use N(xhat) to display a decimal approximation of the vector. What are the parameters $\beta_0,\beta_1,\ldots,\beta_5$ that best fit the data?
  7. Find the coefficient of determination $R^2$ for your parameters. What does this imply about the quality of the fit?
  8. Suppose a person's measurements are: weight 190, height 70, abdomen 90, wrist 18, and neck 35. Estimate this person's BFI.
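To summarize, we have seen that if the columns of $A$ are linearly independent and $A=QR$ is a $QR$ factorization, then the least-squares approximate solution $\widehat{\mathbf{x}}$ to $A\mathbf{x}=\mathbf{b}$ is given by $\widehat{\mathbf{x}}=R^{-1}Q^T\mathbf{b}$.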
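The relationship $R\widehat{\mathbf{x}}=Q^T\mathbf{b}$ can be sketched in plain Python using floating-point arithmetic. This is only an illustration under stated assumptions: the factorization is computed by classical Gram-Schmidt, the columns of $A$ are assumed to be linearly independent, and the names `qr_gram_schmidt` and `least_squares_qr` are hypothetical and are not the `QR` command supplied by the cells in this section. The data are the matrix and vector from Example 6.5.7.

```python
import math

def qr_gram_schmidt(A):
    """Return (Q, R) with A = QR using classical Gram-Schmidt.

    A is a list of rows whose columns are assumed linearly independent.
    Q has orthonormal columns and R is upper triangular.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for i, q in enumerate(q_cols):
            # Component of the original column along each earlier q.
            R[i][j] = sum(a * b for a, b in zip(q, v))
            w = [wk - R[i][j] * qk for wk, qk in zip(w, q)]
        R[j][j] = math.sqrt(sum(a * a for a in w))
        q_cols.append([a / R[j][j] for a in w])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R

def least_squares_qr(A, b):
    """Solve R xhat = Q^T b by back substitution."""
    Q, R = qr_gram_schmidt(A)
    n = len(R)
    c = [sum(Q[i][j] * b[i] for i in range(len(b))) for j in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x

A = [[2.0, 1.0], [2.0, 0.0], [-1.0, 3.0]]
b = [16.0, -1.0, 7.0]
xhat = least_squares_qr(A, b)
print(xhat)
```

Up to rounding, the result matches the solution found from the normal equation in Example 6.5.7. Note that $R^{-1}$ is never formed explicitly; back substitution is enough because $R$ is upper triangular.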

Subsection 6.5.4 Polynomial Regression

In the examples we've seen so far, we have fit a linear function to a dataset. Sometimes, however, a polynomial, such as a quadratic function, may be more appropriate. It turns out that the techniques we've developed in this section are still useful, as the next activity demonstrates.

Activity 6.5.5.

  1. Suppose that we have a small dataset containing the points (0,2), (1,1), (2,3), and (3,3), such as appear when the following cell is evaluated.
    In addition to loading and plotting the data, evaluating that cell provides the following commands:
    • Q, R = QR(A) returns the QR factorization of A.
    • demean(v) returns the demeaned vector $\widetilde{\mathbf{v}}$.
    Let's fit a quadratic function of the form
    $$\beta_0+\beta_1x+\beta_2x^2=y$$
    to this dataset.
    Write four equations, one for each data point, that describe the coefficients $\beta_0$, $\beta_1$, and $\beta_2$.
  2. Express these four equations as a linear system $A\mathbf{x}=\mathbf{b}$ where $\mathbf{x}=\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$.
    Find the $QR$ factorization of $A$ and use it to find the least-squares approximate solution $\widehat{\mathbf{x}}$.
  3. Use the parameters $\beta_0$, $\beta_1$, and $\beta_2$ that you found to write the quadratic function that fits the data. You can plot this function, along with the data, by entering your function in the place indicated below.
  4. What is your predicted y value when x=1.5?
  5. Find the coefficient of determination $R^2$ for the quadratic function. What does this say about the quality of the fit?
  6. Now fit a cubic polynomial of the form
    $$\beta_0+\beta_1x+\beta_2x^2+\beta_3x^3=y$$
    to this dataset.
  7. Find the coefficient of determination $R^2$ for the cubic function. What does this say about the quality of the fit?
  8. What do you notice when you plot the cubic function along with the data? How does this reflect the value of $R^2$ that you found?
The matrices $A$ that you created in the last activity when fitting a quadratic and cubic function to a dataset have a special form. In particular, if the data points are labeled $(x_i,y_i)$ and we seek a degree $k$ polynomial, then
$$A=\begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^k \\ 1 & x_2 & x_2^2 & \ldots & x_2^k \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \ldots & x_m^k \end{bmatrix}.$$
This is called a Vandermonde matrix of degree k.
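The structure of this matrix is easy to express in code. The following plain Python sketch builds it row by row; the function name `vandermonde_matrix` is a hypothetical stand-in and is not the `vandermonde(x, k)` command supplied by the cells in this section.

```python
def vandermonde_matrix(xs, k):
    """Return the rows [1, x, x**2, ..., x**k], one row per value x in xs."""
    return [[x ** j for j in range(k + 1)] for x in xs]

# The x-values of the dataset in Activity 6.5.5 with a quadratic (k = 2).
A = vandermonde_matrix([0, 1, 2, 3], 2)
for row in A:
    print(row)
```

Each data point contributes one row, and a degree $k$ polynomial contributes $k+1$ columns, one for each unknown coefficient.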

Activity 6.5.6.

This activity explores a dataset describing Arctic sea ice that comes from Sustainability Math (sustainabilitymath.org).
Evaluating the cell below will plot the extent of Arctic sea ice, in millions of square kilometers, during the twelve months of 2012.
In addition, you have access to a few special variables and commands:
  • month is the vector of month values and ice is the vector of sea ice values from the table above.
  • vandermonde(x, k) constructs the Vandermonde matrix of degree k using the points in the vector x.
  • Q, R = QR(A) provides the QR factorization of A.
  • demean(v) returns the demeaned vector $\widetilde{\mathbf{v}}$.
  1. Find the vector $\widehat{\mathbf{x}}$, the least-squares approximate solution to the linear system that results from fitting a degree 5 polynomial to the data.
  2. If your result is stored in the variable xhat, you may plot the polynomial and the data together using the following cell.
  3. Find the coefficient of determination $R^2$ for this polynomial fit.
  4. Repeat these steps to fit a degree 8 polynomial to the data, plot the polynomial with the data, and find $R^2$.
  5. Repeat one more time by fitting a degree 11 polynomial to the data, creating a plot, and finding $R^2$.
    It's certainly true that higher degree polynomials fit the data better, as seen by the increasing values of $R^2$, but that's not always a good thing. For instance, when $k=11$, you may notice that the graph of the polynomial wiggles a little more than we would expect. In this case, the polynomial is trying too hard to fit the data, which usually contains some uncertainty, especially if it's obtained from measurements. The error built into the data is called noise, and its presence means that we shouldn't expect our polynomial to fit the data perfectly. When we choose a polynomial whose degree is too high, we give the noise too much weight in the model, which leads to some undesirable behavior, like the wiggles in the graph.
    Fitting the data with a polynomial whose degree is too high is called overfitting, a phenomenon that can appear in many machine learning applications. Generally speaking, we would like to choose $k$ large enough to capture the essential features of the data but not so large that we overfit and build the noise into the model. There are ways to determine the optimal value of $k$, but we won't pursue that here.
  6. Choosing a reasonable value of $k$, estimate the extent of Arctic sea ice at month 6.5, roughly at the Summer Solstice.
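The way $R^2$ grows with the degree can be seen on a much smaller scale using the four points (0,2), (1,1), (2,3), and (3,3) from Activity 6.5.5. The plain Python sketch below is an illustration only: it solves the normal equations with exact fractions, which avoids rounding error here but is not the $QR$ approach recommended in this section, and the helper names `solve`, `fit_polynomial`, and `r_squared` are assumptions introduced for this example.

```python
from fractions import Fraction

def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions (assumes M is invertible)."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [a / aug[col][col] for a in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fit_polynomial(xs, ys, k):
    """Coefficients of the degree-k least-squares polynomial via A^T A x = A^T b."""
    A = [[Fraction(x) ** j for j in range(k + 1)] for x in xs]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(k + 1)]
           for i in range(k + 1)]
    Atb = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(k + 1)]
    return solve(AtA, Atb)

def r_squared(xs, ys, coeffs):
    """Coefficient of determination for the polynomial with these coefficients."""
    fitted = [sum(c * Fraction(x) ** j for j, c in enumerate(coeffs)) for x in xs]
    mean = sum(ys) / Fraction(len(ys))
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    total = sum((y - mean) ** 2 for y in ys)
    return 1 - residual / total

xs = [0, 1, 2, 3]
ys = [Fraction(2), Fraction(1), Fraction(3), Fraction(3)]
r2_by_degree = {k: r_squared(xs, ys, fit_polynomial(xs, ys, k)) for k in (1, 2, 3)}
print(r2_by_degree)
```

With four data points, the cubic passes through every point and so reaches $R^2=1$, even though nothing suggests the underlying relationship is cubic. This is the same effect, in miniature, as the degree 11 polynomial fit to twelve months of data.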

Subsection 6.5.5 Summary

This section introduced some types of least-squares problems and a framework for working with them.
  • Given an inconsistent system $A\mathbf{x}=\mathbf{b}$, we find $\widehat{\mathbf{x}}$, the least-squares approximate solution, by requiring that $A\widehat{\mathbf{x}}$ be as close to $\mathbf{b}$ as possible. In other words, $A\widehat{\mathbf{x}}=\widehat{\mathbf{b}}$ where $\widehat{\mathbf{b}}$ is the orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$.
  • One way to find $\widehat{\mathbf{x}}$ is by solving the normal equations $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$. This is not our preferred method since numerical problems can arise.
  • A second way to find $\widehat{\mathbf{x}}$ uses a $QR$ factorization of $A$. If $A=QR$, then $\widehat{\mathbf{x}}=R^{-1}Q^T\mathbf{b}$ and finding $R^{-1}$ is computationally feasible since $R$ is upper triangular.
  • This technique may be applied widely and is useful for modeling data. We saw examples in this section where linear functions of several input variables and polynomials provided effective models for different datasets.
  • A simple measure of the quality of the fit is the coefficient of determination $R^2$, though some additional thought should be given in real applications.

Exercises 6.5.6 Exercises

Evaluating the following cell loads in some commands that will be helpful in the following exercises. In particular, there are commands:
  • QR(A) that returns the QR factorization of A as Q, R = QR(A),
  • onesvec(n) that returns the n-dimensional vector whose entries are all 1,
  • demean(v) that demeans the vector v,
  • vandermonde(x, k) that returns the Vandermonde matrix of degree k formed from the components of the vector x, and
  • plot_model(xhat, data) that plots the data and the model xhat.

1.

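Suppose we write the linear system
$$\begin{bmatrix} 1 & -1 \\ 2 & -1 \\ -1 & 3 \end{bmatrix}\mathbf{x}=\begin{bmatrix} -8 \\ 5 \\ -10 \end{bmatrix}$$
as $A\mathbf{x}=\mathbf{b}$.
  1. Find an orthogonal basis for $\operatorname{Col}(A)$.
  2. Find $\widehat{\mathbf{b}}$, the orthogonal projection of $\mathbf{b}$ onto $\operatorname{Col}(A)$.
  3. Find a solution to the linear system $A\mathbf{x}=\widehat{\mathbf{b}}$.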

2.

Consider the data in Table 6.5.11.
Table 6.5.11. A dataset with four points.
x y
1 1
2 1
3 1
4 2
  1. Set up the linear system Ax=b that describes the line b+mx=y passing through these points.
  2. Write the normal equations that describe the least-squares approximate solution to Ax=b.
  3. Find the least-squares approximate solution $\widehat{\mathbf{x}}$ and plot the data and the resulting line.
  4. What is your predicted $y$-value when $x=3.5$?
  5. Find the coefficient of determination $R^2$.

3.

Consider the four points in Table 6.5.11.
  1. Set up a linear system $A\mathbf{x}=\mathbf{b}$ that describes a quadratic function
    $$\beta_0+\beta_1x+\beta_2x^2=y$$
    passing through the points.
  2. Use a $QR$ factorization to find the least-squares approximate solution $\widehat{\mathbf{x}}$ and plot the data and the graph of the resulting quadratic function.
  3. What is your predicted $y$-value when $x=3.5$?
  4. Find the coefficient of determination $R^2$.

4.

Consider the data in Table 6.5.12.
Table 6.5.12. A simple dataset
x1 x2 y
1 1 4.2
1 2 3.3
2 1 5.9
2 2 5.1
3 2 7.5
3 3 6.3
  1. Set up a linear system $A\mathbf{x}=\mathbf{b}$ that describes the relationship
    $$\beta_0+\beta_1x_1+\beta_2x_2=y.$$
  2. Find the least-squares approximate solution $\widehat{\mathbf{x}}$.
  3. What is your predicted $y$-value when $x_1=2.4$ and $x_2=2.9$?
  4. Find the coefficient of determination $R^2$.

5.

Determine whether the following statements are true or false and explain your thinking.
  1. If $A\mathbf{x}=\mathbf{b}$ is consistent, then $\widehat{\mathbf{x}}$ is a solution to $A\mathbf{x}=\mathbf{b}$.
  2. If $R^2=1$, then the least-squares approximate solution $\widehat{\mathbf{x}}$ is also a solution to the original equation $A\mathbf{x}=\mathbf{b}$.
  3. Given the $QR$ factorization $A=QR$, we have $A\widehat{\mathbf{x}}=Q^TQ\mathbf{b}$.
  4. A $QR$ factorization provides a method for finding the least-squares approximate solution to $A\mathbf{x}=\mathbf{b}$ that is more reliable than solving the normal equations.
  5. A solution to $AA^T\mathbf{x}=A\mathbf{b}$ is the least-squares approximate solution to $A\mathbf{x}=\mathbf{b}$.

6.

Explain your response to the following questions.
  1. If $\widehat{\mathbf{x}}=\mathbf{0}$, what does this say about the vector $\mathbf{b}$?
  2. If the columns of A are orthonormal, how can you easily find the least-squares approximate solution to Ax=b?

7.

The following cell loads in some data showing the number of people in Bangladesh living without electricity over 27 years. It also defines vectors year, which records the years in the dataset, and people, which records the number of people.
  1. Suppose we want to write
    $$N=\beta_0+\beta_1t$$
    where $t$ is the year and $N$ is the number of people. Construct the matrix $A$ and vector $\mathbf{b}$ so that the linear system $A\mathbf{x}=\mathbf{b}$ describes the vector $\mathbf{x}=\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$.
  2. Using a $QR$ factorization of $A$, find the values of $\beta_0$ and $\beta_1$ in the least-squares approximate solution $\widehat{\mathbf{x}}$.
  3. What is the coefficient of determination $R^2$ and what does this tell us about the quality of the approximation?
  4. What is your prediction for the number of people living without electricity in 1985?
  5. Estimate the year in which there will be no people living without electricity.

8.

This problem concerns a dataset describing planets in our solar system. For each planet, we have the length $L$ of the semi-major axis, essentially the distance from the planet to the Sun in AU (astronomical units), and the period $P$, the length of time in years required to complete one orbit around the Sun.
We would like to model this data using the function $P=CL^r$ where $C$ and $r$ are parameters we need to determine. Since this isn't a linear function, we will transform this relationship by taking the natural logarithm of both sides to obtain
$$\ln(P)=\ln(C)+r\ln(L).$$
Evaluating the following cell loads the dataset and defines two vectors logaxis, whose components are $\ln(L)$, and logperiod, whose components are $\ln(P)$.
  1. Construct the matrix $A$ and vector $\mathbf{b}$ so that the solution to $A\mathbf{x}=\mathbf{b}$ is the vector $\mathbf{x}=\begin{bmatrix} \ln(C) \\ r \end{bmatrix}$.
  2. Find the least-squares approximate solution $\widehat{\mathbf{x}}$. What does this give for the values of $C$ and $r$?
  3. Find the coefficient of determination $R^2$. What does this tell us about the quality of the approximation?
  4. Suppose that the orbit of an asteroid has a semi-major axis whose length is $L=4.0$ AU. Estimate the period $P$ of the asteroid's orbit.
  5. Halley's Comet has a period of $P=75$ years. Estimate the length of its semi-major axis.

9.

Evaluating the following cell loads a dataset describing the temperature in the Earth's atmosphere at various altitudes. There are also two vectors altitude, expressed in kilometers, and temperature, in degrees Celsius.
  1. Describe how to form the matrix $A$ and vector $\mathbf{b}$ so that the linear system $A\mathbf{x}=\mathbf{b}$ describes a degree $k$ polynomial fitting the data.
  2. After choosing a value of $k$, construct the matrix $A$ and vector $\mathbf{b}$, and find the least-squares approximate solution $\widehat{\mathbf{x}}$.
  3. Plot the polynomial and data using plot_model(xhat, data).
  4. Now examine what happens as you vary the degree of the polynomial k. Choose an appropriate value of k that seems to capture the most important features of the data while avoiding overfitting, and explain your choice.
  5. Use your value of k to estimate the temperature at an altitude of 55 kilometers.

10.

The following cell loads some data describing 1057 houses in a particular real estate market. For each house, we record the living area in square feet, the lot size in acres, the age in years, and the price in dollars. The cell also defines variables area, size, age, and price.
We will use linear regression to predict the price of a house given its living area, lot size, and age:
$$\beta_0+\beta_1\,\text{Living Area}+\beta_2\,\text{Lot Size}+\beta_3\,\text{Age}=\text{Price}.$$
  1. Use a $QR$ factorization to find the least-squares approximate solution $\widehat{\mathbf{x}}$.
  2. Discuss the significance of the signs of $\beta_1$, $\beta_2$, and $\beta_3$.
  3. If two houses are identical except for differing in age by one year, how would you predict that their prices compare to each other?
  4. Find the coefficient of determination $R^2$. What does this say about the quality of the fit?
  5. Predict the price of a house whose living area is 2000 square feet, lot size is 1.5 acres, and age is 50 years.

11.

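We observed that if the columns of $A$ are linearly independent, then there is a unique least-squares approximate solution to the equation $A\mathbf{x}=\mathbf{b}$ because the equation $A\widehat{\mathbf{x}}=\widehat{\mathbf{b}}$ has a unique solution. We also said that $\widehat{\mathbf{x}}$ is the unique solution to the normal equation $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$ without explaining why this equation has a unique solution. This exercise offers an explanation.
Assuming that the columns of $A$ are linearly independent, we would like to conclude that the equation $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$ has a unique solution.
  1. Suppose that $\mathbf{x}$ is a vector for which $A^TA\mathbf{x}=\mathbf{0}$. Explain why the following argument is valid and allows us to conclude that $A\mathbf{x}=\mathbf{0}$.
    $$\begin{aligned} A^TA\mathbf{x} &= \mathbf{0} \\ \mathbf{x}\cdot A^TA\mathbf{x} &= \mathbf{x}\cdot\mathbf{0}=0 \\ (A\mathbf{x})\cdot(A\mathbf{x}) &= 0 \\ |A\mathbf{x}|^2 &= 0. \end{aligned}$$
    In other words, if $A^TA\mathbf{x}=\mathbf{0}$, we know that $A\mathbf{x}=\mathbf{0}$.
  2. If the columns of $A$ are linearly independent and $A\mathbf{x}=\mathbf{0}$, what do we know about the vector $\mathbf{x}$?
  3. Explain why $A^TA\mathbf{x}=\mathbf{0}$ can only happen when $\mathbf{x}=\mathbf{0}$.
  4. Assuming that the columns of $A$ are linearly independent, explain why $A^TA\widehat{\mathbf{x}}=A^T\mathbf{b}$ has a unique solution.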

12.

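This problem is about the meaning of the coefficient of determination $R^2$ and its connection to variance, a topic that appears in the next section. Throughout this problem, we consider the linear system $A\mathbf{x}=\mathbf{b}$ and the least-squares approximate solution $\widehat{\mathbf{x}}$, where $A\widehat{\mathbf{x}}=\widehat{\mathbf{b}}$. We suppose that $A$ is an $m\times n$ matrix, and we will denote the $m$-dimensional vector $\mathbf{1}=\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$.
  1. Explain why $\overline{b}$, the mean of the components of $\mathbf{b}$, can be found as the dot product
    $$\overline{b}=\frac{1}{m}\,\mathbf{b}\cdot\mathbf{1}.$$
  2. In the examples we have seen in this section, explain why $\mathbf{1}$ is in $\operatorname{Col}(A)$.
  3. If we write $\mathbf{b}=\widehat{\mathbf{b}}+\mathbf{b}^\perp$, explain why
    $$\mathbf{b}^\perp\cdot\mathbf{1}=0$$
    and hence why the mean of the components of $\mathbf{b}^\perp$ is zero.
  4. The variance of an $m$-dimensional vector $\mathbf{v}$ is $\operatorname{Var}(\mathbf{v})=\frac{1}{m}|\widetilde{\mathbf{v}}|^2$, where $\widetilde{\mathbf{v}}$ is the vector obtained by demeaning $\mathbf{v}$.
    Explain why
    $$\operatorname{Var}(\mathbf{b})=\operatorname{Var}(\widehat{\mathbf{b}})+\operatorname{Var}(\mathbf{b}^\perp).$$
  5. Explain why
    $$\frac{|\mathbf{b}-A\widehat{\mathbf{x}}|^2}{|\widetilde{\mathbf{b}}|^2}=\frac{\operatorname{Var}(\mathbf{b}^\perp)}{\operatorname{Var}(\mathbf{b})}$$
    and hence
    $$R^2=\frac{\operatorname{Var}(\widehat{\mathbf{b}})}{\operatorname{Var}(\mathbf{b})}=\frac{\operatorname{Var}(A\widehat{\mathbf{x}})}{\operatorname{Var}(\mathbf{b})}.$$
    These expressions indicate why it is sometimes said that $R^2$ measures the "fraction of variance explained" by the function we are using to fit the data. As seen in the previous exercise, there may be other features that are not recorded in the dataset that influence the quantity we wish to predict.
  6. Explain why $0\leq R^2\leq 1$.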