Subsection 7.5.1 Least-squares problems
Least-squares problems, which we explored in Section 6.5, arise when we are confronted with an inconsistent linear system $A\mathbf{x}=\mathbf{b}$. Since there is no solution to the system, we instead find the vector $\mathbf{x}$ minimizing the distance between $A\mathbf{x}$ and $\mathbf{b}$. That is, we find the vector $\widehat{\mathbf{x}}$, the least-squares approximate solution, by solving $A\widehat{\mathbf{x}} = \widehat{\mathbf{b}}$, where $\widehat{\mathbf{b}}$ is the orthogonal projection of $\mathbf{b}$ onto the column space of $A$.

If we have a singular value decomposition $A = U\Sigma V^T$, then the number of nonzero singular values $r$ tells us the rank of $A$, and the first $r$ columns of $U$ form an orthonormal basis for $\operatorname{Col}(A)$. This basis may be used to project vectors onto $\operatorname{Col}(A)$ and hence to solve least-squares problems.
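As a reminder, the projection can be written in terms of such an orthonormal basis $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ of $\operatorname{Col}(A)$; the notation $U_r$ for the matrix holding these vectors anticipates the reduced singular value decomposition introduced below.
$$\widehat{\mathbf{b}} = (\mathbf{u}_1\cdot\mathbf{b})\,\mathbf{u}_1 + (\mathbf{u}_2\cdot\mathbf{b})\,\mathbf{u}_2 + \cdots + (\mathbf{u}_r\cdot\mathbf{b})\,\mathbf{u}_r = U_rU_r^T\mathbf{b}, \qquad\text{where } U_r = \begin{bmatrix}\mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_r\end{bmatrix}.$$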
Before exploring this connection further, we will introduce Sage as a tool for automating the construction of singular value decompositions. One new feature is that we need to declare our matrix to consist of floating point entries. We do this by including RDF inside the matrix definition, as illustrated in the following cell.
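Such a cell might look something like the following; the matrix entries here are placeholders chosen only for illustration.

# declare the entries to be floating point ("real double field") numbers
A = matrix(RDF, [[1, 2], [0, 1], [1, 1]])

# SVD() returns matrices U, Sigma, V with A = U * Sigma * V.transpose()
U, Sigma, V = A.SVD()
print(U)
print(Sigma)
print(V)

# the singular values may also be obtained directly
print(A.singular_values())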
Activity 7.5.2.
Consider the equation $A\mathbf{x}=\mathbf{b}$ for a given matrix $A$ and vector $\mathbf{b}$.
- Find a singular value decomposition for $A$ using the Sage cell below. What are the singular values of $A$?
- What is $r$, the rank of $A$? How can we identify an orthonormal basis for $\operatorname{Col}(A)$?
- Form the reduced singular value decomposition $U_r\Sigma_r V_r^T$ by constructing: the matrix $U_r$, consisting of the first $r$ columns of $U$; the matrix $V_r$, consisting of the first $r$ columns of $V$; and $\Sigma_r$, a square $r\times r$ diagonal matrix. Verify that $A = U_r\Sigma_r V_r^T$.
You may find it convenient to remember that if B is a matrix defined in Sage, then B.matrix_from_columns(list) and B.matrix_from_rows(list) can be used to extract columns or rows from B. For instance, B.matrix_from_rows([0,1,2]) provides a matrix formed from the first three rows of B. A Sage sketch after this activity illustrates one way to organize these constructions.
- How does the reduced singular value decomposition provide a matrix whose columns are an orthonormal basis for $\operatorname{Col}(A)$?
- Explain why a least-squares approximate solution $\widehat{\mathbf{x}}$ satisfies $A\widehat{\mathbf{x}} = U_rU_r^T\mathbf{b}$.
- What is the product $V_r^TV_r$ and why does it have this form?
- Explain why $\widehat{\mathbf{x}} = V_r\Sigma_r^{-1}U_r^T\mathbf{b}$ is the least-squares approximate solution, and use this expression to find $\widehat{\mathbf{x}}$.
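The following sketch shows one way these computations can be organized in Sage. The matrix and vector here are placeholders chosen only for illustration, not the ones given in the activity.

# placeholder data: replace A and b with the matrix and vector from the activity
A = matrix(RDF, [[1, 0], [1, 1], [1, 2]])
b = vector(RDF, [2, 0, 6])

U, Sigma, V = A.SVD()
r = A.rank()                         # the number of nonzero singular values

# reduced singular value decomposition
U_r = U.matrix_from_columns(list(range(r)))
V_r = V.matrix_from_columns(list(range(r)))
Sigma_r = Sigma.matrix_from_rows(list(range(r))).matrix_from_columns(list(range(r)))
print(U_r * Sigma_r * V_r.transpose())   # should reproduce A, up to rounding

# least-squares approximate solution xhat = V_r Sigma_r^{-1} U_r^T b
xhat = V_r * Sigma_r.inverse() * U_r.transpose() * b
print(xhat)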
This activity demonstrates the power of a singular value decomposition to find a least-squares approximate solution for an equation $A\mathbf{x}=\mathbf{b}$. Because it immediately provides an orthonormal basis for $\operatorname{Col}(A)$, something that we’ve had to construct using the Gram-Schmidt process in the past, we can easily project $\mathbf{b}$ onto $\operatorname{Col}(A)$, which results in a simple expression for $\widehat{\mathbf{x}}$.
Proposition 7.5.1.
If $A = U_r\Sigma_r V_r^T$ is a reduced singular value decomposition of $A$, then a least-squares approximate solution to $A\mathbf{x}=\mathbf{b}$ is given by
$$\widehat{\mathbf{x}} = V_r\Sigma_r^{-1}U_r^T\mathbf{b}.$$
If the columns of $A$ are linearly independent, then the equation $A\widehat{\mathbf{x}}=\widehat{\mathbf{b}}$ has only one solution, so there is a unique least-squares approximate solution $\widehat{\mathbf{x}}$. Otherwise, the expression in Proposition 7.5.1 produces the solution to $A\widehat{\mathbf{x}}=\widehat{\mathbf{b}}$ having the shortest length.
The matrix $A^+ = V_r\Sigma_r^{-1}U_r^T$ is known as the Moore-Penrose pseudoinverse of $A$. When $A$ is invertible, $A^+ = A^{-1}$.
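Continuing in the same style, here is a minimal sketch that assembles the pseudoinverse from a reduced singular value decomposition. The placeholder matrix is chosen to have linearly dependent columns, so it has no inverse but still has a pseudoinverse.

# placeholder matrix with rank 1 (second column is twice the first)
A = matrix(RDF, [[1, 2], [2, 4], [3, 6]])
b = vector(RDF, [1, 1, 1])

U, Sigma, V = A.SVD()
r = A.rank()                                   # r = 1 here
U_r = U.matrix_from_columns(list(range(r)))
V_r = V.matrix_from_columns(list(range(r)))
Sigma_r = Sigma.matrix_from_rows(list(range(r))).matrix_from_columns(list(range(r)))

Aplus = V_r * Sigma_r.inverse() * U_r.transpose()   # Moore-Penrose pseudoinverse
xhat = Aplus * b                                     # shortest least-squares solution
print(xhat)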
Subsection 7.5.2 Rank $k$ approximations
If we have a singular value decomposition for a matrix $A$, we can form a sequence of matrices $A_k$ that approximate $A$ with increasing accuracy. This may feel familiar to calculus students who have seen the way in which a function $f(x)$ can be approximated by a linear function, a quadratic function, and so forth with increasing accuracy.
We’ll begin with a singular value decomposition of a rank $r$ matrix $A$ so that $A = U\Sigma V^T$. To create the approximating matrix $A_k$, we keep the first $k$ singular values and set the others to zero. For instance, we can form matrices $\Sigma^{(1)}$ and $\Sigma^{(2)}$, obtained from $\Sigma$ by keeping only $\sigma_1$, or only $\sigma_1$ and $\sigma_2$, and define $A_1 = U\Sigma^{(1)}V^T$ and $A_2 = U\Sigma^{(2)}V^T$. Because $\Sigma^{(k)}$ has $k$ nonzero singular values, we know that $\operatorname{rank}(A_k) = k$. In fact, there is a sense in which $A_k$ is the closest matrix to $A$ among all rank $k$ matrices.
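A brief Sage sketch of this construction, using a small placeholder matrix: the helper function below (a hypothetical convenience, not part of Sage) zeroes out the trailing singular values before multiplying the factors back together.

# placeholder matrix; in practice U, Sigma, V come from the matrix you wish to approximate
A = matrix(RDF, [[2, 1, 0], [1, 2, 1], [0, 1, 2]])
U, Sigma, V = A.SVD()

def rank_k_approximation(U, Sigma, V, k):
    # keep the first k singular values and set the rest to zero
    Sigma_k = matrix(RDF, Sigma.nrows(), Sigma.ncols())
    for i in range(k):
        Sigma_k[i, i] = Sigma[i, i]
    return U * Sigma_k * V.transpose()

A1 = rank_k_approximation(U, Sigma, V, 1)
A2 = rank_k_approximation(U, Sigma, V, 2)
print(A - A1)   # the entries of the error shrink as k increases
print(A - A2)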
Activity 7.5.3.
Let’s consider a matrix $A$ given by its singular value decomposition $A = U\Sigma V^T$. Evaluating the following cell will create the matrices U, V, and Sigma. Notice how the diagonal_matrix command provides a convenient way to form the diagonal matrix $\Sigma$.
- Now form the approximating matrix $A_1 = U\Sigma^{(1)}V^T$. What is $\operatorname{rank}(A_1)$?
- Find the error in the approximation $A\approx A_1$ by finding $A - A_1$.
- Now find $A_2$ and the error $A - A_2$. What is $\operatorname{rank}(A_2)$?
- Find $A_3$ and the error $A - A_3$. What is $\operatorname{rank}(A_3)$?
- What would happen if we were to compute $A_4$?
- What do you notice about the error $A - A_k$ as $k$ increases?
In this activity, the approximating matrix $A_k$ has rank $k$ because its singular value decomposition has $k$ nonzero singular values. We then saw how the difference between $A$ and the approximations $A_k$ decreases as $k$ increases, which means that the sequence $A_1, A_2, A_3, \ldots$ forms better approximations as $k$ increases.
Another way to represent $A_k$ is with a reduced singular value decomposition, so that $A_k = U_k\Sigma_kV_k^T$ where
$$U_k = \begin{bmatrix}\mathbf{u}_1 & \cdots & \mathbf{u}_k\end{bmatrix},\qquad \Sigma_k = \begin{bmatrix}\sigma_1 & & \\ & \ddots & \\ & & \sigma_k\end{bmatrix},\qquad V_k = \begin{bmatrix}\mathbf{v}_1 & \cdots & \mathbf{v}_k\end{bmatrix}.$$
Notice that the rank $1$ matrix $A_1$ then has the form $A_1 = \sigma_1\mathbf{u}_1\mathbf{v}_1^T$ and that we can similarly write
$$A_k = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T.$$
Given two vectors $\mathbf{u}$ and $\mathbf{v}$, the matrix $\mathbf{u}\mathbf{v}^T$ is called the outer product of $\mathbf{u}$ and $\mathbf{v}$. (The dot product $\mathbf{u}\cdot\mathbf{v} = \mathbf{u}^T\mathbf{v}$ is sometimes called the inner product.) An outer product will always be a rank $1$ matrix, so we see above how $A_k$ is obtained by adding together $k$ rank $1$ matrices, each of which gets us one step closer to the original matrix $A$.
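The outer-product form is easy to check numerically. In the sketch below, again with a placeholder matrix, summing the first $k$ outer products $\sigma_i\mathbf{u}_i\mathbf{v}_i^T$ reproduces the rank $k$ approximation $A_k$.

# placeholder matrix for illustration
A = matrix(RDF, [[2, 1, 0], [1, 2, 1], [0, 1, 2]])
U, Sigma, V = A.SVD()

k = 2
A_k = matrix(RDF, A.nrows(), A.ncols())   # start from the zero matrix
for i in range(k):
    # sigma_i * u_i v_i^T is a rank 1 outer product
    A_k = A_k + Sigma[i, i] * U.column(i).column() * V.column(i).row()
print(A_k)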
Subsection 7.5.3 Principal component analysis
In Section 7.3, we explored principal component analysis as a technique to reduce the dimension of a dataset. In particular, we constructed the covariance matrix $C$ from a demeaned data matrix $A$ and saw that the eigenvalues and eigenvectors of $C$ tell us about the variance of the dataset in different directions. We referred to the eigenvectors of $C$ as principal components and found that projecting the data onto a subspace defined by the first few principal components frequently gave us a way to visualize the dataset. As we added more principal components, we retained more information about the original dataset. This feels similar to the rank $k$ approximations we have just seen, so let’s explore the connection.
Suppose that we have a dataset with $N$ points, that $A$ represents the demeaned data matrix, that $A = U\Sigma V^T$ is a singular value decomposition, and that the singular values of $A$ are denoted as $\sigma_1, \sigma_2, \ldots, \sigma_r$. It follows that the covariance matrix
$$C = \frac1N AA^T = \frac1N U\Sigma V^T\left(U\Sigma V^T\right)^T = \frac1N U\Sigma V^TV\Sigma^TU^T = U\left(\frac1N\Sigma\Sigma^T\right)U^T.$$
Notice that $\frac1N\Sigma\Sigma^T$ is a diagonal matrix whose diagonal entries are $\frac{\sigma_1^2}{N}, \frac{\sigma_2^2}{N}, \ldots, \frac{\sigma_r^2}{N}$. Therefore, it follows that
$$C = U\left(\frac1N\Sigma\Sigma^T\right)U^T$$
is an orthogonal diagonalization of $C$, showing that
- the principal components of the dataset, which are the eigenvectors of $C$, are given by the columns of $U$. In other words, the left singular vectors of $A$ are the principal components of the dataset.
- the variance in the direction of a principal component is the associated eigenvalue of $C$ and therefore $V_j = \frac{\sigma_j^2}{N}$.
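These two statements can be checked numerically. The sketch below uses a small placeholder demeaned data matrix with $N = 5$ points; it is only an illustration, not the dataset from the activity that follows.

# placeholder demeaned data matrix whose columns are the data points
A = matrix(RDF, [[ 1, -1,  2, -2,  0],
                 [ 0,  1, -1,  2, -2]])
N = A.ncols()

C = (1/N) * A * A.transpose()       # covariance matrix
U, Sigma, V = A.SVD()

print(C.eigenvalues())              # variances in the principal directions (order may differ)
print([Sigma[i, i]^2 / N for i in range(A.nrows())])   # the values sigma_i^2 / N agree

# the columns of U are eigenvectors of C, i.e. the principal components
print(C * U.column(0) - (Sigma[0, 0]^2 / N) * U.column(0))   # approximately the zero vector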
Activity 7.5.4.
Let’s revisit the iris dataset that we studied in Section 7.3. Remember that there are four measurements given for each of 150 irises and that each iris belongs to one of three species.

Evaluating the following cell will load the dataset and define the demeaned data matrix $A$, whose shape is $4\times150$.
- Find the singular values of $A$ using the command A.singular_values() and use them to determine the variance $V_j$ in the direction of each of the four principal components. What is the fraction of variance retained by the first two principal components?
- We will now write the matrix $\Gamma = \Sigma V^T$ so that $A = U\Gamma$. Suppose that a demeaned data point, say the 100th column of $A$, is written as a linear combination of principal components:
$$\mathbf{x} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + c_3\mathbf{u}_3 + c_4\mathbf{u}_4.$$
Explain why $\begin{bmatrix}c_1\\c_2\\c_3\\c_4\end{bmatrix}$, the vector of coordinates of $\mathbf{x}$ in the basis of principal components, appears as the 100th column of $\Gamma$.
- Suppose that we now project this demeaned data point $\mathbf{x}$ orthogonally onto the subspace spanned by the first two principal components $\mathbf{u}_1$ and $\mathbf{u}_2$. What are the coordinates of the projected point in this basis and how can we find them in the matrix $\Gamma$?
- Alternatively, consider the approximation $A_2 = U_2\Sigma_2V_2^T$ of the demeaned data matrix $A$. Explain why the 100th column of $A_2$ represents the projection of $\mathbf{x}$ onto the two-dimensional subspace spanned by the first two principal components, $\mathbf{u}_1$ and $\mathbf{u}_2$. Then explain why the coefficients in that projection, $c_1$ and $c_2$, form the two-dimensional vector $\begin{bmatrix}c_1\\c_2\end{bmatrix}$ that is the 100th column of $\Gamma_2 = \Sigma_2V_2^T$.
- Now we’ve seen that the columns of $\Gamma_2 = \Sigma_2V_2^T$ form the coordinates of the demeaned data points projected onto the two-dimensional subspace spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$. In the cell below, find a singular value decomposition of $A$ and use it to form the matrix Gamma2. When you evaluate this cell, you will see a plot of the projected demeaned data points, similar to the one we created in Section 7.3.
In our first encounter with principal component analysis, we began with a demeaned data matrix $A$, formed the covariance matrix $C$, and used the eigenvalues and eigenvectors of $C$ to project the demeaned data onto a smaller dimensional subspace. In this section, we have seen that a singular value decomposition of $A$ provides a more direct route: the left singular vectors of $A$ form the principal components, and the approximating matrix $A_k$ represents the data points projected onto the subspace spanned by the first $k$ principal components. The coordinates of a projected demeaned data point are given by the columns of $\Gamma_k = \Sigma_kV_k^T$.
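A minimal sketch of this last computation, with a small placeholder demeaned data matrix standing in for the iris data:

# placeholder demeaned data matrix (3 measurements, N = 5 points); in the
# activity above, A is instead the 4 x 150 demeaned iris matrix
A = matrix(RDF, [[ 1, -1,  2, -2,  0],
                 [ 0,  1, -1,  2, -2],
                 [ 1,  0, -1,  1, -1]])

U, Sigma, V = A.SVD()

k = 2
U_k = U.matrix_from_columns(list(range(k)))
Sigma_k = Sigma.matrix_from_rows(list(range(k))).matrix_from_columns(list(range(k)))
V_k = V.matrix_from_columns(list(range(k)))

# Gamma_k = Sigma_k V_k^T holds the coordinates of the projected data points;
# its columns are k-dimensional and can be plotted directly when k = 2
Gamma_k = Sigma_k * V_k.transpose()
print(Gamma_k)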