
Section 7.5 Using Singular Value Decompositions

We’ve now seen what singular value decompositions are, how to construct them, and how they provide important information about a matrix such as orthonormal bases for the four fundamental subspaces. This puts us in a good position to begin using singular value decompositions to solve a wide variety of problems.
Since singular value decompositions so immediately convey fundamental data about a matrix, it seems natural that some of our previous work can be reinterpreted in terms of them. We’ll therefore take some time in this section to revisit familiar topics, such as least-squares problems and principal component analysis, while also looking at some new applications.

Preview Activity 7.5.1.

Suppose that A=UΣVT where
Σ = \begin{bmatrix} 13 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
vectors uj form the columns of U, and vectors vj form the columns of V.
  1. What are the shapes of the matrices A, U, and V?
  2. What is the rank of A?
  3. Describe how to find an orthonormal basis for Col(A).
  4. Describe how to find an orthonormal basis for Nul(A).
  5. If the columns of Q form an orthonormal basis for Col(A), what is QTQ?
  6. How would you form a matrix that projects vectors orthogonally onto Col(A)?

Subsection 7.5.1 Least-squares problems

Least-squares problems, which we explored in Section 6.5, arise when we are confronted with an inconsistent linear system Ax=b. Since there is no solution to the system, we instead find the vector x minimizing the distance between b and Ax. That is, we find the vector x^, the least-squares approximate solution, by solving Ax^=b^ where b^ is the orthogonal projection of b onto the column space of A.
If we have a singular value decomposition A=UΣVT, then the number of nonzero singular values r tells us the rank of A, and the first r columns of U form an orthonormal basis for Col(A). This basis may be used to project vectors onto Col(A) and hence to solve least-squares problems.
Before exploring this connection further, we will introduce Sage as a tool for automating the construction of singular value decompositions. One new feature is that we need to declare our matrix to consist of floating point entries. We do this by including RDF inside the matrix definition, as illustrated in the following cell.
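Since the interactive cell itself is not reproduced here, the following is a minimal sketch of what such a cell might look like. It uses the matrix from the next activity and assumes Sage's SVD method for matrices defined over RDF.

    # Declare the entries to be floating point numbers by including RDF.
    A = matrix(RDF, [[1, 0],
                     [1, 1],
                     [1, 2]])

    # For a matrix over RDF, Sage can compute a singular value decomposition.
    U, Sigma, V = A.SVD()

    # Check that U*Sigma*V^T reproduces A, up to rounding.
    print(U * Sigma * V.transpose())
    print(A.singular_values())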

Activity 7.5.2.

Consider the equation Ax=b where
\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} x = \begin{bmatrix} 1 \\ 3 \\ 6 \end{bmatrix}.
  1. Find a singular value decomposition for A using the Sage cell below. What are singular values of A?
  2. What is r, the rank of A? How can we identify an orthonormal basis for Col(A)?
  3. Form the reduced singular value decomposition UrΣrVrT by constructing: the matrix Ur, consisting of the first r columns of U; the matrix Vr, consisting of the first r columns of V; and Σr, a square r×r diagonal matrix. Verify that A=UrΣrVrT.
    You may find it convenient to remember that if B is a matrix defined in Sage, then B.matrix_from_columns( list ) and B.matrix_from_rows( list ) can be used to extract columns or rows from B. For instance, B.matrix_from_rows([0,1,2]) provides a matrix formed from the first three rows of B.
  4. How does the reduced singular value decomposition provide a matrix whose columns are an orthonormal basis for Col(A)?
  5. Explain why a least-squares approximate solution x^ satisfies
    A\hat{x} = U_rU_r^Tb.
  6. What is the product VrVrT and why does it have this form?
  7. Explain why
    \hat{x} = V_r\Sigma_r^{-1}U_r^Tb
    is the least-squares approximate solution, and use this expression to find x^.
This activity demonstrates the power of a singular value decomposition to find a least-squares approximate solution for an equation Ax=b. Because it immediately provides an orthonormal basis for Col(A), something that we’ve had to construct using the Gram-Schmidt process in the past, we can easily project b onto Col(A), which results in a simple expression for x^.
If the columns of A are linearly independent, then the equation Ax^=b^ has only one solution so there is a unique least-squares approximate solution x^. Otherwise, the expression in Proposition 7.5.1 produces the solution to Ax^=b^ having the shortest length.
The matrix A^+ = V_rΣ_r^{-1}U_r^T is known as the Moore-Penrose pseudoinverse of A. When A is invertible, A^{-1} = A^+.
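To make this concrete, here is a sketch, using the matrix A and vector b from the activity above, of how the reduced singular value decomposition and the pseudoinverse might be assembled in Sage. The names Ur, Sr, Vr, and the choice r = 2 are ours, not commands provided by the text.

    A = matrix(RDF, [[1, 0], [1, 1], [1, 2]])
    b = vector(RDF, [1, 3, 6])

    U, Sigma, V = A.SVD()
    r = 2                                          # number of nonzero singular values
    Ur = U.matrix_from_columns(list(range(r)))     # first r columns of U
    Vr = V.matrix_from_columns(list(range(r)))     # first r columns of V
    Sr = Sigma.matrix_from_rows_and_columns(list(range(r)), list(range(r)))

    Aplus = Vr * Sr.inverse() * Ur.transpose()     # Moore-Penrose pseudoinverse A^+
    xhat = Aplus * b                               # least-squares approximate solution
    print(xhat)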

Subsection 7.5.2 Rank k approximations

If we have a singular value decomposition for a matrix A, we can form a sequence of matrices Ak that approximate A with increasing accuracy. This may feel familiar to calculus students who have seen the way in which a function f(x) can be approximated by a linear function, a quadratic function, and so forth with increasing accuracy.
We’ll begin with a singular value decomposition of a rank r matrix A so that A=UΣVT. To create the approximating matrix Ak, we keep the first k singular values and set the others to zero. For instance, if
\Sigma = \begin{bmatrix} 22 & 0 & 0 & 0 & 0 \\ 0 & 14 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},
we can form matrices
\Sigma^{(1)} = \begin{bmatrix} 22 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad
\Sigma^{(2)} = \begin{bmatrix} 22 & 0 & 0 & 0 & 0 \\ 0 & 14 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
and define A1=UΣ(1)VT and A2=UΣ(2)VT. Because Ak has k nonzero singular values, we know that rank(Ak)=k. In fact, there is a sense in which Ak is the closest matrix to A among all rank k matrices.
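The construction of Σ^{(k)}, and with it Ak, is easy to automate. Later activities refer to a provided helper approximate(A, k); the sketch below is only our guess at how such a helper might be written, not the text's own code.

    def approximate(A, k):
        # Form the rank k approximation by zeroing all but the first k singular values.
        U, Sigma, V = A.SVD()
        Sigma_k = copy(Sigma)
        for i in range(k, min(Sigma.nrows(), Sigma.ncols())):
            Sigma_k[i, i] = 0
        return U * Sigma_k * V.transpose()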

Activity 7.5.3.

Let’s consider a matrix A=UΣVT where U and V are the orthogonal 4×4 matrices created in the Sage cell below, each of whose entries is ±1/2, and
\Sigma = \begin{bmatrix} 500 & 0 & 0 & 0 \\ 0 & 100 & 0 & 0 \\ 0 & 0 & 20 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}.
Evaluating the following cell will create the matrices U, V, and Sigma. Notice how the diagonal_matrix command provides a convenient way to form the diagonal matrix Σ.
  1. Form the matrix A=UΣVT. What is rank(A)?
  2. Now form the approximating matrix A1=UΣ(1)VT. What is rank(A1)?
  3. Find the error in the approximation A≈A1 by computing ‖A−A1‖.
  4. Now find A2=UΣ(2)VT and the error ‖A−A2‖. What is rank(A2)?
  5. Find A3=UΣ(3)VT and the error ‖A−A3‖. What is rank(A3)?
  6. What would happen if we were to compute A4?
  7. What do you notice about the error ‖A−Ak‖ as k increases?
In this activity, the approximating matrix Ak has rank k because its singular value decomposition has k nonzero singular values. We then saw how the difference between A and the approximations Ak decreases as k increases, which means that the sequence Ak forms better approximations as k increases.
Another way to represent Ak is with a reduced singular value decomposition so that Ak=UkΣkVkT where
U_k = \begin{bmatrix} u_1 & \cdots & u_k \end{bmatrix}, \qquad
\Sigma_k = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix}, \qquad
V_k = \begin{bmatrix} v_1 & \cdots & v_k \end{bmatrix}.
Notice that the rank 1 matrix A1 then has the form A1=u1[σ1]v1T=σ1u1v1T and that we can similarly write:
\begin{aligned}
A &\approx A_1 = \sigma_1u_1v_1^T \\
A &\approx A_2 = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T \\
A &\approx A_3 = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \sigma_3u_3v_3^T \\
&\;\;\vdots \\
A &= A_r = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \sigma_3u_3v_3^T + \cdots + \sigma_ru_rv_r^T.
\end{aligned}
Given two vectors u and v, the matrix uvT is called the outer product of u and v. (The dot product u·v=uTv is sometimes called the inner product.) An outer product is always a rank 1 matrix, so we see above how Ak is obtained by adding together k rank 1 matrices, each of which gets us one step closer to the original matrix A.
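The following sketch expresses the same idea in Sage: assuming U, Sigma, and V come from a singular value decomposition of A, it builds Ak by adding up the first k rank 1 outer products. The function name is ours.

    def rank_k_sum(U, Sigma, V, k):
        Ak = zero_matrix(RDF, U.nrows(), V.nrows())
        for i in range(k):
            u = U.column(i)
            v = V.column(i)
            Ak = Ak + Sigma[i, i] * u.column() * v.row()   # add sigma_i u_i v_i^T
        return Ak

    # The error (A - rank_k_sum(U, Sigma, V, k)).norm() then decreases as k grows.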

Subsection 7.5.3 Principal component analysis

In Section 7.3, we explored principal component analysis as a technique to reduce the dimension of a dataset. In particular, we constructed the covariance matrix C from a demeaned data matrix and saw that the eigenvalues and eigenvectors of C tell us about the variance of the dataset in different directions. We referred to the eigenvectors of C as principal components and found that projecting the data onto a subspace defined by the first few principal components frequently gave us a way to visualize the dataset. As we added more principal components, we retained more information about the original dataset. This feels similar to the rank k approximations we have just seen so let’s explore the connection.
Suppose that we have a dataset with N points, that A represents the demeaned data matrix, that A=UΣVT is a singular value decomposition, and that the singular values of A are denoted σi. It follows that the covariance matrix is
C = \frac{1}{N}AA^T = \frac{1}{N}(U\Sigma V^T)(U\Sigma V^T)^T = U\left(\frac{1}{N}\Sigma\Sigma^T\right)U^T.
Notice that (1/N)ΣΣ^T is a diagonal matrix whose diagonal entries are σ_i^2/N. Therefore, it follows that
C = U\left(\frac{1}{N}\Sigma\Sigma^T\right)U^T
is an orthogonal diagonalization of C showing that
  • the principal components of the dataset, which are the eigenvectors of C, are given by the columns of U. In other words, the left singular vectors of A are the principal components of the dataset.
  • the variance in the direction of a principal component is the associated eigenvalue of C and therefore
    V_{u_i} = \frac{1}{N}\sigma_i^2.
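A quick numerical check of this relationship might look like the following sketch, which assumes A is a demeaned data matrix over RDF with one data point per column.

    N = A.ncols()
    C = A * A.transpose() / N                      # the covariance matrix

    print(C.eigenvalues())                         # variances in the principal directions
    print([s^2 / N for s in A.singular_values()])  # should agree with the eigenvalues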

Activity 7.5.4.

Let’s revisit the iris dataset that we studied in Section 7.3. Remember that there are four measurements given for each of 150 irises and that each iris belongs to one of three species.
Evaluating the following cell will load the dataset and define the demeaned data matrix A whose shape is 4×150.
  1. Find the singular values of A using the command A.singular_values() and use them to determine the variance V_{u_j} in the direction of each of the four principal components. What is the fraction of variance retained by the first two principal components?
  2. We will now write the matrix Γ=ΣVT so that A=UΓ. Suppose that a demeaned data point, say, the 100th column of A, is written as a linear combination of principal components:
    x = c_1u_1 + c_2u_2 + c_3u_3 + c_4u_4.
    Explain why \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}, the vector of coordinates of x in the basis of principal components, appears as the 100th column of Γ.
  3. Suppose that we now project this demeaned data point x orthogonally onto the subspace spanned by the first two principal components u1 and u2. What are the coordinates of the projected point in this basis and how can we find them in the matrix Γ?
  4. Alternatively, consider the approximation A2=U2Σ2V2T of the demeaned data matrix A. Explain why the 100th column of A2 represents the projection of x onto the two-dimensional subspace spanned by the first two principal components, u1 and u2. Then explain why the coefficients in that projection, c_1u_1 + c_2u_2, form the two-dimensional vector \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} that is the 100th column of Γ2=Σ2V2T.
  5. Now we’ve seen that the columns of Γ2=Σ2V2T form the coordinates of the demeaned data points projected onto the two-dimensional subspace spanned by u1 and u2. In the cell below, find a singular value decomposition of A and use it to form the matrix Gamma2. When you evaluate this cell, you will see a plot of the projected demeaned data points, similar to the one we created in Section 7.3.
In our first encounter with principal component analysis, we began with a demeaned data matrix A, formed the covariance matrix C, and used the eigenvalues and eigenvectors of C to project the demeaned data onto a lower-dimensional subspace. In this section, we have seen that a singular value decomposition of A provides a more direct route: the left singular vectors of A form the principal components, and the approximating matrix Ak represents the data points projected onto the subspace spanned by the first k principal components. The coordinates of a projected demeaned data point are given by the columns of Γk=ΣkVkT.
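As an illustration, here is a sketch of how Γ2 might be assembled from a singular value decomposition in Sage, assuming the 4×150 demeaned iris matrix A from the activity; the names V2, Sigma2, and Gamma2 are ours, chosen to match the notation above.

    U, Sigma, V = A.SVD()
    V2 = V.matrix_from_columns([0, 1])                           # first two right singular vectors
    Sigma2 = Sigma.matrix_from_rows_and_columns([0, 1], [0, 1])  # diag(sigma_1, sigma_2)

    Gamma2 = Sigma2 * V2.transpose()     # 2 x 150 matrix of projected coordinates
    list_plot(Gamma2.columns())          # plot the projected demeaned data points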

Subsection 7.5.4 Image compression and denoising

In addition to principal component analysis, the approximations Ak of a matrix A obtained from a singular value decomposition can be used in image processing. Remember that we studied the JPEG compression algorithm, whose foundation is the change of basis defined by the Discrete Cosine Transform, in Section 3.3. We will now see how a singular value decomposition provides another tool for both compressing images and removing noise in them.
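Before working through the activity, it may help to spell out the storage accounting behind this kind of compression: reconstructing an m×n image from its first k singular values and vectors requires roughly k(m+n+1) numbers rather than mn. The following sketch is a small illustration of that arithmetic; the function name and the sample values are ours.

    def compression_ratio(m, n, k):
        uncompressed = m * n              # one number per pixel
        compressed = k * (m + n + 1)      # k left vectors, k right vectors, k singular values
        return uncompressed / compressed

    # For example, a rank 3 reconstruction of a 25 x 15 image:
    print(compression_ratio(25, 15, 3))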

Activity 7.5.5.

Evaluating the following cell loads some data that we’ll use in this activity. To begin, it defines and displays a 25×15 matrix A.
  1. If we interpret 0 as black and 1 as white, this matrix represents an image as shown below.
    We will explore how the singular value decomposition helps us to compress this image.
    1. By inspecting the image represented by A, identify a basis for Col(A) and determine rank(A).
    2. The following cell plots the singular values of A. Explain how this plot verifies that the rank is what you found in the previous part.
    3. There is a command approximate(A, k) that creates the approximation Ak. Use the cell below to define k and look at the images represented by the first few approximations. What is the smallest value of k for which A=Ak?
    4. Now we can see how the singular value decomposition allows us to compress images. Since this is a 25×15 matrix, we need 25⋅15=375 numbers to represent the image. However, we can also reconstruct the image using a small number of singular values and vectors:
      A = A_k = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \cdots + \sigma_ku_kv_k^T.
      What are the dimensions of the singular vectors ui and vi? Between the singular vectors and singular values, how many numbers do we need to reconstruct Ak for the smallest k for which A=Ak? This is the compressed size of the image.
    5. The compression ratio is the ratio of the uncompressed size to the compressed size. What compression ratio does this represent?
  2. Next we’ll explore an example based on a photograph.
    1. Consider the following image consisting of an array of 316×310 pixels stored in the matrix A.
      Plot the singular values of A.
    2. Use the cell below to study the approximations Ak for k=1,10,20,50,100.
      Notice how the approximating image Ak more closely approximates the original image A as k increases.
      What is the compression ratio when k=50? What is the compression ratio when k=100? Notice how a higher compression ratio leads to a lower quality reconstruction of the image.
  3. A second, related application of the singular value decomposition to image processing is called denoising. For example, consider the image represented by the matrix A below.
    This image is similar to the image of the letter "O" we first studied in this activity, but there are splotchy regions in the background that result, perhaps, from scanning the image. We think of the splotchy regions as noise, and our goal is to improve the quality of the image by reducing the noise.
    1. Plot the singular values below. How are the singular values of this matrix similar to those represented by the clean image that we considered earlier and how are they different?
    2. There is a natural point where the singular values dramatically decrease, so it makes sense to think of the noise as being formed by the small singular values. To denoise the image, we will therefore replace A by its approximation Ak, where k is the point at which the singular values drop off. This has the effect of setting the small singular values to zero and hence eliminating the noise. Choose an appropriate value of k below and notice that the new image appears to be somewhat cleaned up as a result of removing the noise. (A small sketch of this thresholding idea appears after this activity.)
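Here is a minimal sketch of that thresholding idea. The threshold is something you would choose by inspecting the plot of singular values, and approximate(A, k) refers to the helper mentioned earlier in the activity.

    def denoise(A, threshold):
        # Keep only the singular values above the threshold and rebuild the image.
        k = len([s for s in A.singular_values() if s > threshold])
        return approximate(A, k)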
Several examples illustrating how the singular value decomposition compresses images are available on this page from Tim Baumann: timbaumann.info/projects.html.

Subsection 7.5.5 Analyzing Supreme Court cases

As we’ve seen, a singular value decomposition concentrates the most important features of a matrix into the first singular values and singular vectors. We will now use this observation to extract meaning from a large dataset giving the voting records of Supreme Court justices. A similar analysis appears in the paper A pattern analysis of the second Rehnquist U.S. Supreme Court by Lawrence Sirovich, available at gvsu.edu/s/21F.
The makeup of the Supreme Court was unusually stable during the period from 1994-2005 when it was led by Chief Justice William Rehnquist. This is sometimes called the second Rehnquist court. The justices during this period were William Rehnquist, Antonin Scalia, Clarence Thomas, Sandra Day O’Connor, Anthony Kennedy, David Souter, Ruth Bader Ginsburg, Stephen Breyer, and John Paul Stevens.
During this time, there were 911 cases in which all nine justices voted. We would like to understand patterns in their voting.

Activity 7.5.6.

Evaluating the following cell loads and displays a dataset describing the votes of each justice in these 911 cases. More specifically, an entry of +1 means that the justice represented by the row voted with the majority in the case represented by the column. An entry of -1 means that justice was in the minority. This information is also stored in the 9×911 matrix A.
The justices are listed, very roughly, in order from more conservative to more progressive.
In this activity, it will be helpful to visualize the entries in various matrices and vectors. The next cell displays the first 50 columns of the matrix A with white representing an entry of +1, red representing -1, and black representing 0.
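If you would like to reproduce a display of this kind yourself, Sage's built-in matrix_plot offers one possibility, sketched below; its default coloring will not match the white/red/black scheme described above, which presumably comes from a helper defined in the loaded cell.

    first_50 = A.matrix_from_columns(list(range(50)))   # first 50 cases
    matrix_plot(first_50)                               # display the entries as colored cells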
  1. Plot the singular values of A below. Describe the significance of this plot, including the relative contributions from the singular values σk as k increases.
  2. Form the singular value decomposition A=UΣVT and the matrix of coefficients Γ so that A=UΓ.
  3. We will now study a particular case, the second case which appears as the column of A indexed by 1. There is a command display_column(A, k) that provides a visual display of the kth column of a matrix A. Describe the justices’ votes in the second case.
  4. Also, display the first left singular vector u1, the column of U indexed by 0, and the column of Γ holding the coefficients that express the second case as a linear combination of left singular vectors.
    What does this tell us about how the second case is constructed as a linear combination of left singular vectors? What is the significance of the first left singular vector u1?
  5. Let’s now study the 48th case, which is represented by the column of A indexed by 47. Describe the voting pattern in this case.
  6. Display the second left singular vector u2 and the vector of coefficients that express the 48th case as a linear combination of left singular vectors.
    Describe how this case is constructed as a linear combination of singular vectors. What is the significance of the second left singular vector u2?
  7. The data in Table 7.5.2 describes the number of cases decided by each possible vote count.
    Table 7.5.2. Number of cases by vote count
    Vote count # of cases
    9-0 405
    8-1 89
    7-2 111
    6-3 118
    5-4 188
    How do the singular vectors u1 and u2 reflect this data? Would you characterize the court as leaning toward the conservatives or progressives? Use these singular vectors to explain your response.
  8. Cases decided by a 5-4 vote are often the most impactful as they represent a sharp divide among the justices and, often, society at large. For that reason, we will now focus on the 5-4 decisions. Evaluating the next cell forms the 9×188 matrix B consisting of 5-4 decisions.
    Form the singular value decomposition B=UΣVT along with the matrix Γ of coefficients so that B=UΓ, and display the first left singular vector u1. Study how the 7th case, indexed by 6, is constructed as a linear combination of left singular vectors.
    What does this singular vector tell us about the makeup of the court and whether it leans towards the conservatives or progressives?
  9. Display the second left singular vector u2 and study how the 6th case, indexed by 5, is constructed as a linear combination of left singular vectors.
    What does u2 tell us about the relative importance of the justices’ voting records?
  10. By a swing vote, we mean a justice who is less inclined to vote with a particular bloc of justices but instead swings from one bloc to another with the potential to sway close decisions. What do the singular vectors u1 and u2 tell us about the presence of voting blocs on the court and the presence of a swing vote? Which justice represents the swing vote?

Subsection 7.5.6 Summary

This section has demonstrated some uses of the singular value decomposition. Because the singular values appear in decreasing order, the decomposition has the effect of concentrating the most important features of the matrix into the first singular values and singular vectors.
  • Because the first r left singular vectors form an orthonormal basis for Col(A), a singular value decomposition provides a convenient way to project vectors onto Col(A) and therefore to solve least-squares problems.
  • A singular value decomposition of a rank r matrix A leads to a series of approximations Ak of A where
    \begin{aligned}
    A &\approx A_1 = \sigma_1u_1v_1^T \\
    A &\approx A_2 = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T \\
    A &\approx A_3 = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \sigma_3u_3v_3^T \\
    &\;\;\vdots \\
    A &= A_r = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T + \sigma_3u_3v_3^T + \cdots + \sigma_ru_rv_r^T
    \end{aligned}
    In each case, Ak is the rank k matrix that is closest to A.
  • If A is a demeaned data matrix, the left singular vectors give the principal components of A, and the variance in the direction of a principal component can be simply expressed in terms of the corresponding singular value.
  • The singular value decomposition has many applications. In this section, we looked at how the decomposition is used in image processing through the techniques of compression and denoising.
  • Because the first few left singular vectors contain the most important features of a matrix, we can use a singular value decomposition to extract meaning from a large dataset as we did when analyzing the voting patterns of the second Rehnquist court.

Exercises 7.5.7 Exercises

1.

Suppose that
A = \begin{bmatrix} 2.1 & 1.9 & 0.1 & 3.7 \\ 1.5 & 2.7 & 0.9 & 0.6 \\ 0.4 & 2.8 & 1.5 & 4.2 \\ 0.4 & 2.4 & 1.9 & 1.8 \end{bmatrix}.
  1. Find the singular values of A. What is rank(A)?
  2. Find the sequence of matrices A1, A2, A3, and A4 where Ak is the rank k approximation of A.

2.

Suppose we would like to find the best quadratic function
\beta_0 + \beta_1x + \beta_2x^2 = y
fitting the points
(0,1),(1,0),(2,1.5),(3,4),(4,8).
  1. Set up a linear system Ax=b describing the coefficients x = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}.
  2. Find the singular value decomposition of A.
  3. Use the singular value decomposition to find the least-squares approximate solution x^.

3.

Remember that the outer product of two vectors u and v is the matrix u vT.
  1. Suppose that u = \begin{bmatrix} 2 \\ 3 \end{bmatrix} and v = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}. Evaluate the outer product u vT. To get a clearer sense of how this works, perform this operation without using technology.
    How is each of the columns of u vT related to u?
  2. Suppose u and v are general vectors. What is rank(u vT) and what is a basis for its column space Col(u vT)?
  3. Suppose that u is a unit vector. What is the effect of multiplying a vector by the matrix u uT?

4.

Evaluating the following cell loads in a dataset recording some features of 1057 houses. Notice how the lot size varies over a relatively small range compared to the other features. For this reason, in addition to demeaning the data, we’ll scale each feature by dividing by its standard deviation so that the range of values is similar for each feature. The matrix A holds the result.
  1. Find the singular values of A and use them to determine the variance in the direction of the principal components.
  2. For what fraction of the variance do the first two principal components account?
  3. Find a singular value decomposition of A and construct the 2×1057 matrix B whose entries are the coordinates of the demeaned data points projected on to the two-dimensional subspace spanned by the first two principal components. You can plot the projected data points using list_plot(B.columns()).
  4. Study the entries in the first two principal components u1 and u2. Would a more expensive house lie on the left, right, top, or bottom of the plot you constructed?
  5. In what ways does a house that lies on the far left of the plot you constructed differ from an average house? In what ways does a house that lies near the top of the plot you constructed differ from an average house?

5.

Let’s revisit the voting records of justices on the second Rehnquist court. Evaluating the following cell will load the voting records of the justices in the 188 cases decided by a 5-4 vote and store them in the matrix A.
  1. The cell above also defined the 188-dimensional vector v whose entries are all 1. What does the product Av represent? Use the following cell to evaluate this product.
  2. How does the product Av tell us which justice voted in the majority most frequently? What does this say about the presence of a swing vote on the court?
  3. How does this product tell us whether we should characterize this court as leaning conservative or progressive?
  4. How does this product tell us about the presence of a second swing vote on the court?
  5. Study the left singular vector u3 and describe how it reinforces the fact that there was a second swing vote. Who was this second swing vote?

6.

The following cell loads a dataset that describes the percentages with which justices on the second Rehnquist court agreed with one another. For instance, the entry in the first row and second column is 72.78, which means that Justices Rehnquist and Scalia agreed with each other in 72.78% of the cases.
  1. Examine the matrix A. What special structure does this matrix have and why should we expect it to have this structure?
  2. Plot the singular values of A below. For what value of k would the approximation Ak be a reasonable approximation of A?
  3. Find a singular value decomposition A=UΣVT and examine the matrices U and V using, for instance, n(U, 3). What do you notice about the relationship between U and V and why should we expect this relationship to hold?
  4. The command approximate(A, k) will form the approximating matrix Ak. Study the matrix A1 using the display_matrix command. Which justice or justices seem to be most agreeable, that is, most likely to agree with other justices? Which justice is least agreeable?
  5. Examine the difference A2A1 and describe how this tells us about the presence of voting blocs and swing votes on the court.

7.

Suppose that A=UrΣrVrT is a reduced singular value decomposition of the m×n matrix A. The matrix A^+ = V_rΣ_r^{-1}U_r^T is called the Moore-Penrose inverse of A.
  1. Explain why A+ is an n×m matrix.
  2. If A is an invertible, square matrix, explain why A+=A1.
  3. Explain why AA^+b = b^, the orthogonal projection of b onto Col(A).
  4. Explain why A^+Ax = x^, the orthogonal projection of x onto Col(AT).

8.

In Subsection 5.1.1, we saw how some linear algebraic computations are sensitive to round off error made by a computer. A singular value decomposition can help us understand when this situation can occur.
For instance, consider the matrices
A = \begin{bmatrix} 1.0001 & 1 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.
The entries in these matrices are quite close to one another, but A is invertible while B is not. It seems like A is almost singular. In fact, we can measure how close a matrix is to being singular by forming the condition number, σ1/σn, the ratio of the largest to smallest singular value. If A were singular, the condition number would be undefined because the singular value σn=0. Therefore, we will think of matrices with large condition numbers as being close to singular.
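As a quick illustration of this definition, the following sketch computes the condition number of the matrix A above in Sage.

    A = matrix(RDF, [[1.0001, 1],
                     [1, 1]])
    sv = A.singular_values()
    print(sv[0] / sv[-1])     # condition number sigma_1 / sigma_n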
  1. Define the matrix A and find a singular value decomposition. What is the condition number of A?
  2. Define the left singular vectors u1 and u2. Compare the results A^{-1}b when
    1. b=u1+u2.
    2. b=2u1+u2.
    Notice how a small change in the vector b leads to a small change in A^{-1}b.
  3. Now compare the results A^{-1}b when
    1. b=u1+u2.
    2. b=u1+2u2.
    Notice now how a small change in b leads to a large change in A^{-1}b.
  4. Previously, we saw that, if we write x in terms of the right singular vectors, x=c1v1+c2v2, then we have
    b = Ax = c_1\sigma_1u_1 + c_2\sigma_2u_2.
    If we write b=d1u1+d2u2, explain why A^{-1}b is sensitive to small changes in d2.
Generally speaking, a square matrix A with a large condition number will demonstrate this type of behavior so that the computation of A1 is likely to be affected by round off error. We call such a matrix ill-conditioned.