Projected Gradient Descent for Max and Min Eigenpairs - Proof of Convergence
Let $A$ be an $n \times n$ positive semi-definite matrix. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ be its eigenvalues in decreasing order and $u_1, u_2, \ldots, u_n$ the corresponding orthonormal eigenvectors. We have

$$A = \sum_{i=1}^{n} \lambda_i u_i u_i^\top$$

Suppose you need the condition number of $A$, defined as $\lambda_1 / \lambda_n$ (finite only when $\lambda_n > 0$, i.e., when $A$ is positive definite). You could compute all the eigenvalues and then take the ratio, but that is overkill: you only need $\lambda_1$ and $\lambda_n$.
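For reference, the direct approach this post wants to avoid looks like this in NumPy: compute every eigenvalue just to take the ratio of two of them (the random matrix here is only a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 100))
A = B @ B.T                        # symmetric PSD (almost surely positive definite)

eigvals = np.linalg.eigvalsh(A)    # all n eigenvalues, in ascending order
print(eigvals[-1] / eigvals[0])    # lambda_1 / lambda_n
```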
Almost two years ago, I wrote a blog post on using Projected Gradient Descent for finding the maximum and minimum eigenvectors and the corresponding eigenvalues. However, I did not offer any proof for the correctness of the algorithm. I do that in this blog post.
Maximum Eigenpair
Consider the following algorithm:
- $x'_{t+1} = x_t + \eta A x_t = (I + \eta A)\, x_t$
- $x_{t+1} = \dfrac{x'_{t+1}}{\|x'_{t+1}\|_2}$
This corresponds to projected gradient descent on the objective $\min\,(-x^\top A x)$ subject to $x^\top x = 1$: the gradient of $-x^\top A x$ is $-2Ax$, so the gradient step moves along $A x_t$ (the factor of 2 is absorbed into $\eta$), and the normalization in the second step projects the iterate back onto the unit sphere.
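Here is a minimal NumPy sketch of this update; the function name, the step size `eta`, and the iteration count are illustrative choices, not part of the algorithm statement above.

```python
import numpy as np

def pgd_max_eigenpair(A, eta=0.1, num_iters=1000, seed=0):
    """PGD for the maximum eigenpair of a symmetric PSD matrix A."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)            # start on the unit sphere
    for _ in range(num_iters):
        x = x + eta * (A @ x)         # gradient step on -x^T A x
        x /= np.linalg.norm(x)        # project back onto the unit sphere
    return x @ A @ x, x               # (estimate of lambda_1, estimate of u_1)
```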
Similarity to Power Iteration
Power Iteration is possibly the most widely used algorithm for computing the max eigenvector. Its update is:
$$b_{t+1} = \frac{A b_t}{\|A b_t\|} = \frac{A^{t+1} b_0}{\|A^{t+1} b_0\|}$$

The PGD update can be written as:
$$x_{t+1} = \frac{(I + \eta A)\, x_t}{\|(I + \eta A)\, x_t\|_2} = \frac{(I + \eta A)^{t+1} x_0}{\|(I + \eta A)^{t+1} x_0\|_2}$$

So the PGD update is just power iteration on $I + \eta A$, and we can try analyzing it like power iteration.
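This equivalence is easy to check numerically: starting both iterations from the same vector, the PGD iterate and the power iterate on $I + \eta A$ coincide at every step. A quick sketch (the sizes and $\eta$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
A = B @ B.T
eta = 0.1
M = np.eye(50) + eta * A          # power-iteration matrix I + eta*A

x0 = rng.standard_normal(50)
x0 /= np.linalg.norm(x0)
x, b = x0.copy(), x0.copy()
for _ in range(20):
    x = x + eta * (A @ x)         # PGD step before normalization
    x /= np.linalg.norm(x)
    b = M @ b                     # power iteration step
    b /= np.linalg.norm(b)

print(np.allclose(x, b))          # True: the iterates coincide
```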
First note that $A$ and $I + \eta A$ have the same eigenvectors. The eigenvectors of $I + \eta A$ are $u_1, u_2, \ldots, u_n$ and the corresponding eigenvalues are $1 + \eta\lambda_1 \geq 1 + \eta\lambda_2 \geq \cdots \geq 1 + \eta\lambda_n$ (the ordering is preserved since $\eta \geq 0$). So we know:
- $A$ and $I + \eta A$ have the same eigenvectors
- PGD on $A$ is just power iteration on $I + \eta A$
- Power iteration converges to the maximum eigenvector
Hence, PGD will converge to the maximum eigenvector of $A$, provided $x_0$ has a nonzero component along $u_1$ (true with probability 1 for a random $x_0$).
For a more rigorous proof, consider the following analysis:
If $x_t^\top u_j \to 0$ for all $j \neq 1$, then $x_t \to u_1$. We have:
$$
u_j^\top x_t
= \frac{u_j^\top (I + \eta A)^t x_0}{\|(I + \eta A)^t x_0\|}
= \frac{u_j^\top \sum_{i=1}^{n} (1 + \eta\lambda_i)^t u_i u_i^\top x_0}{\left\|\sum_{i=1}^{n} (1 + \eta\lambda_i)^t u_i u_i^\top x_0\right\|}
= \frac{(1 + \eta\lambda_j)^t\, u_j^\top x_0}{\left(\sum_{i=1}^{n} (1 + \eta\lambda_i)^{2t} (u_i^\top x_0)^2\right)^{1/2}}
$$

so that

$$
|u_j^\top x_t| \leq \left(\frac{1 + \eta\lambda_j}{1 + \eta\lambda_1}\right)^t \frac{|u_j^\top x_0|}{|u_1^\top x_0|}
$$

When $j \neq 1$ and $\lambda_j < \lambda_1$, the ratio $(1 + \eta\lambda_j)/(1 + \eta\lambda_1)$ is strictly less than 1, so $x_t^\top u_j \to 0$ and hence $x_t \to u_1$. Therefore the sequence $x_t^\top A x_t$ converges to $\lambda_1$.
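The bound predicts that $|u_j^\top x_t|$ shrinks geometrically at rate $(1 + \eta\lambda_j)/(1 + \eta\lambda_1)$. A small numerical check for $j = 2$ (all sizes and constants here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((20, 20))
A = B @ B.T
lams, U = np.linalg.eigh(A)       # ascending order: u_1 = U[:, -1], u_2 = U[:, -2]
eta = 0.1
rate = (1 + eta * lams[-2]) / (1 + eta * lams[-1])

x0 = rng.standard_normal(20)
x0 /= np.linalg.norm(x0)
c = abs(U[:, -2] @ x0) / abs(U[:, -1] @ x0)    # constant from the bound

x = x0.copy()
for t in range(1, 31):
    x = x + eta * (A @ x)
    x /= np.linalg.norm(x)
    if t % 10 == 0:
        print(t, abs(U[:, -2] @ x), c * rate**t)   # observed overlap vs. the bound
```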
Minimum Eigenpair
The minimum eigenvalue of $A$ is the reciprocal of the largest eigenvalue of $A^{-1}$ (assuming $A$ is invertible), so we could run power iteration on $A^{-1}$ to compute it. But this requires computing $A^{-1}$, which is an expensive overhead.
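In practice this inverse-iteration approach is usually implemented with a linear solve per step rather than an explicit inverse, but each step still requires solving a linear system, versus a single matrix-vector product for PGD. A rough sketch, with illustrative defaults:

```python
import numpy as np

def inverse_iteration(A, num_iters=100, seed=0):
    """Power iteration on A^{-1}, done with linear solves instead of an explicit inverse."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = np.linalg.solve(A, x)     # one linear solve per step
        x /= np.linalg.norm(x)
    return x @ A @ x                  # converges to lambda_n
```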
I proposed PGD on the objective $\min\,(x^\top A x)$ subject to $x^\top x = 1$, which needs only matrix-vector products with $A$ and avoids the inverse (and linear solves) entirely. The update is:
$$x_{t+1} = \frac{(I - \eta A)\, x_t}{\|(I - \eta A)\, x_t\|_2} = \frac{(I - \eta A)^{t+1} x_0}{\|(I - \eta A)^{t+1} x_0\|_2}$$

Choose $\eta < 1/\lambda_1$. Note that $A$ and $I - \eta A$ have the same eigenvectors, and the eigenvalues corresponding to $u_1, u_2, \ldots, u_n$ are $1 - \eta\lambda_1 \leq 1 - \eta\lambda_2 \leq \cdots \leq 1 - \eta\lambda_n$. The constraint $\eta < 1/\lambda_1$ makes all of these positive, so $1 - \eta\lambda_n$ is the largest eigenvalue in magnitude. Power iteration on $I - \eta A$ therefore converges to $u_n$, and $x_t^\top A x_t$ converges to $\lambda_n$.
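The corresponding sketch for the minimum eigenpair, assuming an estimate of $\lambda_1$ is already available to set $\eta$ (the constant $0.9$ is an arbitrary safety factor):

```python
import numpy as np

def pgd_min_eigenpair(A, lambda_1, num_iters=1000, seed=0):
    """PGD for the minimum eigenpair of a symmetric PSD matrix A."""
    eta = 0.9 / lambda_1              # any eta < 1/lambda_1 works
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = x - eta * (A @ x)         # gradient step on x^T A x
        x /= np.linalg.norm(x)        # project back onto the unit sphere
    return x @ A @ x, x               # (estimate of lambda_n, estimate of u_n)
```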
To compute the condition number, first find the largest eigenvalue $\lambda_1$ using PGD. Then choose $\eta < 1/\lambda_1$ and compute the smallest eigenvalue $\lambda_n$ using PGD. The condition number is the ratio $\lambda_1 / \lambda_n$.
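Putting it together, reusing the `pgd_max_eigenpair` and `pgd_min_eigenpair` sketches from above, with NumPy's direct answer as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((100, 100))
A = B @ B.T

lam_max, _ = pgd_max_eigenpair(A)            # step 1: largest eigenvalue
lam_min, _ = pgd_min_eigenpair(A, lam_max)   # step 2: smallest, with eta < 1/lambda_1
print(lam_max / lam_min)                     # PGD estimate of the condition number
print(np.linalg.cond(A))                     # direct answer, for comparison
```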