Projected Gradient Descent for Max and Min Eigenpairs - Proof of Convergence

Let $A$ be an $n \times n$ positive semi-definite matrix. Let $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ be the eigenvalues in decreasing order and $u_1, u_2, \dots, u_n$ be the corresponding orthonormal eigenvectors. We have

$$A = \sum_{i=1}^{n} \lambda_i u_i u_i^\top$$

Suppose you need to find the condition number of $A$, which is defined as $\lambda_1/\lambda_n$. You could compute all the eigenvalues and then take the ratio, but this is wasteful: you only need $\lambda_1$ and $\lambda_n$.

Almost two years ago, I wrote a blog post on using Projected Gradient Descent for finding the maximum and minimum eigenvectors and the corresponding eigenvalues. However, I did not offer any proof for the correctness of the algorithm. I do that in this blog post.

Maximum Eigenpair

Consider the following algorithm:

  1. $x_{t+1} = x_t + \eta A x_t = (I + \eta A) x_t$
  2. $x_{t+1} = \dfrac{x_{t+1}}{\|x_{t+1}\|_2}$

This corresponds to doing projected gradient descent on the objective $\min(-x^\top A x)$ subject to $x^\top x = 1$.
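To make this concrete, here is a minimal NumPy sketch of the update; the function name, default step size, and fixed iteration count are my own choices for illustration, not part of the original algorithm:

```python
import numpy as np

def pgd_max_eigenpair(A, eta=0.01, iters=2000, seed=0):
    """PGD sketch for the maximum eigenpair of a PSD matrix A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = x + eta * (A @ x)   # gradient step on -x^T A x (the factor of 2 is absorbed into eta)
        x /= np.linalg.norm(x)  # projection back onto the unit sphere
    return x @ A @ x, x         # (Rayleigh quotient estimate of lambda_1, eigenvector estimate)
```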

Similarity to Power Iteration

Power Iteration is possibly the most widely used algorithm for computing the max eigenvector. Its update is:

$$b_{t+1} = \frac{A b_t}{\|A b_t\|} = \frac{A^{t+1} b_0}{\|A^{t+1} b_0\|}$$
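In code, this is the same loop as the PGD sketch above but multiplying by $A$ instead of $I + \eta A$; again a sketch with a fixed iteration count rather than a convergence test:

```python
import numpy as np

def power_iteration(A, iters=2000, seed=0):
    """Power iteration sketch: repeatedly apply A and renormalize."""
    rng = np.random.default_rng(seed)
    b = rng.standard_normal(A.shape[0])
    b /= np.linalg.norm(b)
    for _ in range(iters):
        b = A @ b               # apply A
        b /= np.linalg.norm(b)  # renormalize to unit length
    return b @ A @ b, b         # (Rayleigh quotient estimate of lambda_1, eigenvector estimate)
```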

The PGD update could be written as:

$$x_{t+1} = \frac{(I + \eta A) x_t}{\|(I + \eta A) x_t\|_2} = \frac{(I + \eta A)^{t+1} x_0}{\|(I + \eta A)^{t+1} x_0\|_2}$$

The PGD update is actually just power iteration on $I + \eta A$. So we could try analyzing it like power iteration.

First note that $A$ and $I + \eta A$ have the same eigenvectors, and for $\eta > 0$ the ordering of the eigenvalues is preserved. The eigenvectors of $I + \eta A$ are $u_1, u_2, \dots, u_n$ and the corresponding eigenvalues are $1 + \eta\lambda_1 \geq 1 + \eta\lambda_2 \geq \dots \geq 1 + \eta\lambda_n$. So we know:

  • $A$ and $I + \eta A$ have the same eigenvectors
  • PGD on $A$ is just power iteration on $I + \eta A$
  • Power iteration converges to the maximum eigenvector

Hence, PGD will converge to the maximum eigenvector of $A$.

For a more rigorous proof, consider the following analysis:

If $x_t^\top u_j \to 0$ for all $j \neq 1$, then $x_t \to u_1$. We have:

$$\begin{aligned}
u_j^\top x_t &= \frac{u_j^\top (I + \eta A)^t x_0}{\|(I + \eta A)^t x_0\|_2} = \frac{u_j^\top \sum_{i=1}^{n} (1 + \eta\lambda_i)^t u_i u_i^\top x_0}{\left\|\sum_{i=1}^{n} (1 + \eta\lambda_i)^t u_i u_i^\top x_0\right\|_2} \\
&= \frac{(1 + \eta\lambda_j)^t \, u_j^\top x_0}{\left(\sum_{i=1}^{n} (1 + \eta\lambda_i)^{2t} (u_i^\top x_0)^2\right)^{1/2}} \\
&\leq \left(\frac{1 + \eta\lambda_j}{1 + \eta\lambda_1}\right)^t \frac{u_j^\top x_0}{u_1^\top x_0}
\end{aligned}$$

When $j \neq 1$ and $\lambda_j < \lambda_1$, the ratio $(1 + \eta\lambda_j)/(1 + \eta\lambda_1)$ is strictly less than one, so $x_t^\top u_j \to 0$ (assuming $u_1^\top x_0 \neq 0$) and $x_t \to u_1$. Hence the sequence $x_t^\top A x_t$ converges to $\lambda_1$.
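A quick numerical sanity check of this convergence (my own construction, reusing the pgd_max_eigenpair sketch from above):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 50))
A = B.T @ B                            # random positive semi-definite test matrix

lam_est, _ = pgd_max_eigenpair(A, eta=0.01, iters=2000)
lam_true = np.linalg.eigvalsh(A)[-1]   # eigvalsh returns eigenvalues in ascending order
print(abs(lam_est - lam_true))         # should be close to zero
```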

Minimum Eigenpair

The minimum eigenvalue of $A$ is the inverse of the largest eigenvalue of $A^{-1}$. We can run power iteration on $A^{-1}$ to compute the minimum eigenvalue. But we must compute $A^{-1}$, which is an extra cost.

I proposed PGD on the objective $\min(x^\top A x)$ subject to $x^\top x = 1$. This avoids computing the inverse. The update for this will be:

$$x_{t+1} = \frac{(I - \eta A) x_t}{\|(I - \eta A) x_t\|_2} = \frac{(I - \eta A)^{t+1} x_0}{\|(I - \eta A)^{t+1} x_0\|_2}$$

Choose $\eta < 1/\lambda_1$. Note that $A$ and $I - \eta A$ have the same eigenvectors. The eigenvalues corresponding to $u_1, u_2, \dots, u_n$ are $1 - \eta\lambda_1 \leq 1 - \eta\lambda_2 \leq \dots \leq 1 - \eta\lambda_n$. The choice $\eta < 1/\lambda_1$ makes all of these positive, so the eigenvalue of $I - \eta A$ that is largest in magnitude is $1 - \eta\lambda_n$. Hence power iteration on $I - \eta A$ will converge to $u_n$, and $x_t^\top A x_t$ will converge to $\lambda_n$.
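Here is the corresponding sketch for the minimum eigenpair, mirroring the one above; note that $\eta$ is a required argument because it must satisfy $\eta < 1/\lambda_1$:

```python
import numpy as np

def pgd_min_eigenpair(A, eta, iters=20000, seed=0):
    """PGD sketch for the minimum eigenpair of a PSD matrix A.

    Requires eta < 1/lambda_1 so that the eigenvalue of I - eta*A
    largest in magnitude is 1 - eta*lambda_n.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = x - eta * (A @ x)   # gradient step on x^T A x
        x /= np.linalg.norm(x)  # projection back onto the unit sphere
    return x @ A @ x, x         # (Rayleigh quotient estimate of lambda_n, eigenvector estimate)
```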

To compute the condition number, first find the largest eigenvalue $\lambda_1$ using PGD. Then choose $\eta < 1/\lambda_1$ and compute the smallest eigenvalue $\lambda_n$ using PGD. Finally, compute the ratio $\lambda_1/\lambda_n$.
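Putting the two sketches together (reusing pgd_max_eigenpair and pgd_min_eigenpair from above; the 0.9 safety factor and the iteration counts are my own choices, and the minimum eigenpair may need many iterations when the small eigenvalues cluster):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 50))
A = B.T @ B                                 # random positive semi-definite test matrix

lam_max, _ = pgd_max_eigenpair(A, eta=0.01, iters=2000)            # step 1: lambda_1
lam_min, _ = pgd_min_eigenpair(A, eta=0.9 / lam_max, iters=20000)  # step 2: eta < 1/lambda_1
print(lam_max / lam_min)   # PGD estimate of the condition number
print(np.linalg.cond(A))   # reference value; for symmetric PSD A this equals lambda_1/lambda_n
```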

Written on June 9, 2019