Projected Gradient Descent - Max(Min) Eigenvalues(vectors)

This post is about finding the minimum and maximum eigenvalues, and the corresponding eigenvectors, of a matrix \(A\) using Projected Gradient Descent. There are several algorithms for finding the eigenvalues of a given matrix (see Eigenvalue algorithms). Although I could not find Projected Gradient Descent in that list, I feel it is a simple and intuitive algorithm.

Let \(A\) be an \(n\times n\) square matrix. Let \(\lambda_{max}\) and \(\lambda_{min}\) be its largest and smallest eigenvalues, and let \(x_{max}\) and \(x_{min}\) be the corresponding unit-norm eigenvectors. Then the eigenvalues are the extreme values of \(\{x^TAx : x^Tx=1\}\), and these extremes are attained at the corresponding eigenvectors.

\[\lambda_{max} = \max \limits_{x^Tx=1} x^TAx \quad \text{and} \quad x_{max} = \arg \max \limits_{x^Tx=1} x^TAx\] \[\lambda_{min} = \min \limits_{x^Tx=1} x^TAx \quad \text{and} \quad x_{min} = \arg \min \limits_{x^Tx=1} x^TAx\]

To see why this is so, consider the Lagrangian of this constrained optimization problem and set its gradient to zero.

\[L(x,\lambda) = x^TAx + \lambda(1-x^Tx)\] \[\nabla_x L(x,\lambda) = 2(Ax - \lambda x) =0\]

The solutions of \(Ax=\lambda x\) with \(x^Tx=1\) are the unit-norm eigenvectors of \(A\). For any such \(x\), \(x^TAx = \lambda_x x^Tx = \lambda_x\), where \(\lambda_x\) is the eigenvalue corresponding to the eigenvector \(x\). The extreme values over all eigenvectors are therefore \(\lambda_{max}\) and \(\lambda_{min}\).

NOTE: \(x^TAx\) is a scalar, so \(x^TAx = (x^TAx)^T = x^TA^Tx = x^T\frac{A+A^T}{2}x\). We can therefore assume \(A\) is symmetric, since only its symmetric part contributes to \(x^TAx\).
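As a quick sanity check (a minimal numpy sketch of my own, not part of the derivation), the quadratic form \(x^TAx\) agrees with \(x^T\frac{A+A^T}{2}x\) for any \(x\), and evaluating it at a unit eigenvector of the symmetric part recovers the corresponding eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))       # a generic, non-symmetric matrix
S = (A + A.T) / 2                     # its symmetric part

# Only the symmetric part contributes to the quadratic form
x = rng.standard_normal(5)
print(np.isclose(x @ A @ x, x @ S @ x))     # True

# The quadratic form at a unit eigenvector of S equals its eigenvalue
eigvals, eigvecs = np.linalg.eigh(S)        # eigh: symmetric eigendecomposition
v = eigvecs[:, -1]                          # unit eigenvector of the largest eigenvalue
print(np.isclose(v @ S @ v, eigvals[-1]))   # True
```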

Projected Gradient Descent

Let \(f:\mathbb{R}^n\to \mathbb{R}\) be a function. When there are no constraints on \(x\), plain gradient descent can be used to find a minimum of \(f\): it moves in the direction of the negative gradient with step size \(\alpha_k\).

\[\begin{align} &\text{Find Minimum}(f): \\ &\text{Initialise }x_0 \text{ randomly}\\ &\text{For }k=0,1,\dots: \quad x_{k+1} = x_{k} - \alpha_k\nabla_xf(x_k) \end{align}\]

When \(x\) is constrained to lie in a set \(C\), projected gradient descent can be used instead: it moves in the direction of the negative gradient and then projects back onto \(C\).

\[\begin{align} &\text{Find Minimum Constrained}(f, C): \\ &\text{Initialise }x_0 \text{ randomly from } C\\ &\text{For }k=0,1,\dots: \quad x_{k+1} = \Pi_C(x_{k} - \alpha_k\nabla_xf(x_k)) \end{align}\]

Here \(\Pi_C\) is the projection operation, defined as \(\Pi_C(x) = \arg \min \limits_{y \in C}\|y-x\|_2\). It finds the point in \(C\) which is closest to \(x\).
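To make the procedure concrete, here is a minimal numpy sketch (the function name `projected_gradient_descent` and its arguments are my own choices, not from the original algorithm statement). It follows the negative gradient and projects back onto \(C\) after every step; passing the identity for `project` recovers plain gradient descent.

```python
import numpy as np

def projected_gradient_descent(grad, project, x0, step=0.01, iters=1000):
    """Minimise f by stepping along -grad f and projecting back onto C."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)   # gradient step
        x = project(x)           # projection Pi_C back onto the constraint set
    return x
```

For the unit-sphere constraint used below, `project` is simply `lambda x: x / np.linalg.norm(x)`.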

The largest eigenvalue of \(A\) can be found by solving the constrained optimization problem:

\[\min \limits_{x^Tx=1} -x^TAx\]

Here \(\nabla_x(-x^TAx) = -2Ax\) (using the symmetry of \(A\)), and the projection operation normalises onto the unit sphere, i.e., \(\Pi_C(x) = x/\|x\|_2\).

\[\begin{align} x'_{k+1} &= x_{k} + 2\alpha_kAx_k\\ x_{k+1} &= \frac{x'_{k+1}}{\|x'_{k+1}\|_2} \end{align}\]

To find the minimum eigenvalue, use \(x'_{k+1} = x_{k} - 2\alpha_kAx_k\).
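Written out in numpy, the iteration looks roughly like this (a sketch only: `extreme_eigenpair` is a hypothetical helper name, the step size and iteration count are arbitrary, and \(A\) is assumed to have been symmetrised as in the note above):

```python
import numpy as np

def extreme_eigenpair(A, largest=True, step=0.01, iters=5000):
    """Estimate the largest (or smallest) eigenpair of a symmetric A
    by projected gradient descent on the unit sphere."""
    x = np.random.randn(A.shape[0])
    x /= np.linalg.norm(x)                 # start on the unit sphere
    sign = 1.0 if largest else -1.0
    for _ in range(iters):
        x = x + sign * 2 * step * (A @ x)  # gradient step on +/- x^T A x
        x /= np.linalg.norm(x)             # project back onto the unit sphere
    return x @ A @ x, x                    # (eigenvalue estimate, eigenvector)
```

For instance, `extreme_eigenpair((A + A.T) / 2, largest=False)` approximates \(\lambda_{min}\) and \(x_{min}\).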

We can also find the spectral norm of \(A\) using projected gradient descent.

\[\|A\|_2^2 = \max \limits_{x^Tx=1} \|Ax\|_2^2 = \max \limits_{x^Tx=1} x^TA^TAx\]

This also shows that the spectral norm of \(A\) is the square root of the largest eigenvalue of \(A^TA\).
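Reusing the hypothetical `extreme_eigenpair` sketch from above, the spectral norm can therefore be estimated by running the same iteration on \(A^TA\):

```python
import numpy as np

def spectral_norm(A, step=0.001, iters=5000):
    """Estimate ||A||_2 as the square root of the largest eigenvalue of A^T A."""
    M = A.T @ A                            # symmetric and PSD by construction
    lam_max, _ = extreme_eigenpair(M, largest=True, step=step, iters=iters)
    return np.sqrt(lam_max)
```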

A note on Convergence and Efficiency:

Each step of projected gradient descent requires \(O(n^2)\) operations (a matrix-vector product), and the rate of convergence is linear. The exact rate depends on the condition number of \(A\), since it is governed by the Hessian of \(x^TAx\), which is \(2A\). This makes it at least as good as Power Iteration. Power Iteration is used for computing PageRank, so it must be efficient and scalable. The downside of Power Iteration is that it can only give you the dominant (largest-magnitude) eigenpair.
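For comparison, here is a minimal Power Iteration sketch (again my own illustration, assuming a symmetric \(A\) so the final quadratic form is a sensible eigenvalue estimate); each step is likewise an \(O(n^2)\) matrix-vector product, but it converges only to the dominant eigenpair:

```python
import numpy as np

def power_iteration(A, iters=1000):
    """Estimate the dominant (largest-magnitude) eigenpair of A."""
    x = np.random.randn(A.shape[0])
    for _ in range(iters):
        x = A @ x                    # one O(n^2) matrix-vector product
        x /= np.linalg.norm(x)       # renormalise to keep the iterate bounded
    return x @ A @ x, x              # eigenvalue estimate and eigenvector estimate
```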

Why does this work (or not work)?

After some research and talking to a few colleagues at my lab, I realized the following things:

  1. \(x^TAx\) is convex iff \(A\) is PSD
  2. Even if \(A\) is PSD, the optimization problem over the sphere is not convex, because the unit sphere is not a convex set.
  3. If at some point the iterate lands exactly on an eigenvector that is not the one you are looking for, it cannot escape from it (see the short calculation after this list).
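The last point can be seen directly from the update rule: if the iterate \(x_k\) is already a unit eigenvector with eigenvalue \(\lambda\), the gradient step only rescales it, and the projection undoes the rescaling, so the iteration never leaves that eigenvector.

\[\begin{align} x'_{k+1} &= x_k + 2\alpha_k A x_k = (1 + 2\alpha_k\lambda)\,x_k \\ x_{k+1} &= \frac{x'_{k+1}}{\|x'_{k+1}\|_2} = \pm x_k \end{align}\]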

I found the field of Optimization on Riemannian Manifolds very interesting. Think about optimizing a function whose parameters are constrained to lie on the surface of a high-dimensional object (like a sphere). A lot of important constrained optimization problems fall into this domain. The area has a lot of literature in mathematics, but not a lot in CS. These papers may be of interest to get started in this area:

  1. Concepts and techniques of optimization on the sphere
  2. Optimization Techniques on Riemannian Manifolds
  3. Optimization Algorithms on Matrix Manifolds
Written on June 28, 2017