Euclidean Distance Matrix Completion has No Spurious Local Minima?
Let \(X\) be a \(n\times k\) real matrix. It can be interpreted as \(n\) points, \(x_1,x_2,..x_n \in \mathbb{R}^k\). The Euclidean Distance Matrix(EDM) \(D\) is \([D]_{i,j} = \|x_i-x_j\|_2^2\). In the EDM completion problem, we are given some entries of \(D\) which are randomly sampled according to a probability \(p\). The goal is to determine the remaining entries of \(D\) and additionally the point matrix \(X\) producing \(D\). This is similar to the generic Low rank matrix completion problem, except on EDMs.
Consider the following function on square matrices:
\[\mathcal{K}(A) = \text{diag}(A)\mathbb{1} + \mathbb{1}\text{diag}(A) - A - A^T\]Here \(\text{diag}(A)\) is the diagonal matrix consisting of the diagonal entries of \(A\) and \(\mathbb{1}\) is the \(n\times n\) matrix of all \(1\)s. It is easy to see that:
\[D = \mathcal{K}(XX^T) = \text{diag}(XX^T)\mathbb{1} + \mathbb{1}\text{diag}(XX^T) -2 XX^T\]EDM completion
Let \(D^*\) be the EDM produced by an unknown \(X^* \in \mathbb{R}^{n\times k}\), ie, \(D^* = \mathcal{K}(X^*{X^*}^T)\). Let \(\Omega\) be the binary matrix formed by sampling from \(\{1,0\}\) where \(\Pr[\Omega_{i,j}=1] = p\). We focus on the non-convex formulation of the EDM completion problem, where we search for \(X \in \mathbb{R}^{n\times k}\) which minimizes the following function.
\[f(X) = \frac{1}{2p}\|\Omega \circ (D^* - \mathcal{K}(XX^T))\|_F^2\]There is an alternative way of solving the EDM completion problem using Semidefinite programming. The major advantage of the semidefinite formulation is that the problem becomes convex: where there is one local solution, which is also global. The disadvantage here is that we cannot control the rank of the solution we get and the search space is large (\(O(n^2)\) variables), which becomes prohibitive when solving even moderately sized problems
The advantage of the non convex formulation is that we can control the rank of the solution and the search space is \(O(nk)\) variables. The apparent downside is that the problem is non-convex: we may not find the global minima using local search methods.
But is the non convex problem really unsolvable? I say no.
Does EDM completion have Strict Saddle property?
This section is motivated from two papers in ICML 2017:
- No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis
- How to Escape Saddle Points Efficiently
Also, the following blog posts from Off the convex path:
For functions which have strict saddles, there exists at least one direction along which the curvature is strictly negative at saddle points. This gives gradient-based algorithms a chance of escaping saddle points. In fact, Perturbed gradient descent(PGD) can escape saddle points in polynomial time. Several non-convex problems have been proved to have the strict saddle property. These include, principal components analysis, canonical correlation analysis, orthogonal tensor decomposition, phase retrieval, dictionary learning, matrix sensing, matrix completion etc. The No Spurious Local Minima paper provides a unified framework to prove that these problems have strict saddles and also proves that in these problems, all local minima are global minima, ie, there are no spurious local minima.
Let’s consider the Positive Semidefinite Matrix(PSD) completion problem. Let \(M^*\) be the rank \(k\) gram matrix produced by an unknown \(X^* \in \mathbb{R}^{n\times k}\), ie, \(M^* = X^*{X^*}^T\). Let \(\Omega\) be the binary matrix formed by sampling from \(\{1,0\}\) where \(\Pr[\Omega_{i,j}=1] = p\). PSD matrix completion can be solved by finding an \(X \in \mathbb{R}^{n\times k}\) which minimizes the following function.
\[f(X) = \frac{1}{2p}\|\Omega \circ (M^* - XX^T)\|_F^2\]It has been proved that PSD matrix completion( under the assumption of incoherence) has the strict saddle property and all local minima are global minima. Hence, even simple algorithms like stochastic gradient descent can reach the global minimum, despite the function being non-convex.
The EDM completion problem is quite close to the PSD completion problem. I have run EDM completion using gradient descent on several thousands of instances, and they have always converged to the global minima. I have also tried proving that EDM completion has no spurious local minima. Although I strongly believe that this is true, I have not been able to prove it. Here is a rough draft of where my proof stands and a document outlining the Similarities between Euclidean Distance Matrix Completion and PSD Matrix Completion. Although most of the results from the No Spurious Local Minima paper follow through to EDMC, two crucial lemmas don’t. The difficulty mainly arises due to the \(\text{diag}(A)\mathbb{1} + \mathbb{1}\text{diag}(A)\) terms in \(\mathcal{K}\).
I’m also excited to see a paper titled “A New Theory for Nonconvex Matrix Completion” in the list of accepted papers in NIPS 2017. Probably this new theory might help me in proving the result that I want. If you have any suggestions for proving that EDM completion has no spurious local minima, please drop me an email.