Submodularity and the Lovász extension

This blog is primarily about an easier way of understanding the Lovasz extension of submodular functions.

Submodularity

Submodularity is a property of set functions which comes up very often in several areas of computer science. Many problems in combinatorial optimization and machine learning have submodular structure. Submodularity has two equivalent definitions.

Let [n]={1,2,..,n}[n]={1,2,..,n}. The set of all subsets of [n][n] is 2[n]2[n]. A set function f:2[n]Rf:2[n]R is submodular if for all sets S,T[n]S,T[n] such that TSTS, and for all elements i[n]i[n], we have:

f(T{i})f(T)f(S{i})f(S)f(T{i})f(T)f(S{i})f(S)

This is called as the diminishing returns property. Equivalently, a function f:2[n]Rf:2[n]R is submodular if for all sets S,T[n]S,T[n], we have:

f(S)+f(T)f(ST)+f(ST)f(S)+f(T)f(ST)+f(ST)

Some commonly encountered submodular functions are:

  1. Linear monotone functions : finding min/max spanning trees.
  2. Coverage functions : finding graph/hypergraph covers.
  3. Cut functions : finding min/max cuts of a graph.
  4. Entropy function.

Let Hn={0,1}nHn={0,1}n and Kn=[0,1]nKn=[0,1]n. We can represent a set S[n]S[n] using its characteristic vector XSHnXSHn, which is defined XS(i)=1XS(i)=1 if iSiS and 00 otherwise. For the sake of convenience, we overload the definition of submodular functions as f:HnRf:HnR.

It is commonly said that Submodularity is the “discrete counterpart of convexity”. I feel this is misleading. Submodularity has interesting connections to both convex and concave functions. There are at least three extensions which construct a continuous function on KnKn given a submodular function.

  1. Convex Closure
  2. Concave Closure
  3. Multilinear Extension

We will only talk about the Convex closure. Although a convex closure can be constructed for any set function, not just submodular. For submodular functions, the convex closure is the same as the Lovász extension which is defined below.

Lovász extension

The Lovász extension is a very useful construction that can be used for Submodular minimization. The Lovász extension of a submodular function is always convex and can be minimized efficiently. The minima of the Lovász extension can be used to find the minima of the submodular function.

Given a submodular function f:HnRf:HnR, the Lovasz Extension is the function ˆf:KnR^f:KnR defined as

ˆf(x)=ni=0λif(XSi)^f(x)=ni=0λif(XSi)

where =S0S1S2...Sn=[n]=S0S1S2...Sn=[n] is a chain such that ni=0λiXSi=xni=0λiXSi=x and ni=0λi=1ni=0λi=1.

This definition does not tell us how to find the chain or the coefficients λiλi.

In the hypercube HnHn, there are n!n! unique shortest paths from 0n0n to 1n1n, where 0n0n and 1n1n are the vectors of all 0s and 1s respectively. Let P=[0n=X0,X1,..,1n=Xn]P=[0n=X0,X1,..,1n=Xn] denote a path. Let CPCP be the convex hull of the points on the path PP. There exist n!n! such convex hull, one for each path. Moreover, these convex hulls partition KnKn into n!n! equal parts.

Given a point xKnxKn, it is possible to determine the convex hull CPCP and path P=[0n=X0,X1,..,1n=Xn]P=[0n=X0,X1,..,1n=Xn] such that xCPxCP. We can find the coefficients λiλi such that x=ni=0λiXix=ni=0λiXi, where ni=0λi=1ni=0λi=1 and λi0λi0. The Lovasz Extension ˆf^f at xx is defined to be

ˆf(x)=ni=0λif(Xi)^f(x)=ni=0λif(Xi)

Let xx be the vector (x1,x2,..,xn)(x1,x2,..,xn). Let π:[n][n]π:[n][n] be the sorting permutation of x1,x2,..,xnx1,x2,..,xn. That means if π(i)=jπ(i)=j, then xjxj is the iith largest element in the vector xx. So 1xπ(1)xπ(2)..xπ(n)01xπ(1)xπ(2)..xπ(n)0. Let xπ(0)=1xπ(0)=1 and xπ(n+1)=0xπ(n+1)=0. The λλ’s are the gaps between xπ(0),xπ(1),xπ(2),..,xπ(n),xπ(n+1)xπ(0),xπ(1),xπ(2),..,xπ(n),xπ(n+1).

λi=xπ(i)xπ(i+1)λi=xπ(i)xπ(i+1)

Let X0=0nX0=0n. Then, X1=X0+eπ(1)X1=X0+eπ(1), X2=X1+eπ(2)X2=X1+eπ(2) and so on. Here eπ(i)eπ(i) is the vector with a 1 at position π(i)π(i) and all other elements are 0. In general, we have:

Xi=Xi1+eπ(i)Xi=Xi1+eπ(i)

The summation, ni=0λiXini=0λiXi can be verified to be equal to xx

ni=0λiXi=(1xπ(1))X0+ni=1(xπ(i)xπ(i+1))(Xi1+eπ(i))=ni=1eπ(i)(nj=ixπ(j)xπ(j+1))=ni=1eπ(i)xπ(i)=x

Because the Lovász extension at a pont x is defined as a convex combination of X0,X1,..,Xn, it is a piecewise linear function, which is continuous but not necessarily differentiable. We can define sub-gradients for the Lovász extension.

g(x)=ni=1(f(Xi)f(Xi1))eπ(i)
Written on May 19, 2018