From Potentials to Probabilities
This post is about “potential functions” in the context of online learning. They form the basis of the Implicitly Normalized Forecaster (INF) studied in [1, 2]. We present some basic results about potential functions here.
A function $\psi:(-\infty,a)\to(0,+\infty)$ for some $a\in\mathbb{R}\cup\{+\infty\}$ is called a potential if it is convex, continuously differentiable, and satisfies:
$$\lim_{x\to-\infty}\psi(x)=0,\qquad \lim_{x\to a}\psi(x)=+\infty,\qquad \psi'>0,\qquad \int_0^1\big|\psi^{-1}(s)\big|\,ds<+\infty.$$

For instance, $\exp(x)$ is a potential with $a=+\infty$, and $-1/x$ is a potential with $a=0$. A typical potential rises monotonically from $0$ toward $+\infty$ as $x$ approaches $a$.
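As a quick sanity check of the integrability condition for $\exp$ (this worked computation is mine, not from the original post):

$$\psi(x)=e^x\implies\psi^{-1}(s)=\ln s,\qquad \int_0^1\big|\psi^{-1}(s)\big|\,ds=\int_0^1(-\ln s)\,ds=\Big[s-s\ln s\Big]_0^1=1<+\infty.$$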
Consider a function $g:\mathbb{R}^n\times\mathbb{R}\to\mathbb{R}_+$ defined as
$$g(y,\lambda)=\sum_{i=1}^n\psi(y_i+\lambda).$$

**Lemma 1:** For every $y\in\mathbb{R}^n$, there exists a unique $\lambda$ such that $y_i+\lambda<a$ for all $i$ and $g(y,\lambda)=1$.
Proof: For every $y\in\mathbb{R}^n$, we have $\lim_{\lambda\to-\infty}g(y,\lambda)=0$ and $\lim_{\lambda\to(a-\max_i y_i)^-}g(y,\lambda)=+\infty$ (the term with the largest $y_i$ is the first to blow up). Since $\psi'>0$, $g(y,\cdot)$ is continuous and strictly increasing in $\lambda$, so by the intermediate value theorem, for every $y\in\mathbb{R}^n$ there exists a unique $\lambda$ such that $g(y,\lambda)=1$.
Q.E.D
Using Lemma 1, we can define a function $\lambda(y)$ such that:
$$g(y,\lambda(y))=\sum_{i=1}^n\psi(y_i+\lambda(y))=1.$$
Since $\psi(y_i+\lambda(y))>0$ and $\sum_{i=1}^n\psi(y_i+\lambda(y))=1$, we can see that $P(y)=\{\psi(y_i+\lambda(y))\}_{i=1}^n$ forms a probability distribution.
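The map $y\mapsto P(y)$ is easy to compute numerically, since Lemma 1 reduces it to a one-dimensional root-finding problem. Below is a minimal NumPy sketch; the function names, bracketing heuristics, and tolerances are all mine, not from the post:

```python
import numpy as np

def solve_lambda(y, psi, a=np.inf, tol=1e-12, max_iter=200):
    """Find the unique lambda with sum_i psi(y_i + lambda) = 1 (Lemma 1).

    g(y, .) is continuous and strictly increasing in lambda, so plain
    bisection works once we bracket the root. `psi` is the potential and
    `a` is the right endpoint of its domain.
    """
    y = np.asarray(y, dtype=float)
    g = lambda lam: psi(y + lam).sum()
    # Initial upper bracket: just below a - max_i y_i (where g -> +inf),
    # or -max_i y_i when a = +inf; grow it until g(hi) >= 1.
    hi = a - y.max() - 1e-9 if np.isfinite(a) else -y.max()
    while g(hi) < 1.0:
        hi = (hi + (a - y.max())) / 2.0 if np.isfinite(a) else hi + 1.0
    # Grow the lower bracket until g(lo) <= 1 (g -> 0 as lambda -> -inf).
    lo = hi - 1.0
    while g(lo) > 1.0:
        lo -= 2.0 * (hi - lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 1.0 else (lo, mid)
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def P(y, psi, a=np.inf):
    """The probability vector {psi(y_i + lambda(y))}_{i=1}^n."""
    return psi(np.asarray(y, dtype=float) + solve_lambda(y, psi, a))
```

For $\psi=\exp$ this reproduces the softmax map, e.g. `P([0.3, -1.2, 0.5], np.exp)`; for $\psi(x)=-1/x$ (with $a=0$), `P([-1.0, -2.0], lambda x: -1/x, a=0.0)` returns roughly `[0.618, 0.382]`.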
These probability distributions defined through $\psi$ come up in online learning algorithms such as Online Mirror Descent (OMD) and Follow The Regularized Leader (FTRL) when the decision space is the probability simplex. A generalized Implicitly Normalized Forecaster (INF) algorithm can be written as:
**Generalized INF.** Given a potential $\psi$ and “loss aggregators” $h_1,h_2,\dots,h_T$:

- Let $\theta_0=0$
- For $t=1,\dots,T$:
  - Play the point $x_t=P(\theta_{t-1})=\{\psi(\theta_{t-1,i}+\lambda(\theta_{t-1}))\}_{i=1}^n$
  - Observe the loss function $l_t(x)$
  - Compute $\theta_t=h_t(l_1,\dots,l_t)$
The loss aggregators collect the losses seen so far and output a vector, which we use to compute the point to play.
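As a sketch of how the pieces fit together (reusing `P` and the NumPy setup from the snippet above; `generalized_inf` and its signature are my own naming, not the post's):

```python
def generalized_inf(psi, aggregator, losses, a=np.inf):
    """Run generalized INF on a fixed (T, n) array of linear-loss vectors.

    `aggregator(past)` plays the role of h_t: it maps the (t, n) array of
    losses seen so far to the vector theta_t.
    """
    T, n = losses.shape
    theta = np.zeros(n)                      # theta_0 = 0
    plays = []
    for t in range(T):
        x = P(theta, psi, a)                 # x_t = P(theta_{t-1})
        plays.append(x)                      # ... then observe l_t ...
        theta = aggregator(losses[: t + 1])  # theta_t = h_t(l_1, ..., l_t)
    return np.array(plays)
```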
Considering linear losses (so each $l_t$ is a vector in $\mathbb{R}^n$), the aggregators for time-varying OMD are given by:
$$h_t(l_1,\dots,l_t)=-\sum_{s=1}^t\eta_s l_s.$$

The aggregators for time-varying FTRL are given by:
$$h_t(l_1,\dots,l_t)=-\eta_t\sum_{s=1}^t l_s.$$

We will treat OMD and FTRL in future posts. In the remainder of this post, we present some properties of potential functions.
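Before moving on to those properties, here is a minimal sketch of the two aggregators above, in the same hypothetical setup as the earlier snippets:

```python
def omd_aggregator(etas):
    """Time-varying OMD: h_t(l_1, ..., l_t) = -sum_s eta_s * l_s."""
    etas = np.asarray(etas, dtype=float)
    return lambda past: -(etas[: len(past), None] * past).sum(axis=0)

def ftrl_aggregator(etas):
    """Time-varying FTRL: h_t(l_1, ..., l_t) = -eta_t * sum_s l_s."""
    etas = np.asarray(etas, dtype=float)
    return lambda past: -etas[len(past) - 1] * past.sum(axis=0)
```

With a constant learning rate the two coincide, and running `generalized_inf(np.exp, omd_aggregator(...), losses)` is exactly the exponential-weights (Hedge) update $x_{t,i}\propto e^{-\eta\sum_s l_{s,i}}$.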
Constant Translation
**Lemma 2:** Let $y,y'\in\mathbb{R}^n$. If $y'_i=y_i+c$ for all $i$, then $\lambda(y')=\lambda(y)-c$.
Proof: Observe that
$$g(y,c+\lambda(y'))=\sum_{i=1}^n\psi(y_i+c+\lambda(y'))=\sum_{i=1}^n\psi(y'_i+\lambda(y'))=g(y',\lambda(y'))=1.$$

Since $\lambda(y)$ is the unique solution to $g(y,\lambda)=1$, we must have $\lambda(y)=c+\lambda(y')$, i.e., $\lambda(y')=\lambda(y)-c$.
Q.E.D
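For a concrete instance (my example, not from the post): with $\psi=\exp$, the normalization solves in closed form, and the translation property is immediate:

$$\sum_{i=1}^n e^{y_i+\lambda}=1\iff\lambda(y)=-\ln\sum_{i=1}^n e^{y_i},\qquad \lambda(y+c\mathbf{1})=-\ln\Big(e^c\sum_{i=1}^n e^{y_i}\Big)=\lambda(y)-c.$$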
Associated Function
Associated with a potential $\psi$, we define a function $f:(0,+\infty)\to\mathbb{R}$ as
$$f(z)=\int_0^z\psi^{-1}(s)\,ds.$$

Observe that $f'(z)=\psi^{-1}(z)$ and $f''(z)=\big[\psi'(\psi^{-1}(z))\big]^{-1}>0$. Since $\lim_{x\to-\infty}\psi(x)=0$, we have $\lim_{z\to 0^+}|f'(z)|=\lim_{z\to 0^+}|\psi^{-1}(z)|=+\infty$; together with strict convexity, this proves that $f$ is a Legendre function on $(0,+\infty)$. The Legendre–Fenchel dual of $f$ is $f^\star:(-\infty,a)\to\mathbb{R}$ defined as $f^\star(u)=\sup_{z>0}\{zu-f(z)\}$; since the supremum is attained at $z=\psi(u)$ (where $f'(z)=u$), this is the same as:
$$f^\star(u)=u\,\psi(u)-f(\psi(u)).$$

The following relations hold:
$$f'(z)=\psi^{-1}(z),\qquad f^{\star\prime}(u)=\psi(u),\qquad f''(z)=\big[\psi'(\psi^{-1}(z))\big]^{-1},\qquad f^{\star\prime\prime}(u)=\psi'(u).$$

The INF algorithm using potential $\psi$, together with suitable aggregators, is equivalent to OMD/FTRL on the simplex with regularizer $F(x)=\sum_{i=1}^n f(x_i)$.
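To make this concrete (again a worked example of mine): for $\psi=\exp$ we have $\psi^{-1}(s)=\ln s$, so

$$f(z)=\int_0^z\ln s\,ds=z\ln z-z,\qquad f^\star(u)=u e^u-(u e^u-e^u)=e^u,$$

and $F(x)=\sum_{i=1}^n(x_i\ln x_i-x_i)$ is the (unnormalized) negative-entropy regularizer, matching the fact that INF with $\psi=\exp$ behaves like exponential weights.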
Concavity of λ(y)
Taking the gradient of $\sum_{i=1}^n\psi(y_i+\lambda(y))=1$ with respect to $y$, we have:
$$\sum_{i=1}^n\psi'(y_i+\lambda(y))\,(e_i+\nabla\lambda(y))=0\;\implies\;\nabla\lambda(y)=-\frac{\sum_{i=1}^n\psi'(y_i+\lambda(y))\,e_i}{\sum_{i=1}^n\psi'(y_i+\lambda(y))},$$

where $e_i$ is the $i$-th standard basis vector. That is, $\nabla\lambda(y)$ is the vector $\left\{-\frac{\psi'(y_i+\lambda(y))}{\sum_{j=1}^n\psi'(y_j+\lambda(y))}\right\}_{i=1}^n$.
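A quick numerical sanity check of this gradient formula, reusing `solve_lambda` from the earlier sketch (the test point and tolerances are arbitrary choices of mine):

```python
def grad_lambda(y, psi, dpsi, a=np.inf):
    """Closed-form gradient: -psi'(y_i + lambda) / sum_j psi'(y_j + lambda)."""
    w = dpsi(np.asarray(y, dtype=float) + solve_lambda(y, psi, a))
    return -w / w.sum()

y, eps = np.array([0.3, -0.7, 1.1]), 1e-6
fd = np.array([(solve_lambda(y + eps * e, np.exp) - solve_lambda(y - eps * e, np.exp)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, grad_lambda(y, np.exp, np.exp), atol=1e-5)
```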
Taking the gradient of $\sum_{i=1}^n\psi'(y_i+\lambda(y))(e_i+\nabla\lambda(y))=0$ once more, we have:
$$\sum_{i=1}^n\psi''(y_i+\lambda(y))\,(e_i+\nabla\lambda(y))^{\otimes 2}+\nabla^2\lambda(y)\Big(\sum_{i=1}^n\psi'(y_i+\lambda(y))\Big)=0$$

$$\implies\;\nabla^2\lambda(y)=-\frac{\sum_{i=1}^n\psi''(y_i+\lambda(y))\,(e_i+\nabla\lambda(y))^{\otimes 2}}{\sum_{i=1}^n\psi'(y_i+\lambda(y))}.$$

The above Hessian is negative semidefinite (each $(e_i+\nabla\lambda(y))^{\otimes 2}$ is positive semidefinite, $\psi''\ge 0$ by convexity, and $\psi'>0$), so $\lambda(y)$ is concave. It cannot be strictly concave: by Lemma 2, $\lambda$ is affine along the all-ones direction, and indeed $\mathbf{1}$ lies in the kernel of the Hessian, since $\langle e_i+\nabla\lambda(y),\mathbf{1}\rangle=1+\sum_j\nabla\lambda(y)_j=0$.
Convexity of D(y)
Define the function:
$$D(y)=-\lambda(y)+\sum_{i=1}^n f^\star(y_i+\lambda(y)).$$

The gradient of $D$ is:
$$\nabla D(y)=-\nabla\lambda(y)+\sum_{i=1}^n\psi(y_i+\lambda(y))\,(e_i+\nabla\lambda(y))=-\nabla\lambda(y)+\nabla\lambda(y)+\sum_{i=1}^n\psi(y_i+\lambda(y))\,e_i=P(y),$$

where we used $f^{\star\prime}=\psi$ and $\sum_{i=1}^n\psi(y_i+\lambda(y))=1$. The Hessian of $D$ is:
$$\nabla^2 D(y)=-\nabla^2\lambda(y)+\sum_{i=1}^n\psi'(y_i+\lambda(y))\,(e_i+\nabla\lambda(y))^{\otimes 2}+\Big(\sum_{i=1}^n\psi(y_i+\lambda(y))\Big)\nabla^2\lambda(y)=\sum_{i=1}^n\psi'(y_i+\lambda(y))\,(e_i+\nabla\lambda(y))^{\otimes 2}.$$

This Hessian is positive semidefinite, so $D(y)$ is convex. As with $\lambda$, the all-ones vector lies in its kernel (indeed $D(y+c\mathbf{1})=D(y)+c$ by Lemma 2); since $\psi'>0$, the kernel is exactly $\mathrm{span}(\mathbf{1})$, so $D$ is strictly convex along directions orthogonal to $\mathbf{1}$.
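Closing with one more numerical check in the same hypothetical setup (for $\psi=\exp$ we can take $f^\star(u)=e^u$, as computed earlier): the gradient of $D$ should match $P(y)$.

```python
def D(y, psi, fstar, a=np.inf):
    """D(y) = -lambda(y) + sum_i f_star(y_i + lambda(y))."""
    lam = solve_lambda(y, psi, a)
    return -lam + fstar(np.asarray(y, dtype=float) + lam).sum()

y, eps = np.array([0.2, -0.4, 0.9]), 1e-6
fd = np.array([(D(y + eps * e, np.exp, np.exp) - D(y - eps * e, np.exp, np.exp)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, P(y, np.exp), atol=1e-5)
```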
References
[1] Audibert, J. Y., Bubeck, S., & Lugosi, G. (2011, December). Minimax policies for combinatorial prediction games. In Proceedings of the 24th Annual Conference on Learning Theory (pp. 107-132). JMLR Workshop and Conference Proceedings.
[2] Audibert, J. Y., Bubeck, S., & Lugosi, G. (2014). Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1), 31-45.
Parts of this post appear in my paper: Putta, S. R., & Agrawal, S. (2021). Scale Free Adversarial Multi Armed Bandits. arXiv preprint arXiv:2106.04700.