Online Mirror Descent
In this blog post, we introduce OMD and state an upper bound on its regret. Using this upper bound, we analyze the regret of the algorithm we proposed in the previous blog post.
As before, the setting can be viewed as a game between a player and an adversary, which proceeds as follows:
- For $t = 1, \dots, T$:
  - The player chooses a point $x_t$ from a convex set $K$.
  - The adversary simultaneously reveals a convex function $f_t$ to the player.
  - The player suffers the loss $f_t(x_t)$.
Like before, the goal of the player is to minimize the regret:
$$\sum_{t=1}^{T} f_t(x_t) - \min_{x^\star \in K} \sum_{t=1}^{T} f_t(x^\star)$$

To understand OMD, we need to introduce Bregman Divergences and Fenchel Conjugates.
Bregman Divergence
Let $R(x)$ be a convex function. The Bregman Divergence $D_R(x \,\|\, y)$ is:
$$D_R(x \,\|\, y) = R(x) - R(y) - \langle \nabla R(y), x - y \rangle$$

In words, $D_R(x \,\|\, y)$ is the difference between $R(x)$ and the first-order approximation of $R(x)$ around $y$, which is $R(y) + \langle \nabla R(y), x - y \rangle$. As a consequence of convexity, $D_R(x \,\|\, y) \ge 0$.
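As a quick numerical illustration, the definition can be evaluated directly. The two regularizers below (squared norm and negative entropy) are standard examples chosen purely for illustration, not anything specific to this post:

```python
import numpy as np

def bregman(R, grad_R, x, y):
    """D_R(x || y) = R(x) - R(y) - <grad R(y), x - y>."""
    return R(x) - R(y) - np.dot(grad_R(y), x - y)

# R(x) = 0.5 * ||x||^2: the Bregman divergence is the squared Euclidean distance.
R_sq = lambda v: 0.5 * np.dot(v, v)
grad_sq = lambda v: v

# R(x) = sum_i x_i log x_i (negative entropy): for probability vectors the
# Bregman divergence is the KL divergence.
R_ent = lambda v: np.sum(v * np.log(v))
grad_ent = lambda v: np.log(v) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
print(bregman(R_sq, grad_sq, x, y))    # equals 0.5 * ||x - y||^2
print(bregman(R_ent, grad_ent, x, y))  # equals KL(x || y), since x and y sum to 1
```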
Fenchel Conjugate
The Fenchel Conjugate of $R(x)$ is defined as:
$$R^\star(\theta) = \max_{x} \; \langle x, \theta \rangle - R(x)$$

From its definition, we have $R^\star(\theta) + R(x) \ge \langle x, \theta \rangle$, which is the Fenchel-Young inequality.
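As a sanity check, here is a minimal sketch using $R(x) = \frac{1}{2}\|x\|^2$, which is its own conjugate. This choice is purely illustrative and is not the regularizer used later in the post:

```python
import numpy as np

# R(x) = 0.5 * ||x||^2 is its own Fenchel conjugate: R*(theta) = 0.5 * ||theta||^2.
R = lambda v: 0.5 * np.dot(v, v)
R_star = lambda th: 0.5 * np.dot(th, th)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, theta = rng.normal(size=3), rng.normal(size=3)
    # Fenchel-Young: R*(theta) + R(x) >= <x, theta>, with equality when x = theta.
    assert R_star(theta) + R(x) >= np.dot(x, theta)
```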
Online Mirror Descent
The OMD algorithm uses a convex regularization function $R(x)$. The algorithm is as follows (a minimal code sketch appears right after the list):
- The player picks $y_1$ such that $\nabla R(y_1) = 0$, and sets $x_1 = \arg\min_{x \in K} D_R(x \,\|\, y_1)$.
- For $t = 1, \dots, T$:
  - The player chooses the point $x_t$.
  - The player sees $f_t$ and suffers the loss $f_t(x_t)$.
  - Update $y_{t+1}$: $\nabla R(y_{t+1}) = \nabla R(x_t) - \eta \nabla f_t(x_t)$.
  - Project back onto $K$: $x_{t+1} = \arg\min_{x \in K} D_R(x \,\|\, y_{t+1})$.
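The sketch below mirrors this loop in Python. The function names and the squared-norm instantiation at the end are illustrative choices only; the entropic instantiation we actually use appears in the next section.

```python
import numpy as np

def omd(grad_R, grad_R_inv, bregman_project, loss_grads, x1, eta):
    """Generic OMD loop: dual-space gradient step, then Bregman projection onto K."""
    x = x1
    iterates = [x1]
    for grad_f in loss_grads:                  # grad_f(x_t) = gradient of f_t at x_t
        theta = grad_R(x) - eta * grad_f(x)    # nabla R(y_{t+1}) = nabla R(x_t) - eta * nabla f_t(x_t)
        y = grad_R_inv(theta)                  # recover y_{t+1}
        x = bregman_project(y)                 # x_{t+1} = argmin_{x in K} D_R(x || y_{t+1})
        iterates.append(x)
    return iterates

# Illustrative instantiation: R(x) = 0.5 * ||x||^2 and K = unit Euclidean ball.
# Then nabla R is the identity and the Bregman projection is the Euclidean
# projection, so OMD reduces to projected online gradient descent.
identity = lambda v: v
project_ball = lambda v: v / max(1.0, float(np.linalg.norm(v)))

rng = np.random.default_rng(0)
linear_losses = [(lambda x, l=l: l) for l in rng.normal(size=(10, 3))]  # f_t(x) = <l_t, x>
xs = omd(identity, identity, project_ball, linear_losses, x1=np.zeros(3), eta=0.1)
```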
We state the regret inequality for OMD, as derived in *Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems*:
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x) \le \frac{R(x) - R(x_1)}{\eta} + \frac{1}{\eta} \sum_{t=1}^{T} D_{R^\star}\!\left(\nabla R(x_t) - \eta \nabla f_t(x_t) \,\|\, \nabla R(x_t)\right)$$

Online Combinatorial Optimization
We apply OMD to the OCO problem from the previous blog post. Here $f_t(x) = \langle l_t, x \rangle$, so $\nabla f_t(x) = l_t$. We use the following regularizer:
$$R(x) = \sum_{i=1}^{n} x_i \log x_i + (1 - x_i) \log(1 - x_i)$$

Its gradient is:
$$\nabla R(x)_i = \log \frac{x_i}{1 - x_i}$$

Hence, we have the following update:
$$\log \frac{y_{t+1,i}}{1 - y_{t+1,i}} = \log \frac{x_{t,i}}{1 - x_{t,i}} - \eta l_{t,i}$$

$$\frac{y_{t+1,i}}{1 - y_{t+1,i}} = \frac{x_{t,i}}{1 - x_{t,i}} \exp(-\eta l_{t,i})$$

$$y_{t+1,i} = \frac{x_{t,i}}{x_{t,i} + (1 - x_{t,i}) \exp(\eta l_{t,i})}$$

Since $y_{t,i}$ is always in $[0,1]$, we can skip the Bregman projection step. Hence, we have:
$$x_{t+1,i} = \frac{x_{t,i}}{x_{t,i} + (1 - x_{t,i}) \exp(\eta l_{t,i})} = \frac{1}{1 + \exp\!\left(\eta \sum_{\tau=1}^{t} l_{\tau,i}\right)}$$

This is the same update we used in our algorithm to run Hedge in linear time.
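A quick numerical check of this closed form against the iterative update (random losses and parameter values chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, eta = 50, 4, 0.1
losses = rng.uniform(-1.0, 1.0, size=(T, n))   # l_{t,i} in [-1, 1]

# Iterative update, starting from x_1 = (1/2, ..., 1/2).
x = np.full(n, 0.5)
for t in range(T):
    x = x / (x + (1.0 - x) * np.exp(eta * losses[t]))

# Closed form after T rounds: x_{T+1,i} = 1 / (1 + exp(eta * sum_{tau<=T} l_{tau,i})).
closed_form = 1.0 / (1.0 + np.exp(eta * losses.sum(axis=0)))
assert np.allclose(x, closed_form)
```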
Now we analyze the expected regret of our algorithm using OMD's regret bound:

$$\mathbb{E}[R_T] = \mathbb{E}\!\left[\sum_{t=1}^{T} L_t(X_t)\right] - \min_{X^\star \in \{0,1\}^n} \sum_{t=1}^{T} L_t(X^\star) = \mathbb{E}\!\left[\sum_{t=1}^{T} \langle l_t, X_t \rangle\right] - \min_{X^\star \in \{0,1\}^n} \sum_{t=1}^{T} \langle l_t, X^\star \rangle$$

Recall that $\Pr(X_t = X) = \prod_{i=1}^{n} (x_{t,i})^{X_i} (1 - x_{t,i})^{1 - X_i}$. Hence, we have $\mathbb{E}\!\left[\sum_{t=1}^{T} \langle l_t, X_t \rangle\right] = \sum_{t=1}^{T} \langle l_t, x_t \rangle$.
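Under this distribution the coordinates of $X_t$ are independent Bernoulli random variables with means $x_{t,i}$, so both sampling and the expected loss are cheap to compute. A small illustrative sketch (the particular vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x_t = np.array([0.1, 0.6, 0.8])    # current marginals x_{t,i}
l_t = np.array([0.5, -0.2, 0.3])   # loss vector l_t

# Under Pr(X_t = X) = prod_i x_{t,i}^{X_i} (1 - x_{t,i})^{1 - X_i}, each
# coordinate X_{t,i} is an independent Bernoulli(x_{t,i}).
samples = (rng.random((100_000, x_t.size)) < x_t).astype(float)

empirical = (samples @ l_t).mean()
exact = l_t @ x_t                  # E[<l_t, X_t>] = <l_t, x_t>, since E[X_{t,i}] = x_{t,i}
print(empirical, exact)            # the two values should be close
```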
Using OMD's regret inequality, we get:

$$\mathbb{E}[R_T] = \sum_{t=1}^{T} \langle l_t, x_t \rangle - \min_{X^\star \in \{0,1\}^n} \sum_{t=1}^{T} \langle l_t, X^\star \rangle \le \frac{R(X^\star) - R(x_1)}{\eta} + \frac{1}{\eta} \sum_{t=1}^{T} D_{R^\star}\!\left(\nabla R(x_t) - \eta l_t \,\|\, \nabla R(x_t)\right)$$

The first term can be bounded very easily: $R(X^\star) - R(x_1) \le n \log 2$. The second term is a little more involved:
$$D_{R^\star}\!\left(\nabla R(x_t) - \eta l_t \,\Big\|\, \nabla R(x_t)\right) = D_{R^\star}\!\left(-\eta \sum_{\tau=1}^{t} l_\tau \,\Big\|\, -\eta \sum_{\tau=1}^{t-1} l_\tau\right) = R^\star\!\left(-\eta \sum_{\tau=1}^{t} l_\tau\right) - R^\star\!\left(-\eta \sum_{\tau=1}^{t-1} l_\tau\right) + \left\langle \nabla R^\star\!\left(-\eta \sum_{\tau=1}^{t-1} l_\tau\right), \eta l_t \right\rangle$$

The Fenchel Conjugate of $R(x)$ is:
$$R^\star(\theta) = \max_{x \in [0,1]^n} \; \langle x, \theta \rangle - R(x)$$

Differentiating $\langle x, \theta \rangle - R(x)$ with respect to $x_i$ and equating to 0:
$$\theta_i - \log x_i + \log(1 - x_i) = 0 \;\Longrightarrow\; \frac{x_i}{1 - x_i} = \exp(\theta_i) \;\Longrightarrow\; x_i = \frac{\exp(\theta_i)}{1 + \exp(\theta_i)}$$

Substituting this back into the expression for $R^\star(\theta)$, we get:
$$R^\star(\theta) = \sum_{i=1}^{n} \log(1 + \exp(\theta_i))$$

Its gradient is:
$$\nabla R^\star(\theta)_i = \frac{\exp(\theta_i)}{1 + \exp(\theta_i)}$$

We can simplify $\left\langle \nabla R^\star\!\left(-\eta \sum_{\tau=1}^{t-1} l_\tau\right), \eta l_t \right\rangle$ by observing that:
$$\nabla R^\star\!\left(-\eta \sum_{\tau=1}^{t-1} l_\tau\right)_i = \frac{1}{1 + \exp\!\left(\eta \sum_{\tau=1}^{t-1} l_{\tau,i}\right)} = x_{t,i}$$

Simplifying the Bregman term:
$$D_{R^\star}\!\left(\nabla R(x_t) - \eta l_t \,\Big\|\, \nabla R(x_t)\right) = \left(\sum_{i=1}^{n} \log \frac{1 + \exp\!\left(-\eta \sum_{\tau=1}^{t} l_{\tau,i}\right)}{1 + \exp\!\left(-\eta \sum_{\tau=1}^{t-1} l_{\tau,i}\right)}\right) + \eta \langle x_t, l_t \rangle = \left(\sum_{i=1}^{n} \log \frac{1 + \exp\!\left(\eta \sum_{\tau=1}^{t} l_{\tau,i}\right)}{\exp(\eta l_{t,i})\left(1 + \exp\!\left(\eta \sum_{\tau=1}^{t-1} l_{\tau,i}\right)\right)}\right) + \eta \langle x_t, l_t \rangle$$

Using the fact that $x_{t,i} = \left(1 + \exp\!\left(\eta \sum_{\tau=1}^{t-1} l_{\tau,i}\right)\right)^{-1}$,
$$= \left(\sum_{i=1}^{n} \log \frac{x_{t,i} + (1 - x_{t,i}) \exp(\eta l_{t,i})}{\exp(\eta l_{t,i})}\right) + \eta \langle x_t, l_t \rangle = \sum_{i=1}^{n} \log\!\left(1 - x_{t,i} + x_{t,i} \exp(-\eta l_{t,i})\right) + \eta \langle x_t, l_t \rangle$$

Using the inequality $e^{-a} \le 1 - a + a^2$,
$$\le \sum_{i=1}^{n} \log\!\left(1 - \eta x_{t,i} l_{t,i} + \eta^2 x_{t,i} l_{t,i}^2\right) + \eta \langle x_t, l_t \rangle$$

Using the inequality $\log(1 - a) \le -a$,
$$\le -\eta \langle x_t, l_t \rangle + \eta^2 \langle x_t, l_t^2 \rangle + \eta \langle x_t, l_t \rangle = \eta^2 \langle x_t, l_t^2 \rangle$$

So the second term of the regret bound is at most $\eta \sum_{t=1}^{T} \langle x_t, l_t^2 \rangle$, and the regret inequality becomes:
$$\mathbb{E}[R_T] \le \frac{n \log 2}{\eta} + \eta \sum_{t=1}^{T} \langle x_t, l_t^2 \rangle$$

Since $l_{t,i} \in [-1, +1]$ and $x_t \in [0,1]^n$, we get $\mathbb{E}[R_T] \le \frac{n \log 2}{\eta} + \eta T n$. Substituting $\eta = \sqrt{\frac{\log 2}{T}}$, we get $\mathbb{E}[R_T] \le 2n\sqrt{T \log 2}$.
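To see the bound in action, here is a self-contained simulation sketch. The random losses and parameter choices are purely illustrative, and the bound is on the expected regret while the simulation reports a single realization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 2000
eta = np.sqrt(np.log(2) / T)
losses = rng.uniform(-1.0, 1.0, size=(T, n))     # l_{t,i} in [-1, 1]

total_loss, cum = 0.0, np.zeros(n)
for t in range(T):
    x_t = 1.0 / (1.0 + np.exp(eta * cum))        # OMD iterate (cum = sum of past losses)
    X_t = (rng.random(n) < x_t).astype(float)    # sample X_t coordinate-wise
    total_loss += losses[t] @ X_t
    cum += losses[t]

# Best fixed X* in hindsight: include coordinate i iff its cumulative loss is negative.
best_fixed = np.minimum(cum, 0.0).sum()
print("realized regret:", total_loss - best_fixed)
print("bound 2n*sqrt(T log 2):", 2 * n * np.sqrt(T * np.log(2)))
```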
In the previous blog post, we showed that Hedge and our algorithm are equivalent. In this post, we showed that OMD with entropic regularization and our algorithm are equivalent. Hence, in this particular case, Hedge, OMD with entropic regularization, and our algorithm are all the same algorithm, with expected regret bounded by $O(n\sqrt{T})$. This worked because our decision space was all $2^n$ points of $\{0,1\}^n$, which let us factorize Hedge's update and sampling steps across coordinates. In general, this is not always possible. For instance, if our decision space were $S_m = \{X : \|X\|_1 \le m,\, X \in \{0,1\}^n\}$, we would not have been able to factorize Hedge's update the way we did in our algorithm.