Hedge or: How to do well in an exam with 0 preparation?

Assume you are a grad student taking two courses this semester, one of which is Machine Learning Theory, your favorite course. You don’t like the other course much, so you never study for it. You still have to do well on its exam, so you decide to use your ML theory knowledge to get through it.

The exam proceeds as follows: it has \(T\) questions, each with 4 options (A, B, C and D). After you choose an option for a question, the correct answer is revealed and the next question comes up. If you get the answer right, you get \(1\) point; otherwise you get \(0\) points.

Since you didn’t study for the exam, the questions and the options don’t make any sense to you. All you can do is choose an option according to some algorithm and hope for the best.

One of the four choices (A, B, C and D) will be the most popular one, i.e. it will be the correct answer for at least \(T/4\) of the questions. You don’t know which one a priori; if you did, you could choose that option for all \(T\) questions and score at least \(T/4\) points. Your goal is to be competitive with the best fixed choice in hindsight. The correct option for each question is chosen by the professor adversarially: if you always choose option A, the professor can see this and always make one of B, C or D the correct option, leaving you with \(0\) points. Hence you have to randomize your strategy for choosing options.
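As a quick illustration, here is a minimal sketch (toy numbers and variable names of my own choosing) of how badly a deterministic strategy can do against such a professor, while the most popular option in hindsight collects every point:

```python
from collections import Counter

T = 20

# A deterministic strategy: always answer "A".
choices = ["A"] * T

# The professor sees this strategy and marks "B" correct on every question.
corrects = ["B"] * T

score = sum(c == s for c, s in zip(choices, corrects))   # 0 points for you
best_fixed = Counter(corrects).most_common(1)[0][1]      # 20 points for always answering "B"

print(score, best_fixed)  # 0 20
```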

Let the reward function for question \(t\) be \(r_t\), the correct option be \(a^\star_t\) and the choice you made be \(a_t\). Then \(r_t(a) = \mathbb{1}\{a=a^\star_t\}\).

Your goal is to maximize the cumulative reward:

\[\sum_{t=1}^T r_t(a_t)\]

This is equivalent to minimizing the regret, which is defined as follows:

\[R_T = \max_{a \in \{A,B,C,D\}} \sum_{t=1}^T r_t(a) - \sum_{t=1}^T r_t(a_t)\]
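In code, the per-question reward and the regret of a full answer sequence can be computed as follows (a small sketch; `reward` and `regret` are my own helper names):

```python
def reward(choice: str, correct: str) -> int:
    """r_t(a) = 1{a = a*_t}: one point if the chosen option is the correct one."""
    return int(choice == correct)


def regret(choices: list[str], corrects: list[str]) -> int:
    """Score of the best fixed option in hindsight minus the score actually obtained."""
    options = ["A", "B", "C", "D"]
    best_fixed = max(sum(reward(a, c) for c in corrects) for a in options)
    obtained = sum(reward(a, c) for a, c in zip(choices, corrects))
    return best_fixed - obtained
```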

Since you studied for the ML theory course, you recognize that this is exactly the kind of problem the Hedge algorithm solves, so you decide to use it. The algorithm is as follows (a Python sketch follows the pseudocode):

  1. Initialize \(w^{(1)}_a = 1\) for \(a \in \{A,B, C, D\}\).
  2. For \(t=1\) to \(T\):
    1. Let \(Z_t = \sum_{a \in \{A,B, C, D\}} w^{(t)}_a\) and \(p^{(t)}_a = w^{(t)}_a/Z_t\). Choose \(a_t = a\) with probability \(p^{(t)}_a\).
    2. See the reward function \(r_t\).
    3. Update \(w^{(t+1)}_a = w^{(t)}_a \exp(\eta r_t(a))\).
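Here is a direct translation into Python, written as a sketch: I assume the professor’s answer key is wrapped in a function `reward_fn(t, a)` that returns \(r_t(a)\), and `eta` is the parameter \(\eta\) discussed below.

```python
import math
import random

OPTIONS = ["A", "B", "C", "D"]


def hedge(T: int, eta: float, reward_fn):
    """Run Hedge for T questions with full-information feedback.

    reward_fn(t, a) returns r_t(a) in {0, 1}; after each question the reward of
    every option is revealed, not just the chosen one.
    """
    weights = {a: 1.0 for a in OPTIONS}                    # w^(1)_a = 1
    score = 0
    for t in range(1, T + 1):
        Z = sum(weights.values())                          # Z_t
        probs = [weights[a] / Z for a in OPTIONS]          # p^(t)_a = w^(t)_a / Z_t
        choice = random.choices(OPTIONS, weights=probs)[0]
        score += reward_fn(t, choice)
        for a in OPTIONS:                                  # w^(t+1)_a = w^(t)_a * exp(eta * r_t(a))
            weights[a] *= math.exp(eta * reward_fn(t, a))
    return score
```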

Here \(\eta > 0\) is a parameter that we will choose later. Since this is a randomized algorithm, we analyse the expected regret.

\[\begin{align} \mathbb{E}[R_T] &= \max_{a \in \{A,B,C,D\}} \sum_{t=1}^T r_t(a) - \mathbb{E}[\sum_{t=1}^T r_t(a_t)] \\ &= \max_{a \in \{A,B,C,D\}} \sum_{t=1}^T r_t(a) - \sum_{t=1}^T \langle p^{(t)},r_t \rangle \end{align}\]

Here \(\langle p^{(t)},r_t \rangle = \sum_{a \in \{A,B,C,D\}} p^{(t)}_a\, r_t(a)\) is the expected reward you collect on question \(t\).

Analysis:

I follow the analysis of Hedge as covered in my ML Theory class taught by Prof. Akshay Krishnamurthy.

Consider \(\log \frac{Z_{t+1}}{Z_t}\). The key observation is that this is the log moment generating function of the reward collected by an option drawn from \(p^{(t)}\), so Hoeffding’s lemma applies: for a random variable \(X \in [0,1]\), \(\log \mathbb{E}[\exp(\eta X)] \leq \eta\, \mathbb{E}[X] + \frac{\eta^2}{8}\).

\[\begin{align} \log \frac{Z_{t+1}}{Z_t} &= \log \sum_{a \in \{A,B,C,D\}} \frac{w^{(t)}_a \exp(\eta r_t(a))}{Z_t}\\ &= \log \sum_{a \in \{A,B,C,D\}} p^{(t)}_a \exp(\eta r_t(a))\\ &\leq \eta\langle p^{(t)},r_t \rangle + \frac{\eta^2}{8} & [\text{Hoeffding's lemma, } r_t(a) \in [0,1]]\\ &\leq \eta\langle p^{(t)},r_t \rangle + \frac{\eta^2}{2} \end{align}\]
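This per-round bound is easy to sanity-check numerically (a quick sketch with random distributions and 0/1 rewards; `eta = 0.3` is an arbitrary choice):

```python
import math
import random


def check_step_bound(eta: float = 0.3, trials: int = 100_000) -> bool:
    """Check log(sum_a p_a * exp(eta * r(a))) <= eta * <p, r> + eta^2 / 2 numerically."""
    for _ in range(trials):
        raw = [random.random() for _ in range(4)]
        p = [x / sum(raw) for x in raw]                   # random distribution over the 4 options
        r = [random.randint(0, 1) for _ in range(4)]      # arbitrary 0/1 rewards
        lhs = math.log(sum(pa * math.exp(eta * ra) for pa, ra in zip(p, r)))
        rhs = eta * sum(pa * ra for pa, ra in zip(p, r)) + eta ** 2 / 2
        if lhs > rhs + 1e-12:
            return False
    return True


print(check_step_bound())  # True
```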

Upper bounding \(\log Z_{T+1}\)

\[\begin{align} \log Z_{T+1} &= \sum_{t=1}^T \log \frac{Z_{t+1}}{Z_t} + \log Z_1\\ &\leq \eta\sum_{t=1}^T\langle p^{(t)},r_t \rangle + \frac{T\eta^2}{2} + \log 4 \end{align}\]

Lower bounding \(\log Z_{T+1}\)

\[\begin{align} \log Z_{T+1} &= \log \sum_{a \in \{A,B,C,D\}} \exp\bigg(\eta \sum_{t=1}^T r_t(a)\bigg) & [\because w^{(T+1)}_a = \textstyle\prod_{t=1}^T \exp(\eta r_t(a))]\\ &\geq \max_{a \in \{A,B,C,D\}} \eta \sum_{t=1}^T r_t(a) & [\because \text{a sum of positive terms is at least its largest term}] \end{align}\]

Combining the two bounds and then dividing by \(\eta\):

\[\begin{align} \max_{a \in \{A,B,C,D\}} \eta \sum_{t=1}^T r_t(a) \leq \eta\sum_{t=1}^T\langle p^{(t)},r_t \rangle + \frac{T\eta^2}{2} + \log 4\\ \max_{a \in \{A,B,C,D\}} \sum_{t=1}^T r_t(a) - \sum_{t=1}^T\langle p^{(t)},r_t \rangle \leq \frac{T\eta}{2} + \frac{\log 4}{\eta} \end{align}\]

Optimizing over the choice of \(\eta\), we get \(\eta = \sqrt{\frac{2 \log 4}{T}}\) and the expected regret \(\mathbb{E}[R_T] \leq \sqrt{2T \log(4)}\).
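To see where this choice comes from, note that by the AM-GM inequality

\[\frac{T\eta}{2} + \frac{\log 4}{\eta} \geq 2\sqrt{\frac{T\eta}{2}\cdot\frac{\log 4}{\eta}} = \sqrt{2T\log 4},\]

with equality exactly when \(\frac{T\eta}{2} = \frac{\log 4}{\eta}\), i.e. when \(\eta = \sqrt{\frac{2\log 4}{T}}\).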

In general, with \(N\) options (the maximum and the inner products now ranging over all \(N\) of them), the same argument gives the regret inequality:

\[\max_{a} \sum_{t=1}^T r_t(a) - \sum_{t=1}^T\langle p^{(t)},r_t \rangle \leq \frac{T\eta}{2} + \frac{\log N}{\eta}\]

Choosing \(\eta = \sqrt{\frac{2 \log N}{T}}\), we get \(\mathbb{E}[R_T] \leq \sqrt{2T\log N}\).

If you choose options according to Hedge, your expected score is within \(\sqrt{2T\log 4}\) of the score of the best (most popular) option in hindsight, even when the professor chooses the correct answers adversarially. Also, since the regret is sublinear in \(T\), the average expected regret \(\frac{\mathbb{E}[R_T]}{T} \to 0\) as \(T \to \infty\).
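Finally, here is a small self-contained simulation sketch of the whole exam. The professor below, who always marks the option Hedge currently trusts least as correct, is just one adversary I made up; the tracked quantity is the pathwise regret \(\max_a \sum_t r_t(a) - \sum_t \langle p^{(t)}, r_t \rangle\), which stays below the \(\sqrt{2T\log 4}\) bound.

```python
import math

OPTIONS = ["A", "B", "C", "D"]
T = 10_000
eta = math.sqrt(2 * math.log(4) / T)

weights = {a: 1.0 for a in OPTIONS}
expected_score = 0.0                        # sum_t <p^(t), r_t>
option_scores = {a: 0 for a in OPTIONS}     # sum_t r_t(a) for every fixed option

for t in range(T):
    Z = sum(weights.values())
    p = {a: weights[a] / Z for a in OPTIONS}
    # Adversarial professor: the correct answer is the option Hedge currently trusts least.
    correct = min(OPTIONS, key=lambda a: p[a])
    expected_score += p[correct]
    for a in OPTIONS:
        option_scores[a] += int(a == correct)
        weights[a] *= math.exp(eta * int(a == correct))

regret = max(option_scores.values()) - expected_score
print(f"regret: {regret:.1f}, bound: {math.sqrt(2 * T * math.log(4)):.1f}")
```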

Written on January 1, 2018