Online Portfolio Selection - Cover's Universal Portfolio

In my last blog, I mentioned that a desirable feature of an online portfolio selection algorithm is minimax regret optimality, i.e., we want an algorithm $\text{Alg}^\star$ satisfying:

$$\text{Alg}^\star = \operatorname*{argmin}_{\text{Alg}} \; \max_{[r]_1^T} \; R_T(\text{Alg}, [r]_1^T)$$

$\text{Alg}^\star$ would be the best possible algorithm against any sequence of returns $r_1, \dots, r_T$. Cover’s Universal Portfolio algorithm is precisely one such candidate for $\text{Alg}^\star$. It was introduced by Thomas M. Cover in a pioneering paper called Universal Portfolios, which predates the theory of online learning and regret analysis, so the paper itself does not use the term regret.

Cover’s Algorithm

Cover’s idea is to assign a non-negative weight $w_{t-1}(x)$ to every point $x \in \Delta_n$, based on the returns $r_1, \dots, r_{t-1}$, and then pick the point $x_t$ as a weighted average of the points in $\Delta_n$. Since $\Delta_n$ is a convex set, this weighted average again lies in $\Delta_n$ and is given by:

$$x_t = \frac{\int_{x \in \Delta_n} x \, w_{t-1}(x) \, dx}{\int_{x \in \Delta_n} w_{t-1}(x) \, dx}$$

Intuitively, if a point $x$ produces high returns, it should be weighted more heavily. Cover’s choice of weight function is the simplest possible one: the wealth that portfolio $x$ would have accumulated so far. He picks:

$$w_{t-1}(x) = \prod_{i=1}^{t-1} (r_i \cdot x)$$

So, Cover states his algorithm as follows:

$$x_t = \frac{\int_{x \in \Delta_n} x \prod_{i=1}^{t-1} (r_i \cdot x) \, dx}{\int_{x \in \Delta_n} \prod_{i=1}^{t-1} (r_i \cdot x) \, dx}$$

Another way to state the same algorithm is by considering the density $p_{t-1}(x) \propto \prod_{i=1}^{t-1} (r_i \cdot x)$. In closed form, this is:

$$p_{t-1}(x) = \frac{\prod_{i=1}^{t-1} (r_i \cdot x)}{\int_{x \in \Delta_n} \prod_{i=1}^{t-1} (r_i \cdot x) \, dx} = \frac{w_{t-1}(x)}{\int_{x \in \Delta_n} w_{t-1}(x) \, dx}$$

Then $x_t$ is simply the expectation:

$$x_t = \int_{x \in \Delta_n} x \, p_{t-1}(x) \, dx = \mathbb{E}_{p_{t-1}}[X]$$
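To see what this expectation looks like numerically, here is a minimal sketch that approximates $x_t = \mathbb{E}_{p_{t-1}}[X]$ by drawing uniform samples from the simplex and averaging them with weights $w_{t-1}(x)$. The function name and the toy returns are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def cover_portfolio_mc(past_returns, n_samples=100_000):
    """Approximate x_t = E_{p_{t-1}}[X] by self-normalized importance sampling:
    draw uniform points on the simplex and weight them by w_{t-1}(x) = prod_i (r_i . x)."""
    n = past_returns.shape[1]
    # Dirichlet(1, ..., 1) is the uniform distribution on the simplex Delta_n
    samples = rng.dirichlet(np.ones(n), size=n_samples)   # shape (n_samples, n)
    weights = np.prod(samples @ past_returns.T, axis=1)   # w_{t-1}(x) for each sample
    return weights @ samples / weights.sum()              # weighted average of the samples

# toy example: 3 assets, 4 past rounds of gross returns r_1, ..., r_4
past = np.array([[1.01, 0.99, 1.00],
                 [0.98, 1.03, 1.00],
                 [1.02, 0.97, 1.00],
                 [1.00, 1.01, 0.99]])
print(cover_portfolio_mc(past))   # the next portfolio, a point on the simplex summing to 1
```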

Interpretation as Continuous Exponential Weights

The Hedge algorithm due to Freund and Schapire, also known as Exponential Weights, is one of the most influential algorithms in Online Learning. It has applications in Machine Learning, Boosting, Bandits, Game Playing, solving Linear Programs, Approximation Algorithms, and many more. See the survey by Arora, Hazan, and Kale for more details.

The weights for Cover’s algorithm can be recast as exponential weights using a simple trick:

$$w_{t-1}(x) = \prod_{i=1}^{t-1} (r_i \cdot x) = \exp\left(\sum_{i=1}^{t-1} \log(r_i \cdot x)\right) = \exp\left(-\sum_{i=1}^{t-1} f_i(x)\right)$$

For convenience, we define the convex loss function $f_i(x) = -\log(r_i \cdot x)$. The density $p_{t-1}(x)$ is simply the exponential weights density:

$$p_{t-1}(x) = \frac{\exp\left(-\sum_{i=1}^{t-1} f_i(x)\right)}{\int_{x \in \Delta_n} \exp\left(-\sum_{i=1}^{t-1} f_i(x)\right) dx}$$

Since Exponential Weights is a form of Online Mirror Descent / Follow the Regularized Leader with the negative entropy regularizer, the density $p_{t-1}$ can be expressed as the solution to the optimization problem:

$$p_{t-1} = \operatorname*{argmax}_{p \in \mu(\Delta_n)} \; H(p) - \sum_{i=1}^{t-1} \langle f_i, p \rangle$$

Here, $\mu(\Delta_n)$ denotes the set of probability measures on $\Delta_n$, and $H(p)$ is the differential entropy of $p$:

$$H(p) = \mathbb{E}_p[-\log p(X)] = -\int_{x \in \Delta_n} p(x) \log p(x) \, dx$$

$\langle f_i, p \rangle$ is shorthand for:

$$\langle f_i, p \rangle = \mathbb{E}_p[f_i(X)] = \int_{x \in \Delta_n} f_i(x) \, p(x) \, dx$$

Rewriting it in a more familiar minimization form, we have:

$$p_{t-1} = \operatorname*{argmin}_{p \in \mu(\Delta_n)} \; \sum_{i=1}^{t-1} \langle f_i, p \rangle - H(p) = \operatorname*{argmin}_{p \in \mu(\Delta_n)} \; \mathbb{E}_p\left[\sum_{i=1}^{t-1} f_i(X) + \log p(X)\right]$$
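To see why this optimization recovers the exponential weights density, write $Z = \int_{x \in \Delta_n} \exp\left(-\sum_{i=1}^{t-1} f_i(x)\right) dx$. Then:

$$\sum_{i=1}^{t-1} \langle f_i, p \rangle - H(p) = \mathbb{E}_p\left[\log p(X) + \sum_{i=1}^{t-1} f_i(X)\right] = \mathbb{E}_p\left[\log \frac{p(X)}{\exp\left(-\sum_{i=1}^{t-1} f_i(X)\right)/Z}\right] - \log Z = \mathrm{KL}\left(p \,\|\, p_{t-1}\right) - \log Z$$

Since $\log Z$ does not depend on $p$ and the KL divergence is minimized (at zero) when the two densities coincide, the minimizer is exactly $p_{t-1}$.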

Interpretation as a Bayesian Update

Define the likelihood $L(r \mid x) \propto (r \cdot x)$ and let $p_0(x)$ be the uniform distribution over $\Delta_n$. Then the density $p_1(x)$ can be interpreted as the Bayesian update:

$$p_1(x) = p(x \mid r_1) \propto L(r_1 \mid x) \, p_0(x) \propto (r_1 \cdot x)$$

Similarly, $p_2(x)$ is:

$$p_2(x) = p_1(x \mid r_2) \propto L(r_2 \mid x) \, p_1(x) \propto (r_1 \cdot x)(r_2 \cdot x)$$

Continuing in this way, we have:

$$p_t(x) = p_{t-1}(x \mid r_t) \propto L(r_t \mid x) \, p_{t-1}(x) \propto \prod_{i=1}^{t} (r_i \cdot x)$$
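Here is a tiny illustration of this sequential update for two stocks: the posterior over the fraction $x$ held in stock 1 is kept on a grid and multiplied by the likelihood $(r_t \cdot x)$ each day. The grid size and toy returns are arbitrary.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 1001)     # candidate fractions x of wealth in stock 1
posterior = np.ones_like(grid)         # p_0: uniform prior over [0, 1]
posterior /= posterior.sum()

daily_returns = [(1.02, 0.99), (0.97, 1.01), (1.03, 1.00)]   # (r_t[1], r_t[2]) pairs

for r1, r2 in daily_returns:
    x_t = np.dot(grid, posterior)          # today's portfolio: E_{p_{t-1}}[X]
    likelihood = r2 + grid * (r1 - r2)     # L(r_t | x) is proportional to r_t . (x, 1 - x)
    posterior *= likelihood                # Bayes update: p_t is proportional to L * p_{t-1}
    posterior /= posterior.sum()
    print(f"fraction in stock 1: {x_t:.4f}")
```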

Running Cover’s Algorithm

Computing $x_t$ exactly involves evaluating two $n$-dimensional integrals, $\int_{x \in \Delta_n} x \prod_{i=1}^{t-1} (r_i \cdot x) \, dx$ and $\int_{x \in \Delta_n} \prod_{i=1}^{t-1} (r_i \cdot x) \, dx$, and taking their ratio.

Case of two stocks

Consider the case of just two stocks. If we invest a proportion $x$ of our wealth in stock 1, then stock 2 naturally gets the remaining $1 - x$. So, in this simple case, the return of portfolio $x$ on the return vector $(r[1], r[2])$ is given by:

$$r[1] \, x + r[2] \, (1 - x) = r[2] + x \, (r[1] - r[2])$$

So, we can compute the following:

$$x_t = \frac{\int_0^1 x \prod_{i=1}^{t-1} \left(r_i[2] + x \, (r_i[1] - r_i[2])\right) dx}{\int_0^1 \prod_{i=1}^{t-1} \left(r_i[2] + x \, (r_i[1] - r_i[2])\right) dx}$$

Cover’s portfolio vector is $(x_t, 1 - x_t)$. The integrands in the numerator and denominator are polynomials of degree $t$ and $t-1$, respectively, so both integrals can be computed exactly using an off-the-shelf polynomial library (like numpy.polynomial). In the experiments section, I show how to implement this.
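As a preview, here is a minimal sketch of that computation with numpy.polynomial: at each step we keep the wealth polynomial $\prod_{i < t} \left(r_i[2] + x \, (r_i[1] - r_i[2])\right)$ and take the ratio of two integrals over $[0, 1]$. The function names and toy returns are my own.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def integral_01(poly):
    """Integrate a numpy Polynomial over [0, 1]."""
    antideriv = poly.integ()
    return antideriv(1.0) - antideriv(0.0)

def cover_two_stocks(returns):
    """Fraction of wealth placed in stock 1 at each step by Cover's algorithm.

    returns: array of shape (T, 2); returns[i] = (r_i[1], r_i[2]) are gross returns."""
    x = P([0.0, 1.0])          # the polynomial "x"
    wealth = P([1.0])          # prod_{i < t} (r_i[2] + x * (r_i[1] - r_i[2]))
    fractions = []
    for r1, r2 in returns:
        fractions.append(integral_01(x * wealth) / integral_01(wealth))
        wealth = wealth * P([r2, r1 - r2])   # fold in today's return for the next round
    return np.array(fractions)

# toy example: 5 days of gross returns for two stocks
rets = np.array([[1.02, 0.99], [0.97, 1.01], [1.03, 1.00], [0.98, 1.02], [1.01, 0.99]])
print(cover_two_stocks(rets))   # first entry is 0.5: the uniform average over [0, 1]
```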

General case

In higher dimensions, computing $x_t$ exactly would be prohibitively expensive. The only tractable route is to use Markov Chain Monte Carlo techniques to sample from $p_{t-1}(x)$ and then approximate the expectation $\mathbb{E}_{p_{t-1}}[X] = x_t$ to compute the portfolio.

We previously noted that Cover’s density has a particular form:

$$p_{t-1}(x) \propto \exp\left(-\sum_{i=1}^{t-1} f_i(x)\right)$$

where $f_i(x) = -\log(r_i \cdot x)$ is convex. Such densities occur quite often, and sampling from them efficiently is a problem of great interest called Log-Concave Sampling. The first polynomial-time algorithm for Cover’s universal portfolio was given by Kalai and Vempala using this technique; their runtime is $O(T^{14} n^4)$. Since then, there have been several improvements in log-concave sampling. I’m especially excited about a particular kind of MCMC called Mirror Langevin Monte Carlo, which is suitable for constrained sampling. As this is an active area of research, I do not know how well these methods work, but I plan to implement them in the future.
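To give a flavor of the sampling route, here is a toy sketch that estimates $x_t = \mathbb{E}_{p_{t-1}}[X]$ with a plain independence Metropolis sampler using uniform Dirichlet proposals. This is neither the Kalai-Vempala algorithm nor Mirror Langevin; it is just the simplest MCMC one can write for this density, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def potential(x, past_returns):
    """sum_i f_i(x) = -sum_i log(r_i . x), i.e. -log p_{t-1}(x) up to a constant."""
    return -np.sum(np.log(past_returns @ x))

def cover_portfolio_mcmc(past_returns, n_steps=50_000, burn_in=5_000):
    """Estimate x_t = E_{p_{t-1}}[X] via independence Metropolis on the simplex."""
    n = past_returns.shape[1]
    x = np.ones(n) / n
    energy = potential(x, past_returns)
    total, kept = np.zeros(n), 0
    for step in range(n_steps):
        proposal = rng.dirichlet(np.ones(n))          # uniform proposal on the simplex
        prop_energy = potential(proposal, past_returns)
        # with a uniform proposal, the acceptance ratio is p(proposal) / p(current)
        if np.log(rng.random()) < energy - prop_energy:
            x, energy = proposal, prop_energy
        if step >= burn_in:
            total += x
            kept += 1
    return total / kept

past = np.array([[1.01, 0.99, 1.00],
                 [0.98, 1.03, 1.00],
                 [1.02, 0.97, 1.00]])
print(cover_portfolio_mcmc(past))
```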

Regret

Cover showed that the regret of his algorithm on any sequence of returns is bounded by:

$$\sup_{[r]_1^T} R_T(\text{Cover}, [r]_1^T) \le (n-1) \log(T+1)$$

Moreover, we have a matching lower bound, showing that Cover’s algorithm is minimax regret optimal. The worst-case regret for any algorithm has to be at least:

$$\Omega(n \log T)$$
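To unpack what the upper bound buys us, recall that $R_T$ measures the gap in log-wealth to the best constant rebalanced portfolio chosen in hindsight. Exponentiating the bound gives:

$$\frac{\text{Wealth}_T(\text{Cover})}{\text{Wealth}_T(\text{BCRP})} \ge \frac{1}{(T+1)^{n-1}}$$

For example, with $n = 2$ stocks over $T = 252$ trading days, Cover’s wealth is guaranteed to be at least $1/253$ of the wealth of the best constant rebalanced portfolio, no matter what the return sequence is. Since the best single stock is itself a (degenerate) constant rebalanced portfolio, the same guarantee holds against it.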

Experiments

I implemented the two-stock version of Cover’s algorithm in the following gist. I used the numpy.polynomial package for manipulating polynomials and evaluating their integrals.

We use a different benchmark here: the Uniform Constant Rebalanced Portfolio (UCRP). This strategy rebalances daily to the equal-weight portfolio vector $(0.5, 0.5)$.

There are several experiments, on both synthetic and real stock data. In all cases, the y-axis is log-wealth.

Synthetic data

  • Two different stocks: I use log-normal returns with means (0.005,0.0) and variance 0.1.
  • Stock and Cash: I use log-normal returns with mean 0.005 and variance 0.1. Cash has a constant return of 1.
  • Stock and its inverse: I use log-normal returns with mean $0.0$ and variance $0.1$. If the stock has a percentage change $x$, then the inverse has a percentage change $-x$. The relation between $x$ and the stock’s return $r$ is $x = (r - 1) \times 100$. Let $r'$ be the return of the inverse, so $r'$ satisfies $-x = (r' - 1) \times 100$. Substituting, we see that $r' = 2 - r$. In this case, it is easy to see that UCRP has a daily return of exactly 1 (see the short sketch after this list).
  • Correlated Stocks: I use log-normal returns with mean $0$ and correlation coefficient $\rho$.
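Here is the short sanity check promised above for the inverse-stock construction. This snippet is my own, with arbitrary parameters, and reads "mean" and "variance" as those of the underlying normal log-returns.

```python
import numpy as np

rng = np.random.default_rng(0)

# assuming "mean 0.0 and variance 0.1" refer to the underlying normal log-returns
T = 250
r = np.exp(rng.normal(loc=0.0, scale=np.sqrt(0.1), size=T))   # gross returns of the stock
x = (r - 1) * 100                                             # percentage change of the stock
r_inv = 1 + (-x) / 100                                        # the inverse moves by -x percent
print(np.allclose(r_inv, 2 - r))                              # True: r' = 2 - r

ucrp_daily = 0.5 * r + 0.5 * r_inv                            # UCRP's daily gross return
print(np.allclose(ucrp_daily, 1.0))                           # True: UCRP's wealth stays at 1
```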

Real data

As a form of risk control, I assume the portfolio is closed at the end of each day, and the daily return is close/open, so nothing is held overnight.
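For concreteness, the close/open convention amounts to something like the following; the DataFrame and its column names are hypothetical, not the actual data pipeline from the gist.

```python
import pandas as pd

# hypothetical OHLC data; real prices would come from whatever data source you use
prices = pd.DataFrame({
    "AMC_open":  [9.80, 10.10, 9.95],
    "AMC_close": [10.05, 9.90, 10.20],
    "GME_open":  [24.50, 25.10, 24.80],
    "GME_close": [25.00, 24.70, 25.30],
})

# gross daily return under the close/open convention: enter at the open, exit at the close
returns = pd.DataFrame({
    "AMC": prices["AMC_close"] / prices["AMC_open"],
    "GME": prices["GME_close"] / prices["GME_open"],
})
print(returns)
```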

  • Two different stocks: I consider meme stocks AMC and GME.
  • Stock and Cash: I consider Bitcoin and cash.
  • Stock and its inverse: I consider TQQQ and SQQQ. Due to the way leveraged ETFs work, even UCRP loses money here.
(Embedded gist: the two-stock implementation of Cover’s algorithm.)

References

Written on November 27, 2022