Online Portfolio Selection - Introduction

I started working on Online Portfolio Selection after reading an open problem published in COLT 2020 - Open Problem: Fast and Optimal Online Portfolio Selection. I spent the next two years reading about online portfolios and portfolio theory from a theoretical and practical standpoint. In a series of blogs, I intend to write about this problem. For me, this problem was a gateway to learning more about concepts in both online learning and portfolio optimization.

Problem Formulation

There are \(n\) stocks that trade in a market. We trade these stocks for a period of \(T\) days under the following protocol:

For day \(t=1 \dots T\):

  1. We decide on a portfolio vector \(x_t \in \Delta_n\) at the beginning of day \(t\) based on the history of returns \(r_1,\dots,r_{t-1}\).
  2. At the end of day \(t\), we observe the vector of returns \(r_t \in (0,\infty)^n\)
  3. Our wealth changes by a multiplicative factor of \(r_t^\top x_t\)

Here \(\Delta_n\) denotes the probability simplex on \(n\) points. Note that under this protocol, we rebalance at the beginning of every trading day. Also, we have not made any statistical assumptions about the return vector \(r_t\); they can be completely arbitrary.

If we start with an initial wealth of \(1\), the wealth after \(T\) days is \(\prod_{t=1}^T (r_t^\top x_t)\). Our goal is to pick \(x_1,\dots,x_T\) so as to maximize the log-wealth:

\[\log \left(\prod_{t=1}^T (r_t^\top x_t)\right) = \sum_{t=1}^T \log (r_t^\top x_t)\]

Benchmark: Optimality with Foresight

To quantify our strategy \(Alg\), of picking \(x_1,\dots,x_T\), we compare it with a strategy that knows the future and can compute the best single \(x^\star \in \Delta_n\) ahead of the first day.

\[x^\star = \arg\max_{x\in \Delta_n} \sum_{t=1}^T \log (r_t^\top x)\]

The Regret of our strategy compared to \(x^\star\) is:

\[\mathcal{R}_T(Alg, [r]_1^T) = \sum_{t=1}^T \log (r_t^\top x^\star) - \sum_{t=1}^T \log (r_t^\top x_t)\]

The strategy \(x^\star\) is called the Best Constantly Rebalanced Portfolio (BCRP). Regret is the difference between the log-wealth of BCRP and the log-wealth of our strategy. It measures how well our strategy performs compared to a strategy that knows the future.

Theoretical Standpoint:

From a theoretical point of view, we want a strategy that minimizes \(\mathcal{R}_T\) on any sequence of returns \(r_1,\dots,r_T\), i.e., we want to find \(Alg^\star\) that satisfies:

\[Alg^\star = \arg\min_{Alg} \max_{[r]_1^T} \mathcal{R}_T(Alg, [r]_1^T)\]

Moreover, for any given \(Alg\), we want to quantify \(\max_{[r]_1^T} \mathcal{R}_T(Alg, [r]_1^T)\) as a function of \(n\) and \(T\). If the following limit holds for an \(Alg\) :

\[\lim_{T\to \infty } \frac{1}{T}\max_{[r]_1^T} \mathcal{R}_T(Alg, [r]_1^T) = 0\]

Then this implies that asymptotically, the log-wealth of \(Alg\) approaches the log-wealth of BCRP on any sequence of returns \(r_1,\dots,r_T\). A strategy with this property is said to be Universal.

Besides seeking universality, we will care about finite time regret bounds, i.e., how does \(\mathcal{R}_T(Alg, [r]_1^T)\) grow as a function of \(n\) and \(T\).

Finally, as with any strategy, we also care about its time and space complexity.

Practical Standpoint:

From a practical point of view, there are a lot of facets to Online Portfolio Selection.

  • The model assumes that there are no transaction costs for trading. A proportional transaction cost model has addressed this.
  • The model is long-only and self-financing. However, it is possible to incorporate short-selling and margin as well.
  • The model assumes that it is possible to instantly buy and sell any number and even a fractional number of stocks.
  • We bound the regret on the worst-case sequence of returns. It does not use any statistical assumptions, making the model pessimistic.
  • The model assumes that it does not cause any market impact. This assumption is acceptable as long as our wealth is not large enough to move the market.
  • BCRP could be a weak benchmark. We could consider other notions of regret, such as adaptive, dynamic, or switching regret.
  • The algorithms do not directly incorporate risk measures like Sharpe-Ratio, Mean-Variance, or Maximum-Drawdown. There are impossibility results regarding this in the online learning literature. However, it is possible to compute these measures for any online algorithm.
  • The model does not use predictions of future returns when deciding on the current portfolio vector. It only uses historical returns. However, it is possible to incorporate future predictions using optimistic online learning.

I plan on writing about some of these points in future blogs. In the remaining post, I will present a straightforward algorithm and measure its performance on simulated and real market data.

Follow the Leader

At the beginning of day \(t\), we know \(r_1,\dots, r_{t-1}\). So, a simple strategy could be to pick \(x_t\) as:

\[x_t = \arg\max_{x\in \Delta_n} \sum_{i=1}^{t-1} \log(r_i^\top x)\]

FTL computes the BCRP on the returns seen so far. In the online learning literature, it is known that FTL has \(O(\log T)\) regret if we are working with strongly concave reward functions. However, this does not apply in our case as \(\log(r^\top x)\) is not strongly concave.

Experiments on Simulated and Real Market Data

References

The book Online Portfolio Selection: Principles and Algorithms has many online algorithms. Some of the algorithms in this book have theoretical guarantees, while others are heuristics guided by empirical phenomena. It is comprehensive and covers most work until 2018.

The latest state-of-the-art paper in this field is Efficient and Near-Optimal Online Portfolio Selection. It covers the relevant theoretical results from the last few years.

Future Posts

In future posts, I will discuss Cover’s Universal Portfolio, Exponentiated Gradient, Online Newton Step, Soft-Bayes, and Volumetric Barrier FTRL. I will also discuss my own algorithm for online portfolios that uses the log-barrier regularizer.

After that, I will cover topics like shorting/margin, transaction costs, other notions of regret, measures of risk, incorporating predictions, and taking advantage of empirical return patterns.

Written on November 24, 2022