Abuse ΘϜ Notation

Optimizing Ambulance Service Coverage in NYC

Last weekend, I participated in a hackathon organized by Cubist and won first place! I got to work with some talented students from Yale, Cornell, and Columbia. Our project focused on improving ambulance staffing and stationing in New York City by analyzing geographic data on collisions and existing service locations/hospitals.

Read More

Online Portfolio Selection - Online Newton Step

I presented the Smooth Prediction (SP) algorithm in my previous blog. Assuming the stock returns are lower bounded by \(r\), the regret of SP is \(O(\frac{n}{r}\log(T))\), and the runtime is \(O(n^{2.5} T)\) per iteration. In this blog, I present the Online Newton Step algorithm, which improves the runtime to \(O(n^{3.5})\).

Read More

Online Portfolio Selection - Smooth Prediction

I presented the Exponentiated Gradient (EG) algorithm in my previous blog. Assuming the stock returns are lower bounded by \(r\), the regret of EG is \(O(\frac{1}{r}\sqrt{T\log(n)})\) and the runtime is \(O(n)\) per iteration. Using a standard trick for “universalizing”, we can remove the lower bound assumption and get a regret of \(O(n^{1/2}\log(n)^{1/4} T^{3/4})\). In this blog, I present the Smooth Prediction algorithm, which improves the dependence on \(T\) to \(O(\log T)\).
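
For concreteness, here is a minimal sketch of a single EG step on the portfolio simplex; the price-relative vector `r`, the learning rate `eta`, and the function name are illustrative assumptions for this sketch, not taken from the post:

```python
import numpy as np

def eg_update(x, r, eta):
    """One Exponentiated Gradient step for portfolio selection (sketch).

    x   : current portfolio weights (nonnegative, summing to 1)
    r   : price relatives for the round (today's price / yesterday's price)
    eta : learning rate
    """
    # gradient of the log-wealth log(x . r) with respect to x
    grad = r / np.dot(x, r)
    # multiplicative update, then renormalize back onto the simplex
    w = x * np.exp(eta * grad)
    return w / w.sum()
```

The whole update costs \(O(n)\) per round, which is where the runtime quoted above comes from.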

Read More

Online Portfolio Selection - Exponentiated Gradient

I presented Cover’s universal portfolio algorithm in my last blog. It has a regret bound of \(O(n \log T)\), where \(n\) is the number of stocks and \(T\) is the number of days traded. I implemented it for the case of two stocks. In the case of large \(n\), Cover’s algorithm can be run using a Markov Chain Monte Carlo simulation. The computation time for this scales as \(O(n^4 T^{14})\), which is quite prohibitive for even moderate \(n\) and \(T\). So, in this blog, I present a gradient descent-based algorithm that trades off computation time with regret bound while still staying universal.

Read More

Online Portfolio Selection - Introduction

I started working on Online Portfolio Selection after reading an open problem published in COLT 2020 - Open Problem: Fast and Optimal Online Portfolio Selection. I spent the next two years reading about online portfolios and portfolio theory from a theoretical and practical standpoint. In a series of blogs, I intend to write about this problem. For me, this problem was a gateway to learning more about concepts in both online learning and portfolio optimization.

Read More

Yet Another Derivation of Backpropagation in Matrix Form (this time using Adjoints)

Over 6 years ago, I wrote a blog post on the Derivation of Backpropagation in Matrix Form. My toolkit then included a basic understanding of the chain rule, linear algebra, and dimension checking. And so I set about writing that post. With some questionable usage of the chain rule, I derived the backpropagation equations. At every step, I told myself: the dimensions check out, so it must be correct. That post became the most popular blog I have ever written. It still brings in a few hundred visitors each week.

Read More

2-Step Follow the Regularized Leader

We have seen three online learning algorithms: 1-step OMD, 2-step OMD, and FTRL. Usually, FTRL is written and studied as a single-step update. However, it is possible to define and analyze a 2-step version of FTRL as well.

Read More

From Potentials to Probabilities

This post is about “Potential Functions” in the context of Online Learning. These form the basis for the Implicitly Normalized Forecaster (INF) studied in [1,2]. We present some basic results about potential functions here.

Read More

Linear Estimators

Consider the online linear optimization problem. At each round, the adversary chooses a loss function \(l_t \in \mathcal{F}\) and the player picks \(x_t \in \mathcal{K}\) from a convex set \(\mathcal{K}\). Roughly speaking, there are two basic settings in this problem:

Read More

Online Combinatorial Optimization using Hedge in Linear Time

In this post, we consider the Online Linear Optimization problem where the decision set is confined to the vertices of a hypercube. Naively applying Hedge to this problem achieves optimal regret in \(T\) but requires maintaining an exponential number of variables. We show that it is in fact possible to apply Hedge using only a linear number of variables.
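
A minimal sketch of the factorization that makes this possible, assuming (as an illustration) that the loss of a vertex \(x\) in round \(t\) is the linear function \(\langle l_t, x\rangle\): Hedge's distribution over the \(2^n\) vertices stays a product distribution, so it suffices to track one marginal per coordinate:

```python
import numpy as np

def hedge_hypercube(losses, eta):
    """Hedge over the vertices of {0,1}^n with linear losses,
    using only n marginals instead of 2^n weights (sketch).

    losses : T x n array; the loss of vertex x in round t is <losses[t], x>
    eta    : learning rate
    Returns the sequence of product marginals p[t][i] = P(x_i = 1).
    """
    T, n = losses.shape
    cum = np.zeros(n)   # cumulative loss of setting each coordinate to 1
    probs = []
    for t in range(T):
        # coordinate i is 1 with weight e^{-eta*cum_i}; setting it to 0
        # incurs zero loss, hence weight 1, so the marginal factorizes
        p = np.exp(-eta * cum) / (1.0 + np.exp(-eta * cum))
        probs.append(p)
        cum += losses[t]
    return np.array(probs)
```

Each round costs \(O(n)\) time and memory, versus the exponential number of weights in the naive implementation.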

Read More

Hedge or: How to do well in an exam with 0 preparation?

Assume you are a grad student taking two courses this semester. One of them is Machine Learning Theory, your favorite course. You don’t like the other course as much, so you never study for it. You still have to do well in its exam, so you decide to use your ML theory knowledge to get through it.

Read More

How to distinguish between two kinds of 7s?

There are two acceptable ways to write the digit ‘7’. It can be written with or without a line in the middle. Given an unlabelled dataset of 7s written in both ways, the task is to find which ones have a dash and which ones don’t.

Read More

Learning the Mode under Bandit Feedback

Consider the following problem. There is an unknown discrete probability distribution on the alphabet \([K]\). Let this distribution be \(p=[p_1,p_2,\ldots,p_K]\). The mode of this distribution is \(k^\star = \arg \max_{k \in [K]} p_k\); let \(p^\star = p_{k^\star}\). You have access to a noisy oracle. The goal is to query this oracle for \(T\) rounds such that the following regret is minimized.

Read More

Euclidean Distance Matrix Completion has No Spurious Local Minima?

Let \(X\) be an \(n\times k\) real matrix. It can be interpreted as \(n\) points \(x_1,x_2,\ldots,x_n \in \mathbb{R}^k\). The Euclidean Distance Matrix (EDM) \(D\) is defined by \([D]_{i,j} = \|x_i-x_j\|_2^2\). In the EDM completion problem, we are given some entries of \(D\), sampled randomly with probability \(p\). The goal is to determine the remaining entries of \(D\) and, additionally, the point matrix \(X\) producing \(D\). This is similar to the generic low-rank matrix completion problem, except on EDMs.
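
A small numpy sketch of how \(D\) is built from the Gram matrix \(G = XX^\top\); this identity is also why \(D\) is low rank (at most \(k+2\)):

```python
import numpy as np

def edm(X):
    """Euclidean Distance Matrix of the rows of X: D[i, j] = ||x_i - x_j||^2."""
    G = X @ X.T            # Gram matrix
    g = np.diag(G)
    # ||x_i - x_j||^2 = <x_i, x_i> + <x_j, x_j> - 2 <x_i, x_j>
    return g[:, None] + g[None, :] - 2.0 * G
```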

Read More

An Efron-Stein Like Lower bound for Variance

In a homework of the Machine Learning Theory course that I am taking at UMass, we were asked to prove the Efron-Stein Inequality. It upper bounds the variance of arbitrary functions of iid random variables. While working on the proof, I found an intriguing inequality which has the structure of Efron-Stein, but is a lower bound.

Read More

Perfect Matchings: Optimal Bike-pooling for a Common Destination

You are an intern at a prestigious AI lab. Assume there are \(n\) researchers in this lab. The lab is located at the origin of a metric space. The researchers live at various points in this space. Every day, these researchers travel to the lab from their respective homes and back. Each researcher owns an eco-friendly motorbike which can carry at most 2 passengers. They decide to bike-pool to work such that the total distance traveled is minimized. They entrust you to come up with an algorithm which can find the optimal sharing scheme.
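
The teaser does not fix the cost model, so as an assumption take any cost \(c(i,j)\) for riders \(i\) and \(j\) sharing one bike. A brute-force sketch of minimum-cost perfect matching then looks like this (a practical implementation would use the Blossom algorithm instead, since this one is exponential in \(n\)):

```python
def min_cost_pairing(riders, cost):
    """Brute-force minimum-cost perfect matching over an even set of riders.

    riders : list of rider indices (even length)
    cost   : cost(i, j) of riders i and j sharing one bike (any model)
    """
    if not riders:
        return 0.0, []
    i, rest = riders[0], riders[1:]
    best_cost, best_pairs = float('inf'), None
    for idx, j in enumerate(rest):
        # pair i with j, then optimally match everyone else
        remaining = rest[:idx] + rest[idx + 1:]
        sub_cost, sub_pairs = min_cost_pairing(remaining, cost)
        total = cost(i, j) + sub_cost
        if total < best_cost:
            best_cost, best_pairs = total, [(i, j)] + sub_pairs
    return best_cost, best_pairs
```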

Read More

Posterior Sampling for RL

This post is about a simple yet very efficient Bayesian algorithm for minimizing cumulative regret in Episodic Fixed Horizon Markov Decision Processes (EFH-MDPs). First introduced as a heuristic by Strens under the name “Bayesian Dynamic Programming”, it was later shown by Osband et al. that it is in fact an efficient RL algorithm. They also gave it a more informative name: Posterior Sampling for Reinforcement Learning (PSRL).

Read More

Markov Decision Processes

This blog post is about Episodic Fixed Horizon Markov Decision Processes (EFH-MDPs). MDPs are a very generic framework and can be used to model complex systems in manufacturing, control, and robotics, to name a few. They are also used to model environments in Reinforcement Learning (RL).

Read More

Multi Armed Bandits and Exploration Strategies

This blog post is about the Multi Armed Bandit (MAB) problem and the exploration-exploitation dilemma faced in reinforcement learning. MABs find applications in areas such as advertising, drug trials, website optimization, packet routing, and resource allocation.
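
As one classic illustration of the exploration-exploitation trade-off, here is an epsilon-greedy sketch; the `pull(a)` reward interface is an assumption made for this example:

```python
import random

def epsilon_greedy(pull, n_arms, T, eps=0.1, seed=0):
    """Epsilon-greedy bandit strategy (sketch). `pull(a)` returns the
    reward of playing arm a."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(T):
        if 0 in counts:
            a = counts.index(0)        # play every arm once first
        elif rng.random() < eps:
            a = rng.randrange(n_arms)  # explore uniformly at random
        else:
            a = max(range(n_arms), key=means.__getitem__)  # exploit
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        total += r
    return total, means
```

With probability \(\epsilon\) the player explores a random arm; otherwise it exploits the empirically best arm so far.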

Read More

A Derivation of Backpropagation in Matrix Form

Backpropagation is an algorithm for training neural networks, used along with an optimization routine such as gradient descent. To perform a weight update that minimizes the loss function, gradient descent requires the gradient of the loss with respect to every weight in the network. Backpropagation computes these gradients in a systematic way. Together with gradient descent, it is arguably the single most important algorithm for training deep neural networks and could be said to be the driving force behind the recent emergence of Deep Learning.
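
A minimal numpy sketch of the matrix form for a two-layer sigmoid network with squared loss; the architecture and names here are illustrative, not necessarily the post's exact setup:

```python
import numpy as np

def forward_backward(x, y, W1, W2):
    """Loss and weight gradients for a two-layer sigmoid network with
    squared loss (illustrative sketch)."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    a1 = sig(W1 @ x)       # hidden activations
    yhat = sig(W2 @ a1)    # network output
    loss = 0.5 * np.sum((yhat - y) ** 2)
    # backward pass: each delta is (upstream error) * sigmoid'(pre-activation)
    d2 = (yhat - y) * yhat * (1 - yhat)
    d1 = (W2.T @ d2) * a1 * (1 - a1)
    # weight gradients are outer products of deltas with layer inputs
    return loss, np.outer(d1, x), np.outer(d2, a1)  # loss, dL/dW1, dL/dW2
```

A finite-difference check against the analytic gradients is a quick way to confirm the equations are right.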

Read More

Formulating Linear Programs

Linear Programming (LP) is a widely used technique in algorithm design and analysis, especially for obtaining good approximations to NP-hard problems. The first step is to formulate the problem as an optimization problem with linear constraints, where the quantity we are trying to optimize is also linear. It is a good exercise to formulate simple problems as linear programs. Linear programming problems expressed in their canonical form typically look like this:
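
For reference, the canonical form usually meant here is a linear objective maximized subject to linear inequality constraints and nonnegative variables:

\[
\begin{aligned}
\text{maximize} \quad & c^\top x \\
\text{subject to} \quad & Ax \le b, \\
& x \ge 0.
\end{aligned}
\]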

Read More

Ratio of Heads and Tails

This is my first blog post. In this blog, I will be writing about Algorithms, Machine Learning, Optimization, and Mathematics. I wanted to begin with something simple, so this post is about a problem from Probability.

Read More