Optimizing Ambulance Service Coverage in NYC
Last Weekend, I participated in a Hackathon organized by Cubist and won first place! I got to work with some talented students from Yale, Cornell, and Columbia. Our project focused on enhancing ambulance staffing and stationing in New York City by analyzing geographic data of collisions and existing service locations/hospitals.
The hackathon’s objective was open-ended, challenging us to create something useful from datasets found on the NYC Open Data website. This platform contains intriguing datasets, such as the Tree Census and the Squirrel Census. We began by examining patterns in the geographic locations of accidents within the NYC collisions dataset and plotted these locations on a map. The map displays accidents from the past year where at least one person was injured or killed, and allows you to toggle between accidents, hospitals, and ambulance stations using the layers on the right side of the map.
After superimposing the accident data onto the locations of NYC hospitals and ambulance service stations, we noticed several gaps in service coverage. Consequently, we decided to address the problem of determining the optimal placement for a new ambulance station.
Typically, when an accident occurs, the nearest service center dispatches an ambulance to the location. The closest center is usually the one requiring the least travel time via road, which can be determined using the Google Maps API. However, we chose to approximate this by defining the nearest service center as the one with the smallest Haversine distance from the accident site. This led to the following clustering of accidents based on their nearest service centers:
You can toggle between accidents and service stations using the layers on the right side of the map for a better understanding of the clustering.
Algorithm for finding a new service station
In machine learning, clustering is a problem where the objective is to find cluster centers that minimize the total intra-cluster distance. The problem of choosing the location for single service center can be thought of a stylized clustering problem.
Let \(x_1,\dots,x_n\) be the accident locations and \(\mathcal{C} = \{c_1, \dots ,c_k\}\) be the service station locations. For each point, define \(c(x_i)\) as the closest service station according to the distance measure \(d\) (ties broken to favor lower indices).
\[c(x_i) = \arg\min \limits_{c \in \mathcal{C}} d(x_i,c)\]The total intra-cluster distance is:
\[\mathcal{D}(\mathcal{C}) = \sum_{i=1}^n d(x_i,c(x_i))\]The total intra-cluster distance represents the overall distance traveled by ambulances from their respective service stations to accident locations.
To find the optimal position for a new service station, consider introducing a new service center \(c\) somewhere on the map. Now, compute the cluster assignments:
\[c(x_i) = \arg\min \limits_{c \in \mathcal{C} \cup \{c\}} d(x_i,c)\]The total new intra-cluster distance is:
\[f(c)=\mathcal{D}(\mathcal{C} \cup \{c\}) = \sum_{i=1}^n d(x_i,c(x_i))\]The best location to put the new service station is:
\[c_{\text{new}} = \arg\min_{c \in \text{NYC}} f(c)\]Finding this point involves using gradient descent, or something similar. We came up with the following heuristic:
Let \(c\) be somewhere on the map. Let \(c^{N,\epsilon}, c^{E,\epsilon}, c^{S,\epsilon}, c^{W,\epsilon}\) be points that are a distance \(\epsilon\) away from \(c\) in the north, east, south, and west respectively. If \(f(c) \leq f(c^{D,\epsilon})\) for all directions \(D\), then \(c\) is a local minima. Otherwise, we have a direction of descent along which we can move. Here is the descent procedure:
- Initialize \(c\)
- While True:
- If \(f (c) < f(c^{D,\epsilon})\) for all \(D=\{N,E,S,W\}\):
- return \(c\)
- else:
- Update \(c = \min_{D}(c^{D,\epsilon})\)
- If \(f (c) < f(c^{D,\epsilon})\) for all \(D=\{N,E,S,W\}\):
This procedure would only find a local minimum for the objective. We must run this heuristic with \(c\) initialized at several locations and use the best one.
Adding a new center \(c\) on an already present service center \(c_i\) does not change the assignment (as they favor lower indices). So \(f(c_i) = \mathcal{D}(\mathcal{C})\) for all \(c_i \in \mathcal{C}\). Also, adding a center can only reduce the total intra-cluster distance, so \(f (c) \leq \mathcal{D}(\mathcal{C})\). Thus, \(c_i\) are global maxima for the objective function \(f\). Thus, we can initialize the descent heuristic at current service locations and find the best location for a new service center.
Here is the full algorithm:
- L=[]:
- For each \(c_i \in \mathcal{C}\):
- Initialize \(c\) at \(c_i\)
- While True:
- If \(f (c) < f(c^{D,\epsilon})\) for all \(D=\{N,E,S,W\}\):
- Store \((c, f(c))\) in list \(L\) and break
- else:
- Update \(c = \min_{D}(c^{D,\epsilon})\)
- If \(f (c) < f(c^{D,\epsilon})\) for all \(D=\{N,E,S,W\}\):
- In \(L\), find \(c\) which has the least \(f(c)\) are return it.
New Service Station Locations
The optimal locations for new service stations in each borough have been identified using our algorithm. In each case, the selected spots fill a “gap” in the service coverage on the map. For example, in Manhattan, the algorithm chooses Times Square as the ideal location for a new service station.
Manhattan
Brooklyn
Bronx
Queens
Staten Island
All of NYC
Concluding Remarks
In summary, the hackathon was a rewarding experience as we delved into a substantial dataset and devised an intriguing algorithm to address a problem with real-world implications. ChatGPT played a significant role in the event, proving invaluable in generating visualizations and querying the database. However, it required assistance to develop the algorithm itself.