Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand

11/28/2022
by Daniel Garces, et al.

We derive a learning framework to generate routing/pickup policies for a fleet of vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, considering a-priori unknown potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in adapting to fluctuations in actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) online play, a lookahead optimization method that improves the performance of rollout methods via an approximate policy iteration step, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching away from the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in downtown San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuations in demand distributions. Our numerical results demonstrate that our method outperforms rollout-based reinforcement learning, as well as several benchmarks based on classical methods from the field of operations research.
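
The switching mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes demand is summarized as a one-dimensional histogram over equally spaced bins (the paper works with requests on a city map), and the names `wasserstein_1d`, `select_model`, and the `radius` threshold are hypothetical stand-ins for the q-valid radius test.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on the same unit-spaced bins.

    Assumption for illustration: both inputs are non-negative counts (or
    probabilities) over identical bins; they are normalized before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive mass")
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    # In 1-D, W1 equals the area between the two cumulative distributions.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def select_model(current, models, active, radius):
    """Return the name of the demand model to use for the offline approximation.

    Keeps the active model while the current demand lies within `radius`
    (a hypothetical validity threshold) of it; otherwise switches to the
    historical model closest in Wasserstein distance.
    """
    if wasserstein_1d(current, models[active]) <= radius:
        return active
    return min(models, key=lambda name: wasserstein_1d(current, models[name]))


# Hypothetical historical demand histograms (requests per zone).
historical = {
    "off_peak": [4, 3, 2, 1],
    "on_peak": [1, 2, 3, 4],
}
observed = [1, 2, 4, 3]
chosen = select_model(observed, historical, active="off_peak", radius=0.5)
```

Here `chosen` names the historical model whose offline approximation would be used next. For multi-dimensional demand, the distance would need an optimal transport solver rather than this closed-form 1-D computation.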

research
07/05/2023

Surge Routing: Event-informed Multiagent Reinforcement Learning for Autonomous Rideshare

Large events such as conferences, concerts and sports games, often cause...
research
02/12/2018

A note on reinforcement learning with Wasserstein distance regularisation, with applications to multipolicy learning

In this note we describe an application of Wasserstein distance to Reinf...
research
02/28/2023

Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning

Autonomous Mobility-on-Demand (AMoD) systems are a rapidly evolving mode...
research
02/27/2019

Adaptive Caching via Deep Reinforcement Learning

Caching is envisioned to play a critical role in next-generation content...
research
12/25/2018

On-Demand Video Dispatch Networks: A Scalable End-to-End Learning Approach

We design a dispatch system to improve the peak service quality of video...
research
12/18/2019

Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms During High-Demand Hours

Rideshare platforms, when assigning requests to drivers, tend to maximiz...
research
11/20/2019

Neural Approximate Dynamic Programming for On-Demand Ride-Pooling

On-demand ride-pooling (e.g., UberPool) has recently become popular beca...
