H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch

05/05/2021
by Benjamin Rivière, et al.

We present H-TD2: Hybrid Temporal Difference Learning for Taxi Dispatch, a model-free, adaptive decision-making algorithm that coordinates a large fleet of automated taxis in a dynamic urban environment to minimize expected customer waiting times. Our scalable algorithm exploits the natural transportation network company topology by switching between two behaviors: distributed temporal-difference learning computed locally at each taxi and infrequent centralized Bellman updates computed at the dispatch center. We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the bounded sub-optimality of each taxi's policy; this advances the state of the art by enabling distributed operation with bounded sub-optimality. Additionally, unlike recent reinforcement learning dispatch methods, this policy estimation is adaptive and robust to events outside the training domain. This result is enabled by a two-step modelling approach: the policy is learned on an agent-agnostic, cell-based Markov Decision Process, and individual taxis are coordinated using the learned policy in a distributed game-theoretic task assignment. We validate our algorithm against a receding horizon control baseline in a Gridworld environment with a simulated customer dataset, where the proposed solution decreases average customer waiting time by 50%. We also validate in a Chicago city environment with real customer requests from the Chicago taxi public dataset, where the proposed solution decreases average customer waiting time by 26% during a 2016 Major League Baseball World Series game.
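
The switching idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the tabular value function, the accumulated-error trigger, the `threshold` parameter, and the assumption that the dispatch center holds an estimated transition model `P` and reward table `R` are all hypothetical choices made for illustration. The sketch only shows the general pattern of cheap local temporal-difference updates, with an occasional expensive Bellman update when a trigger fires.

```python
import numpy as np

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular TD(0) step on V (in place); returns the absolute TD error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return abs(delta)

def bellman_update(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Value iteration on an (assumed) estimated model.

    P has shape (actions, states, states); R has shape (states, actions).
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iters):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

def hybrid_step(V, acc_err, s, r, s_next, P, R, threshold=5.0,
                alpha=0.1, gamma=0.9):
    """Local TD step; falls back to a centralized Bellman update when the
    accumulated TD error exceeds `threshold` (a hypothetical trigger)."""
    acc_err += td_update(V, s, r, s_next, alpha, gamma)
    if acc_err > threshold:
        V[:] = bellman_update(P, R, gamma)  # expensive, infrequent
        acc_err = 0.0
    return acc_err

# Toy model: one action, state 0 moves to absorbing state 1 with reward 1.
P = np.array([[[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[1.0], [0.0]])
V = np.zeros(2)
acc = hybrid_step(V, 0.0, s=0, r=1.0, s_next=1, P=P, R=R)
```

In this sketch a larger `threshold` means fewer centralized updates and more reliance on local learning, which mirrors, in a very simplified way, the trade-off between computation and sub-optimality that the abstract describes.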


