No-Regret Reinforcement Learning with Heavy-Tailed Rewards

02/25/2021
by   Vincent Zhuang, et al.

Reinforcement learning algorithms typically assume that rewards are sampled from light-tailed distributions, such as Gaussian or bounded distributions. However, a wide variety of real-world systems generate rewards that follow heavy-tailed distributions. We consider such scenarios in the setting of undiscounted reinforcement learning. By constructing a lower bound, we show that the difficulty of learning heavy-tailed rewards asymptotically dominates the difficulty of learning transition probabilities. Leveraging techniques from robust mean estimation, we propose Heavy-UCRL2 and Heavy-Q-Learning, and show that they achieve near-optimal regret bounds in this setting. Our algorithms also naturally generalize to deep reinforcement learning applications; we instantiate Heavy-DQN as an example. We demonstrate that all of our algorithms outperform baselines on both synthetic MDPs and standard RL benchmarks.
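
To make the idea of robust mean estimation concrete, the sketch below shows a median-of-means estimator in Python. The abstract does not say which robust estimator Heavy-UCRL2 or Heavy-Q-Learning use, so this is only an illustration of one standard technique from that literature, not the authors' method. The function name `median_of_means`, the parameter `num_blocks`, and the sample rewards are all hypothetical.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Estimate the mean by taking the median of per-block sample means.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    # Samples beyond num_blocks * block_size are dropped.
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# Made-up rewards: mostly near 1, with a single extreme observation.
rewards = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 250.0, 0.7, 1.3]

# The single outlier drags the plain empirical mean far above 1.
print("empirical mean: ", statistics.fmean(rewards))
# The outlier affects only one block, so the median of block means stays near 1.
print("median of means:", median_of_means(rewards, num_blocks=3))
```

The outlier can corrupt at most one block mean, so the median over blocks is largely unaffected. This is the general reason such estimators remain reliable when rewards are heavy-tailed and the plain empirical mean does not.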

