Corruption Robust Exploration in Episodic Reinforcement Learning

11/20/2019
by Thodoris Lykouris, et al.

We initiate the study of multi-stage episodic reinforcement learning under adversarial manipulations in both the rewards and the transition probabilities of the underlying system. Existing efficient algorithms heavily rely on the "optimism under uncertainty" principle which dictates their behavior and does not allow flexibility to perform corruption-robust exploration. We address this by (i) departing from the optimistic behavior, and (ii) creating a general framework that incorporates the principle of action-elimination. (This principle has been essential for corruption-robust exploration in multi-armed bandits, a degenerate special case of episodic reinforcement learning.) Despite constructing a lower bound for a straightforward implementation of action-elimination, we provide a clean and modular way to transfer it to episodic reinforcement learning. Our algorithm enjoys near-optimal guarantees in the absence of adversarial manipulations, has performance that degrades gracefully as the amount of corruption increases, and does not need to know this amount. Our results shed new light on the broader question of robust exploration, and suggest a way to address a rather daunting mismatch between optimistic algorithms and algorithms with higher flexibility. To demonstrate the applicability of our framework, we provide a second instantiation thereof, showing how it can provide efficient guarantees for the stochastic setting, despite doing almost uniform exploration across plausibly optimal actions.
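For readers unfamiliar with the action-elimination principle mentioned above, the following is a minimal sketch of classic successive elimination in the multi-armed bandit special case, with no corruption. It is not the algorithm of this paper, which the abstract notes cannot be obtained by a straightforward transfer of action-elimination to episodic reinforcement learning. The function name, the Bernoulli reward model, the Hoeffding-style confidence radius, and all parameters are assumptions chosen for illustration.

```python
import math
import random


def successive_elimination(arm_means, horizon, delta=0.05, seed=0):
    """Illustrative successive elimination on a Bernoulli bandit.

    arm_means: true success probabilities (hidden from the learner,
               used here only to simulate rewards).
    Returns the list of arms still active and the pull counts per arm.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    active = list(range(n_arms))
    pulls = [0] * n_arms
    totals = [0.0] * n_arms

    def bounds(arm):
        # Lower and upper confidence bounds; vacuous before the first pull.
        if pulls[arm] == 0:
            return -math.inf, math.inf
        mean = totals[arm] / pulls[arm]
        radius = math.sqrt(
            math.log(2 * n_arms * horizon / delta) / (2 * pulls[arm])
        )
        return mean - radius, mean + radius

    t = 0
    while t < horizon:
        # Explore uniformly: pull every still-plausible arm once per round.
        for arm in active:
            if t >= horizon:
                break
            reward = 1.0 if rng.random() < arm_means[arm] else 0.0
            pulls[arm] += 1
            totals[arm] += reward
            t += 1

        # Eliminate arms whose upper bound falls below the best lower bound.
        best_lower = max(bounds(arm)[0] for arm in active)
        active = [arm for arm in active if bounds(arm)[1] >= best_lower]

    return active, pulls


if __name__ == "__main__":
    remaining, counts = successive_elimination([0.9, 0.5, 0.1], horizon=5000)
    print("active arms:", remaining)
    print("pull counts:", counts)
```

The sketch never acts optimistically: it spreads pulls evenly over all arms that are still plausibly optimal and only discards an arm once its confidence interval is dominated. This is the kind of flexibility that the abstract contrasts with "optimism under uncertainty".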

research
02/13/2021

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

We study episodic reinforcement learning under unknown adversarial corru...
research
05/29/2017

Boltzmann Exploration Done Right

Boltzmann exploration is a classic strategy for sequential decision-maki...
research
02/08/2016

PAC Reinforcement Learning with Rich Observations

We propose and study a new model for reinforcement learning with rich ob...
research
05/22/2023

Distributionally Robust Optimization Efficiently Solves Offline Reinforcement Learning

Offline reinforcement learning aims to find the optimal policy from a pr...
research
07/07/2020

Stochastic Linear Bandits Robust to Adversarial Attacks

We consider a stochastic linear bandit problem in which the rewards are ...
research
07/15/2020

Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits

The principle of optimism in the face of uncertainty is one of the most ...
research
02/05/2021

Provably Efficient Algorithms for Multi-Objective Competitive RL

We study multi-objective reinforcement learning (RL) where an agent's re...
