On Optimism in Model-Based Reinforcement Learning

06/21/2020
by Aldo Pacchiano, et al.

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision-making problems such as multi-armed bandits and reinforcement learning (RL), where it often comes with strong theoretical guarantees. However, it remains a challenge to scale these approaches to deep RL, which has attracted a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise-augmented Markov Decision Processes (MDPs), which we show obtains a competitive regret bound of Õ(|S|H√(|S||A|T)) when augmenting with Gaussian noise, where |S| and |A| are the numbers of states and actions, H is the episode horizon, and T is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors behind the success of optimistic model-based RL algorithms, bridging the gap between theory and practice.
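To make the scaling of the stated regret bound concrete, the sketch below evaluates only its leading term |S|·H·√(|S|·|A|·T), dropping the constants and logarithmic factors hidden by Õ. It assumes |S| and |A| are the numbers of states and actions and H is the episode horizon; the function name `regret_bound_scale` and the problem sizes are hypothetical, chosen purely for illustration, and this is not code from the paper.

```python
import math


def regret_bound_scale(num_states: int, num_actions: int, horizon: int, total_steps: int) -> float:
    """Leading term |S| * H * sqrt(|S| * |A| * T) of the regret bound.

    Constants and log factors hidden by the O-tilde notation are ignored,
    so this shows how the bound scales, not its exact value.
    """
    return num_states * horizon * math.sqrt(num_states * num_actions * total_steps)


# Hypothetical problem size, used only to illustrate the scaling.
S, A, H = 10, 4, 20

for T in (10_000, 40_000, 160_000):
    bound = regret_bound_scale(S, A, H, T)
    # Regret grows like sqrt(T), so the bound divided by T shrinks like 1/sqrt(T).
    print(f"T={T:>7}: leading term ~ {bound:,.0f}, divided by T ~ {bound / T:.3f}")
```

Because the bound grows with √T, quadrupling the number of environment steps only doubles the leading term, so the regret accumulated per step decays toward zero as T grows.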

