STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

01/28/2023
by Souradip Chakraborty, et al.

Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, addresses this challenge by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions that prohibit its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which, under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we develop fundamentally new variants of KSD for discrete conditional distributions. We further establish that STEERING achieves sublinear Bayesian regret, improving upon prior learning rates of information-augmented MBRL, IDS included. Experimentally, we show that the proposed algorithm is computationally affordable and outperforms several prior approaches.
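To make the closed-form claim about KSD concrete, the following is a minimal sketch of the standard kernelized Stein discrepancy for a continuous one-dimensional target, estimated as a V-statistic with an RBF base kernel. It is not the discrete conditional variant developed in the paper; the function names, the bandwidth `h`, the standard normal target, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def stein_kernel(x, y, score, h=1.0):
    """Stein kernel u_p(x, y) in 1-D, built from an RBF base kernel.

    `score` is the derivative of the log target density (assumed known).
    """
    diff = x - y
    k = math.exp(-diff * diff / (2.0 * h * h))            # k(x, y)
    dk_dx = -diff / (h * h) * k                           # d/dx k(x, y)
    dk_dy = diff / (h * h) * k                            # d/dy k(x, y)
    d2k_dxdy = (1.0 / (h * h) - diff * diff / h ** 4) * k  # d^2/(dx dy) k(x, y)
    sx, sy = score(x), score(y)
    return sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy


def ksd_squared(samples, score, h=1.0):
    """V-statistic estimate of the squared KSD between samples and the target."""
    n = len(samples)
    total = sum(stein_kernel(x, y, score, h) for x in samples for y in samples)
    return total / (n * n)


def standard_normal_score(x):
    """Score of N(0, 1): d/dx log p(x) = -x (illustrative target)."""
    return -x


rng = random.Random(0)  # fixed seed so the comparison is reproducible
matched = [rng.gauss(0.0, 1.0) for _ in range(200)]  # drawn from the target
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]  # drawn from a shifted model

ksd_matched = ksd_squared(matched, standard_normal_score)
ksd_shifted = ksd_squared(shifted, standard_normal_score)
print(ksd_matched, ksd_shifted)
```

The estimate needs only samples and the target's score function, so no normalizing constant is required. In this sketch the samples drawn from the shifted model yield a larger discrepancy than the samples drawn from the target, which is the kind of signal an exploration incentive based on model mismatch could use.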

research
06/09/2022

Regret Bounds for Information-Directed Reinforcement Learning

Information-directed sampling (IDS) has revealed its potential as a data...
research
06/02/2022

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

In this work, we propose a novel Kernelized Stein Discrepancy-based Post...
research
12/18/2018

Information-Directed Exploration for Deep Reinforcement Learning

Efficient exploration remains a major challenge for reinforcement learni...
research
11/11/2020

Asymptotically Optimal Information-Directed Sampling

We introduce a computationally efficient algorithm for finite stochastic...
research
10/08/2019

Receding Horizon Curiosity

Sample-efficient exploration is crucial not only for discovering rewardi...
research
02/13/2018

Efficient Exploration through Bayesian Deep Q-Networks

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling...
research
02/18/2021

A Bit Better? Quantifying Information for Bandit Learning

The information ratio offers an approach to assessing the efficacy with ...
