Anti-Concentrated Confidence Bonuses for Scalable Exploration

10/21/2021
by Jordan T. Ash, et al.

Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning. The LinUCB algorithm, a centerpiece of the stochastic linear bandits literature, prescribes an elliptical bonus which addresses the challenge of leveraging shared information in large action spaces. This bonus scheme cannot be directly transferred to high-dimensional exploration problems, however, due to the computational cost of maintaining the inverse covariance matrix of action features. We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus, using an ensemble of regressors trained to predict random noise from policy network-derived features. Using this approximation, we derive stochastic linear bandit algorithms that achieve Õ(d√T) regret bounds for poly(d) fixed actions. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic reward heuristics on Atari benchmarks.
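
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how regressors fit to random noise can approximate the elliptical bonus sqrt(xᵀ(XᵀX + λI)⁻¹x). Several choices here are assumptions made for illustration: the function names are hypothetical, the regressors are fit in closed form with least squares (a scalable variant would presumably train them with gradient steps, which is what avoids the inverse covariance matrix), the ridge term is realized by appending d pseudo-observations with their own noise labels, and the ensemble is aggregated by root-mean-square. The aggregation rule used in the paper may differ.

```python
import numpy as np

def elliptical_bonus(X, x, lam=1.0):
    """Exact LinUCB-style bonus sqrt(x^T (X^T X + lam*I)^{-1} x)."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return float(np.sqrt(x @ np.linalg.solve(A, x)))

def fit_noise_ensemble(X, n_members=2000, lam=1.0, seed=0):
    """Fit n_members linear regressors to i.i.d. N(0, 1) labels.

    Assumption of this sketch: the ridge penalty is encoded as d extra
    rows sqrt(lam)*I with their own noise labels, so each prediction
    x^T w has variance exactly x^T (X^T X + lam*I)^{-1} x.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
    Y = rng.standard_normal((n + d, n_members))  # pure noise targets
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return W  # shape (d, n_members), one column per regressor

def ensemble_bonus(W, x):
    """Root-mean-square of the ensemble's predictions at x."""
    return float(np.sqrt(np.mean((x @ W) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))   # features of past actions
    x = rng.standard_normal(5)          # query action features
    W = fit_noise_ensemble(X)
    print("exact bonus:   ", elliptical_bonus(X, x))
    print("ensemble bonus:", ensemble_bonus(W, x))
```

Because the labels carry no signal, a regressor can only predict large values in directions the data has rarely covered, so the spread of predictions at x tracks the uncertainty that the elliptical bonus measures. The sketch uses a large ensemble only to keep the Monte Carlo error of the RMS estimate small.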

research
06/05/2017

UCB Exploration via Q-Ensembles

We show how an ensemble of Q^*-functions can be leveraged for more effec...
research
04/04/2018

Information Maximizing Exploration with a Latent Dynamics Model

All reinforcement learning algorithms must handle the trade-off between ...
research
02/14/2018

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

In continuous action domains, standard deep reinforcement learning algor...
research
12/18/2018

Information-Directed Exploration for Deep Reinforcement Learning

Efficient exploration remains a major challenge for reinforcement learni...
research
10/08/2020

Learning Intrinsic Symbolic Rewards in Reinforcement Learning

Learning effective policies for sparse objectives is a key challenge in ...
research
10/01/2022

Deep Intrinsically Motivated Exploration in Continuous Control

In continuous control, exploration is often performed through undirected...
research
10/24/2022

Opportunistic Episodic Reinforcement Learning

In this paper, we propose and study opportunistic reinforcement learning...
