Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

03/07/2022
by   Alexander Long, et al.

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample- and computation-efficient. NAIT is a lazy-learning approach whose update is equivalent to episodic Monte-Carlo on episode completion, but which also allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed, domain-agnostic representation, simple distance-based exploration, and a proximity-graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26- and 57-game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-time.
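The core idea of non-parametric value approximation can be illustrated with a minimal sketch: store state embeddings together with their discounted Monte-Carlo returns, and estimate the value of a new state as the average return of its nearest stored neighbours. This is an illustrative toy only, with hypothetical class and method names; the actual NAIT algorithm additionally uses a proximity graph for fast approximate lookup, a fixed domain-agnostic embedding, and an update that incorporates rewards before an episode finishes.

```python
import numpy as np

class NonParametricValue:
    """Toy k-NN value estimator over stored Monte-Carlo returns.

    Illustrative sketch of the non-parametric idea only, not the
    authors' implementation: NAIT replaces the brute-force search
    below with a proximity-graph lookup for speed.
    """

    def __init__(self, k=3, gamma=0.99):
        self.k = k            # number of neighbours to average over
        self.gamma = gamma    # discount factor
        self.keys = []        # stored state embeddings
        self.values = []      # corresponding Monte-Carlo returns

    def store_episode(self, embeddings, rewards):
        # Compute discounted returns backward through the episode,
        # then store each (embedding, return) pair.
        g = 0.0
        for emb, r in zip(reversed(embeddings), reversed(rewards)):
            g = r + self.gamma * g
            self.keys.append(np.asarray(emb, dtype=float))
            self.values.append(g)

    def value(self, embedding):
        # Estimate V(s) as the mean return of the k nearest stored states.
        if not self.keys:
            return 0.0
        q = np.asarray(embedding, dtype=float)
        dists = np.linalg.norm(np.stack(self.keys) - q, axis=1)
        idx = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.values)[idx]))
```

For example, after storing a two-step episode with rewards (0, 1) and gamma = 0.5, the stored returns are 0.5 for the first state and 1.0 for the terminal-adjacent state, and querying near either embedding recovers the corresponding return.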

Related research:

- 05/31/2022 — k-Means Maximum Entropy Exploration: "Exploration in high-dimensional, continuous spaces with sparse rewards i..."
- 05/31/2018 — Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update: "We propose Episodic Backward Update - a new algorithm to boost the perfo..."
- 07/21/2020 — On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts: "A basic simulation-based reinforcement learning algorithm is the Monte C..."
- 09/18/2018 — Switching Isotropic and Directional Exploration with Parameter Space Noise in Deep Reinforcement Learning: "This paper proposes an exploration method for deep reinforcement learnin..."
- 06/14/2017 — Accelerated Reinforcement Learning Algorithms with Nonparametric Function Approximation for Opportunistic Spectrum Access: "We study the problem of throughput maximization by predicting spectrum o..."
