Concurrent Credit Assignment for Data-efficient Reinforcement Learning

05/24/2022
by Emmanuel Daucé, et al.

The ability to sample the state and action spaces widely is a key ingredient in building effective reinforcement learning algorithms. The variational optimization principles presented in this paper emphasize the importance of an occupancy model that synthesizes the general distribution of the agent's environmental states over which it can act (defining a virtual “territory”). The occupancy model is updated frequently as exploration progresses and new states are uncovered during training. Under a uniform prior assumption, the resulting objective expresses a balance between two concurrent tendencies, namely the widening of the occupancy space and the maximization of the rewards, reminiscent of the classical exploration/exploitation trade-off. Implemented on an off-policy actor-critic algorithm on classic continuous-action benchmarks, it is shown to provide a significant increase in sampling efficacy, which is reflected in reduced training time and higher returns, in both the dense and the sparse reward cases.
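As a rough illustration of the balance described in the abstract, the sketch below scores a batch of visited states by its mean reward minus the divergence of its empirical occupancy from a uniform prior. It is a toy under stated assumptions, not the paper's method: the discrete state space, the function names (`occupancy`, `kl_to_uniform`, `objective`), and the trade-off weight `beta` are all hypothetical.

```python
import math
from collections import Counter


def occupancy(states):
    """Empirical occupancy: fraction of visits per discrete state."""
    counts = Counter(states)
    total = len(states)
    return {s: c / total for s, c in counts.items()}


def kl_to_uniform(rho, n_states):
    """KL(rho || uniform over n_states) = log(n_states) - entropy(rho)."""
    entropy = -sum(p * math.log(p) for p in rho.values() if p > 0)
    return math.log(n_states) - entropy


def objective(states, rewards, n_states, beta=1.0):
    """Hypothetical trade-off: mean reward minus a penalty for narrow occupancy.

    beta is an assumed weight; a larger beta puts more emphasis on reward.
    """
    mean_reward = sum(rewards) / len(rewards)
    return mean_reward - kl_to_uniform(occupancy(states), n_states) / beta


# Same rewards, different coverage of a 4-state space.
narrow = objective([0, 0, 0, 0], [1.0, 1.0, 1.0, 1.0], n_states=4)
wide = objective([0, 1, 2, 3], [1.0, 1.0, 1.0, 1.0], n_states=4)
```

With identical rewards, the batch that covers all four states scores higher than the batch that stays in one state, because its occupancy matches the uniform prior and the penalty term vanishes.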


research
10/23/2020

Learning Guidance Rewards with Trajectory-space Smoothing

Long-term temporal credit assignment is an important challenge in deep r...
research
04/09/2021

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-poli...
research
10/01/2022

Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Actor-critic (AC) algorithms are a class of model-free deep reinforcemen...
research
02/20/2017

Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning

Reinforcement Learning algorithms can learn complex behavioral patterns ...
research
10/23/2018

Efficient Eligibility Traces for Deep Reinforcement Learning

Eligibility traces are an effective technique to accelerate reinforcemen...
research
01/18/2020

Effects of sparse rewards of different magnitudes in the speed of learning of model-based actor critic methods

Actor critic methods with sparse rewards in model-based deep reinforceme...
research
05/06/2019

DeepRMSA: A Deep Reinforcement Learning Framework for Routing, Modulation and Spectrum Assignment in Elastic Optical Networks

This paper proposes DeepRMSA, a deep reinforcement learning framework fo...
