Off-Policy Shaping Ensembles in Reinforcement Learning

05/21/2014
by Anna Harutyunyan, et al.

Recent advances in gradient temporal-difference methods make it possible to learn multiple value functions in parallel, off-policy, without sacrificing convergence guarantees or computational efficiency. This opens up new possibilities for sound ensemble techniques in reinforcement learning. In this work we propose learning an ensemble of policies related through potential-based shaping rewards. The ensemble induces a combination policy by using a voting mechanism on its components. Learning happens in real time, and we empirically show that the combination policy outperforms the individual policies of the ensemble.
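For readers unfamiliar with the ingredients: potential-based shaping augments the environment reward r with a term F(s, s') = γΦ(s') − Φ(s) derived from a potential function Φ, which can speed up learning without changing the optimal policy. The sketch below is a minimal illustration of the idea, not the authors' implementation: it assumes tabular Q-learning in place of the gradient temporal-difference methods the abstract refers to, and a simple plurality vote over each member's greedy action as the combination policy. All names (ShapingEnsemble, potentials, and so on) are hypothetical.

```python
import random
from collections import Counter, defaultdict

class ShapingEnsemble:
    """One Q-table per potential function; actions chosen by plurality vote."""

    def __init__(self, n_actions, potentials, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.n_actions = n_actions
        self.potentials = potentials                  # list of Phi: state -> float
        self.qs = [defaultdict(float) for _ in potentials]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def _greedy(self, q, s):
        return max(range(self.n_actions), key=lambda a: q[(s, a)])

    def act(self, s):
        # Combination policy: epsilon-greedy around the members' plurality vote.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        votes = Counter(self._greedy(q, s) for q in self.qs)
        return votes.most_common(1)[0][0]

    def update(self, s, a, r, s2, done):
        # All members learn from the same transition, each under its own
        # potential-based shaping reward F(s, s') = gamma * Phi(s') - Phi(s)
        # (Phi is taken to be 0 at terminal states).
        for q, phi in zip(self.qs, self.potentials):
            shaped = r - phi(s) + (0.0 if done else self.gamma * phi(s2))
            bootstrap = 0.0 if done else self.gamma * max(
                q[(s2, b)] for b in range(self.n_actions))
            q[(s, a)] += self.alpha * (shaped + bootstrap - q[(s, a)])
```

Because Q-learning is off-policy, every member learns about its own greedy policy from the single stream of transitions generated by the combination policy, which is what lets the whole ensemble be trained in parallel in real time. A hypothetical usage, with two illustrative potentials on an integer state space:

```python
ens = ShapingEnsemble(n_actions=4,
                      potentials=[lambda s: 0.0, lambda s: float(s)])
a = ens.act(s=0)
ens.update(s=0, a=a, r=1.0, s2=1, done=False)
```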


