Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

12/16/2021
by Zhihai Wang, et al.

Model-based reinforcement learning algorithms, which aim to learn a model of the environment to make decisions, are more sample efficient than their model-free counterparts. The sample efficiency of model-based approaches depends on how well the model approximates the environment. However, learning an accurate model is challenging, especially in complex and noisy environments. To tackle this problem, we propose the conservative model-based actor-critic (CMBAC), a novel approach that achieves high sample efficiency without relying heavily on accurate learned models. Specifically, CMBAC learns multiple estimates of the Q-value function from a set of inaccurate models and optimizes the policy using the average of the bottom-k estimates, which serves as a conservative estimate. An appealing feature of CMBAC is that the conservative estimates effectively encourage the agent to avoid unreliable "promising actions", that is, actions whose values are high in only a small fraction of the models. Experiments demonstrate that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks, and that the proposed method is more robust than previous methods in noisy environments.
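
The bottom-k averaging step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `conservative_q`, the array layout (one row per model, one column per action), and the toy numbers are all assumptions made for illustration, and the sketch does not cover how the per-model Q-value estimates are learned.

```python
import numpy as np


def conservative_q(q_estimates: np.ndarray, k: int) -> np.ndarray:
    """Average the k smallest Q-value estimates along the model axis (axis 0).

    q_estimates: array of shape (n_models, ...), one Q estimate per model
                 (hypothetical layout chosen for this sketch).
    k:           number of lowest estimates to average, 1 <= k <= n_models.
    """
    n_models = q_estimates.shape[0]
    if not 1 <= k <= n_models:
        raise ValueError(f"k must be in [1, {n_models}], got {k}")
    # np.partition moves the k smallest values (in arbitrary order)
    # into the first k positions along the chosen axis.
    bottom_k = np.partition(q_estimates, k - 1, axis=0)[:k]
    return bottom_k.mean(axis=0)


# Toy data (made up): 5 models, 2 actions.
# Action 0 is valued consistently; action 1 looks good in only one model.
q = np.array([
    [1.0, 0.2],
    [1.1, 0.3],
    [0.9, 5.0],
    [1.0, 0.1],
    [1.0, 0.2],
])

plain_mean = q.mean(axis=0)            # averages over all models
conservative = conservative_q(q, k=3)  # averages the 3 lowest estimates

print("greedy action under plain mean:  ", int(plain_mean.argmax()))
print("greedy action under bottom-k mean:", int(conservative.argmax()))
```

In this toy data the plain mean favors action 1 because a single model assigns it a very high value, whereas the bottom-k average favors action 0, whose value is consistent across models. That is the kind of unreliable "promising action" the abstract says the conservative estimate discourages.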


