Potential Field Guided Actor-Critic Reinforcement Learning

06/12/2020
by Weiya Ren, et al.

In this paper, we consider the problem of actor-critic reinforcement learning. First, we extend the actor-critic architecture to an actor-critic-N architecture by introducing additional critics beyond the reward-based one. Second, we combine the reward-based critic with a potential-field-based critic to formulate the proposed potential field guided actor-critic reinforcement learning approach (actor-critic-2), which can be seen as combining model-based and model-free gradients in policy improvement. A state with a large potential field typically carries strong prior information, such as pointing toward a distant target or steering away from the side of an obstacle. In such states, the potential-field-based critic should be trusted more in policy evaluation to accelerate policy improvement, and the action policy tends to be guided; in practical applications, for example, obstacle avoidance should be guided rather than learned by trial and error. A state with a small potential field often lacks such information, for example at a local minimum of the field or near a moving target. In these states, the reward-based critic should be trusted more to evaluate the long-term return, and the action policy tends to explore. In addition, potential field evaluation can be combined with planning to estimate a better state value function, so that reward design can focus on the terminal reward rather than on reward shaping or staged rewards. Furthermore, potential field evaluation can compensate for the lack of communication in multi-agent cooperation problems: each agent keeps its own reward-based critic while sharing a relatively unified potential-field-based critic that encodes the prior information. Third, simplified experiments on a predator-prey game demonstrate the effectiveness of the proposed approach.
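The core weighting idea can be sketched in a few lines of Python. The fragment below is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic attractive/repulsive potential, the tanh trust weight, and all names and parameters (`potential`, `blended_evaluation`, `k_att`, `k_rep`, `d0`, `u_scale`) are hypothetical choices for this sketch; the paper's exact potential field and weighting scheme may differ.

```python
import numpy as np

def potential(state, goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.0):
    """Artificial potential field U(s): quadratic attraction to the goal
    plus repulsion inside an influence radius d0 around each obstacle.
    (Illustrative choice; the paper's field may be defined differently.)"""
    u = 0.5 * k_att * np.linalg.norm(state - goal) ** 2
    for obs in obstacles:
        d = np.linalg.norm(state - obs)
        if d < d0:
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u

def blended_evaluation(adv_reward, adv_potential, u, u_scale=10.0):
    """Blend the two critics for policy improvement: trust the
    potential-field-based critic where the field is strong (strong prior,
    guided policy) and the reward-based critic where the field is weak
    (local minimum or near a moving target, exploring policy)."""
    w = np.tanh(abs(u) / u_scale)  # w -> 1 for large |U(s)|, w -> 0 for small
    return w * adv_potential + (1.0 - w) * adv_reward

# Toy usage: a predator at the origin chasing prey at (5, 5) past one obstacle.
state = np.array([0.0, 0.0])
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.0, 2.0])]
u = potential(state, goal, obstacles)
advantage = blended_evaluation(adv_reward=0.3, adv_potential=1.2, u=u)
print(f"U(s) = {u:.2f}, blended advantage = {advantage:.2f}")
```

In a full implementation, `adv_potential` would come from a critic derived from (or trained against) the potential field, and the blended value would replace the usual advantage in the policy-gradient update; the tanh weight here merely mimics the "trust the field where it is strong" behavior the abstract describes.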

Related research

12/11/2019 · Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning
We study the problem of off-policy critic evaluation in several variants...

05/29/2021 · MARL with General Utilities via Decentralized Shadow Reward Actor-Critic
We posit a new mechanism for cooperation in multi-agent reinforcement le...

11/29/2019 · Distributed Soft Actor-Critic with Multivariate Reward Representation and Knowledge Distillation
In this paper, we describe NeurIPS 2019 Learning to Move - Walk Around c...

06/17/2019 · PACMAN: A Planner-Actor-Critic Architecture for Human-Centered Planning and Learning
Conventional reinforcement learning (RL) allows an agent to learn polici...

04/10/2019 · Actor-Critic Instance Segmentation
Most approaches to visual scene analysis have emphasised parallel proces...

11/13/2020 · Scaffolding Reflection in Reinforcement Learning Framework for Confinement Escape Problem
This paper formulates an application of reinforcement learning for an ev...

12/10/2022 · Effects of Spectral Normalization in Multi-agent Reinforcement Learning
A reliable critic is central to on-policy actor-critic learning. But it ...
