Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning

07/24/2019
by Bilal Kartal, et al.

Deep reinforcement learning has achieved great success in recent years, but open challenges remain, such as convergence to locally optimal policies and sample inefficiency. In this paper, we contribute a novel self-supervised auxiliary task, Terminal Prediction (TP), which estimates temporal closeness to terminal states in episodic tasks. The intuition is to aid representation learning by having the agent predict how close it is to a terminal state while it learns its control policy. Although TP could be integrated with multiple algorithms, this paper focuses on Asynchronous Advantage Actor-Critic (A3C) and demonstrates the advantages of the resulting A3C-TP. Our extensive evaluation covers a set of Atari games, the BipedalWalker domain, and a mini version of the recently proposed multi-agent Pommerman game. Our results on the Atari games and the BipedalWalker domain suggest that A3C-TP outperforms standard A3C in most of the tested domains and performs similarly in the rest. In Pommerman, our proposed method provides significant improvements both in learning efficiency and in converging to better policies against different opponents.
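To make the auxiliary task more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes three things the abstract does not state: step t of an N-step episode is labeled t/N, so the label reaches 1.0 at the terminal state; the TP head is trained with mean squared error; and the TP loss is added to the A3C loss with a scalar weight. The function names and the `tp_weight` parameter are hypothetical.

```python
from typing import List, Sequence


def terminal_prediction_targets(episode_length: int) -> List[float]:
    """Hypothetical TP labels: closeness to the terminal state per step.

    Assumption: step t of an N-step episode (t = 1..N) is labeled t / N,
    so labels grow linearly and equal 1.0 on the terminal step.
    """
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    n = episode_length
    return [t / n for t in range(1, n + 1)]


def terminal_prediction_loss(
    predictions: Sequence[float], targets: Sequence[float]
) -> float:
    """Assumed TP loss: mean squared error between head outputs and labels."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def a3c_tp_loss(a3c_loss: float, tp_loss: float, tp_weight: float = 0.5) -> float:
    """Combine the usual A3C loss with the auxiliary TP loss.

    tp_weight is a hypothetical hyperparameter chosen only for illustration.
    """
    return a3c_loss + tp_weight * tp_loss


if __name__ == "__main__":
    labels = terminal_prediction_targets(4)
    print(labels)  # [0.25, 0.5, 0.75, 1.0]
    # Made-up outputs of a TP head, used only to exercise the loss.
    preds = [0.2, 0.5, 0.8, 0.9]
    tp = terminal_prediction_loss(preds, labels)
    print(a3c_tp_loss(a3c_loss=1.0, tp_loss=tp))
```

Because the labels depend on the final episode length N, this sketch computes them only after an episode ends. How to label the steps of an episode that is still running is left open here.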

research
07/22/2019

Agent Modeling as Auxiliary Task for Deep Reinforcement Learning

In this paper we explore how actor-critic methods in deep reinforcement ...
research
11/30/2018

Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL

Deep reinforcement learning (DRL) has achieved great successes in recent...
research
12/22/2021

Alpha-Mini: Minichess Agent with Deep Reinforcement Learning

We train an agent to compete in the game of Gardner minichess, a downsiz...
research
03/07/2019

MinAtar: An Atari-inspired Testbed for More Efficient Reinforcement Learning Experiments

The Arcade Learning Environment (ALE) is a popular platform for evaluati...
research
08/25/2020

Auxiliary-task Based Deep Reinforcement Learning for Participant Selection Problem in Mobile Crowdsourcing

In mobile crowdsourcing (MCS), the platform selects participants to comp...
research
07/25/2019

Action Guidance with MCTS for Deep Reinforcement Learning

Deep reinforcement learning has achieved great successes in recent years...
research
04/01/2022

What makes useful auxiliary tasks in reinforcement learning: investigating the effect of the target policy

Auxiliary tasks have been argued to be useful for representation learnin...
