Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

09/10/2016
by Nicolas Usunier, et al.

We consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members during a battle. From a reinforcement learning point of view, these scenarios are challenging because the state-action space is very large, and because there is no obvious feature representation for the state-action evaluation function. We describe our approach to tackle the micromanagement scenarios with deep neural network controllers from raw state features given by the game engine. In addition, we present a heuristic reinforcement learning algorithm which combines direct exploration in the policy space and backpropagation. This algorithm allows for the collection of traces for learning using deterministic policies, which appears much more efficient than, for example, ϵ-greedy exploration. Experiments show that with this algorithm, we successfully learn non-trivial strategies for scenarios with armies of up to 15 agents, where both Q-learning and REINFORCE struggle.
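The contrast drawn above between ϵ-greedy exploration and exploration in the policy space can be illustrated with a toy sketch. This is not the algorithm from the paper: the linear scoring policy, the Gaussian perturbation, and every name below (`greedy_action`, `epsilon_greedy_action`, `perturb`, `run_episode`) are assumptions made for illustration only. It shows just the structural difference: ϵ-greedy may randomize at every step, whereas episode-level exploration draws one perturbation of the policy parameters and then acts deterministically for the whole episode.

```python
import random

def greedy_action(weights, state):
    """Deterministic policy: pick the action with the highest linear score."""
    scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def epsilon_greedy_action(weights, state, epsilon, rng):
    """Action-space exploration: a random action may be taken at any step."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    return greedy_action(weights, state)

def perturb(weights, scale, rng):
    """Policy-space exploration (assumed Gaussian noise on the parameters)."""
    return [[w + scale * rng.gauss(0.0, 1.0) for w in row] for row in weights]

def run_episode(policy_weights, states):
    """Act deterministically with fixed parameters for a whole episode."""
    return [greedy_action(policy_weights, s) for s in states]

rng = random.Random(0)
weights = [[1.0, 0.0], [0.0, 1.0]]             # hypothetical 2-action policy
states = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]  # hypothetical state features

# One perturbation is drawn per episode and then held fixed.
episode_weights = perturb(weights, 0.1, rng)
actions = run_episode(episode_weights, states)
```

Because the perturbed policy stays fixed within an episode, the collected trace is consistent with a single deterministic policy, which is the property the abstract attributes to its exploration scheme.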


research
10/02/2018

The Dreaming Variational Autoencoder for Reinforcement Learning Environments

Reinforcement learning has shown great potential in generalizing over ra...
research
06/01/2018

Being curious about the answers to questions: novelty search with learned attention

We investigate the use of attentional neural network layers in order to ...
research
01/06/2016

Angrier Birds: Bayesian reinforcement learning

We train a reinforcement learner to play a simplified version of the gam...
research
03/13/2020

Deep Deterministic Portfolio Optimization

Can deep reinforcement learning algorithms be exploited as solvers for o...
research
05/26/2020

Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games

In Markov games, playing against non-stationary opponents with learning ...
research
05/28/2023

On the Value of Myopic Behavior in Policy Reuse

Leveraging learned strategies in unfamiliar scenarios is fundamental to ...
