Learning and Planning in Complex Action Spaces

04/13/2021
by Thomas Hubert, et al.

Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
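
As a rough illustration of evaluating and improving a policy over a sampled action subset, the sketch below draws a small number of actions from a one-dimensional Gaussian proposal, scores only those samples, and forms an improved distribution restricted to them. It is not Sampled MuZero: there is no search and no learned model, and the paper's own treatment of sampling is not reproduced here. The softmax improvement step, the function names, the toy value function `toy_q`, and all numeric settings are assumptions chosen for illustration.

```python
import math
import random


def sample_actions(mean, std, k, rng):
    """Draw k candidate actions from a 1-D Gaussian proposal policy."""
    return [rng.gauss(mean, std) for _ in range(k)]


def toy_q(action):
    """Hypothetical value estimate that peaks at action = 0.7."""
    return -(action - 0.7) ** 2


def improved_policy_over_samples(actions, q_fn, temperature=0.05):
    """Softmax over estimated values, restricted to the sampled actions.

    This stands in for the policy improvement step; the actions that were
    never sampled simply receive no probability mass.
    """
    values = [q_fn(a) for a in actions]
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
actions = sample_actions(mean=0.0, std=0.5, k=16, rng=rng)
probs = improved_policy_over_samples(actions, toy_q)

# Project the improved sample distribution back onto the proposal's mean,
# which is one simple way to move the policy toward better actions.
new_mean = sum(p * a for p, a in zip(probs, actions))
```

Only 16 actions are ever evaluated, so the cost of the improvement step does not depend on the size of the underlying action space, which is the property the abstract relies on.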

research
01/22/2020

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can ...
research
05/10/2019

Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces

Parameterised actions in reinforcement learning are composed of discrete...
research
05/09/2012

Seeing the Forest Despite the Trees: Large Scale Spatial-Temporal Decision Making

We introduce a challenging real-world planning problem where actions mus...
research
06/10/2020

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

Sample-based planning is a powerful family of algorithms for generating ...
research
04/03/2019

PaintBot: A Reinforcement Learning Approach for Natural Media Painting

We propose a new automated digital painting framework, based on a painti...
research
10/07/2021

Design Strategy Network: A deep hierarchical framework to represent generative design strategies in complex action spaces

Generative design problems often encompass complex action spaces that ma...
research
07/13/2020

DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space

It has been arduous to assess the progress of a policy learning algorith...
