Sample-Efficient Deep RL with Generative Adversarial Tree Search

06/15/2018
by Kamyar Azizzadenesheli, et al.

We propose Generative Adversarial Tree Search (GATS), a sample-efficient Deep Reinforcement Learning (DRL) algorithm. While Monte Carlo Tree Search (MCTS) is known to be effective for search and planning in RL, it is often sample-inefficient and therefore expensive to apply in practice. In this work, we develop a Generative Adversarial Network (GAN) architecture to model an environment's dynamics and a predictor model for the reward function. We learn these models from data collected through interaction with the environment and then use them for model-based planning. During planning, we deploy a finite-depth MCTS, using the learned model for tree expansion and learned Q-values at the leaves, to find the best action. We theoretically show that GATS improves the bias-variance trade-off in value-based DRL. Moreover, we show that the generative model learns the environment dynamics using orders of magnitude fewer samples than the Q-learner. In non-stationary settings where the environment model changes, we find the generative model adapts significantly faster than the Q-learner to the new environment.
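The planning step described in the abstract, a depth-limited search that rolls out a learned dynamics model and reward predictor and bootstraps with a learned Q-function at the leaves, can be illustrated with the following minimal sketch. It is not the authors' implementation: plan_action, dynamics_model, reward_model, and q_values are hypothetical names standing in for the learned GAN dynamics model, reward predictor, and Q-network, and the search is an exhaustive depth-limited expansion over a small discrete action set rather than a full MCTS.

```python
# Minimal, hypothetical sketch of depth-limited planning with learned models.
import numpy as np

def plan_action(state, dynamics_model, reward_model, q_values,
                n_actions, depth, gamma=0.99):
    """Return the action with the highest depth-limited return estimate.

    dynamics_model(state, action) -> predicted next state
    reward_model(state, action)   -> predicted scalar reward
    q_values(state)               -> array of Q-value estimates, one per action
    """
    def rollout_value(s, d):
        # At the leaves, fall back on the learned Q-function.
        if d == 0:
            return float(np.max(q_values(s)))
        # Otherwise expand every action using the learned dynamics and reward models.
        best = -np.inf
        for a in range(n_actions):
            r = reward_model(s, a)
            s_next = dynamics_model(s, a)
            best = max(best, r + gamma * rollout_value(s_next, d - 1))
        return best

    returns = [reward_model(state, a)
               + gamma * rollout_value(dynamics_model(state, a), depth - 1)
               for a in range(n_actions)]
    return int(np.argmax(returns))

# Toy usage with dummy stand-ins for the learned models.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dyn = lambda s, a: s + 0.1 * (a - 1)   # fake dynamics model
    rew = lambda s, a: -abs(s)             # fake reward predictor
    q = lambda s: rng.normal(size=3)       # fake Q-network output
    print(plan_action(0.0, dyn, rew, q, n_actions=3, depth=2))
```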

Related research

12/27/2018
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
There are great interests as well as many challenges in applying reinfor...

12/27/2018
Neural Model-Based Reinforcement Learning for Recommendation
There are great interests as well as many challenges in applying reinfor...

02/25/2022
Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search
Decision-making under uncertainty (DMU) is present in many important pro...

04/06/2020
Using Generative Adversarial Nets on Atari Games for Feature Extraction in Deep Reinforcement Learning
Deep Reinforcement Learning (DRL) has been successfully applied in sever...

05/30/2017
Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models
In unsupervised data generation tasks, besides the generation of a sampl...

05/22/2022
Should Models Be Accurate?
Model-based Reinforcement Learning (MBRL) holds promise for data-efficie...

04/15/2020
Bootstrapped model learning and error correction for planning with uncertainty in model-based RL
Having access to a forward model enables the use of planning algorithms ...
