Planning with Expectation Models for Control

04/17/2021
by Katya Kudashkina, et al.

In model-based reinforcement learning (MBRL), Wan et al. (2019) showed conditions under which the environment model could produce the expectation of the next feature vector, rather than the full distribution or a sample thereof, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary and the model is approximate, such as when it is learned using function approximation. In these cases a full distribution model may be impractical, and a sample model may be either more expensive computationally or of high variance. Wan et al. considered only planning for prediction, to evaluate a fixed policy. In this paper, we treat the control case: planning to improve and find a good approximate policy. We prove that planning with an expectation model must update a state-value function, not an action-value function as previously suggested (e.g., Sorg & Singh, 2010). This raises the question of how planning influences action selection. We consider three strategies for this and present general MBRL algorithms for each. We identify the strengths and weaknesses of these algorithms in computational experiments. Our algorithms and experiments are the first to treat MBRL with expectation models in a general setting.
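
To make the abstract's central point concrete, here is a minimal Python sketch of Dyna-style planning with a linear expectation model. It is an illustration under stated assumptions, not the authors' implementation: the model parameters F and b, the helper names, and greedy one-step lookahead (just one possible action-selection strategy) are assumptions made for exposition. The key property the sketch relies on is that, for a value function linear in the features, v(E[x']) = E[v(x')], so backing up toward the model's expected next feature vector is an unbiased state-value update.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code): planning with a
# linear expectation model that predicts the expected next feature vector
# and expected reward, backing up a linear state-value function v(x) = w @ x.

n_features, n_actions = 8, 3
gamma, alpha = 0.95, 0.1

w = np.zeros(n_features)                           # state-value weights
F = np.zeros((n_actions, n_features, n_features))  # E[x' | x, a] ~= F[a] @ x  (learned from experience)
b = np.zeros((n_actions, n_features))              # E[r | x, a] ~= b[a] @ x   (learned from experience)

def plan_backup(x, a):
    """One planning update of the state-value weights from feature vector x."""
    x_next = F[a] @ x                 # expected next feature vector
    r_hat = float(b[a] @ x)           # expected reward
    # Because v is linear, v(E[x']) = E[v(x')], so this backup is unbiased
    # even though the model returns an expectation rather than a sample.
    td_error = r_hat + gamma * float(w @ x_next) - float(w @ x)
    return alpha * td_error * x       # semi-gradient step for w

def greedy_action(x):
    """Action selection by one-step lookahead through the model
    (one of several strategies one might pair with state-value planning)."""
    scores = [float(b[a] @ x) + gamma * float(w @ (F[a] @ x))
              for a in range(n_actions)]
    return int(np.argmax(scores))

# Sketch of use: interleave real experience (which would fit F, b, and w)
# with planning backups from stored or sampled feature vectors.
x = np.random.randn(n_features)
a = greedy_action(x)
w += plan_backup(x, a)
```

Note the division of labor this sketch assumes: planning updates only the state-value weights, while the model is consulted at decision time to rank actions. That is consistent with the paper's point that the planning update itself targets state values rather than action values.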
