Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

10/06/2019
by Dimitri Bertsekas, et al.

We propose a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, approximate policy iteration, and other single- and multistep lookahead methods. The central novel characteristic is the use of a bias function V of the state, which biases the values of the aggregate cost function towards their correct levels. The classical aggregation framework is obtained when V≡0, but our scheme works best when V is a known, reasonably good approximation to the optimal cost function J^*. When V is equal to the cost function J_μ of some known policy μ and there is only one aggregate state, our scheme is equivalent to the rollout algorithm based on μ (i.e., the result of a single policy improvement starting with the policy μ). When V=J_μ and there are multiple aggregate states, our aggregation approach can be used as a more powerful form of improvement of μ. Thus, when combined with an approximate policy evaluation scheme, our approach can form the basis for a new and enhanced form of approximate policy iteration. When V is a generic bias function, our scheme is equivalent to approximation in value space with lookahead function equal to V plus a local correction within each aggregate state. The local correction levels are obtained by solving a low-dimensional aggregate DP problem, yielding an arbitrarily close approximation to J^* when the number of aggregate states is sufficiently large. Except for the bias function, the aggregate DP problem is similar to the one of the classical aggregation framework, and its algorithmic solution by simulation or other methods is nearly identical to the one for classical aggregation, assuming values of V are available when needed.
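To make the lookahead structure concrete, the following is a minimal sketch (not the paper's implementation) of approximation in value space with the cost-to-go approximation described in the abstract: J̃(x) = V(x) + r[φ(x)], where φ is a hypothetical hard-aggregation map from states to aggregate states and r is the vector of local correction levels, here assumed to have already been obtained by solving the aggregate DP problem. The MDP data (transition probabilities P, stage costs g, discount γ) and the two-aggregate-state partition are illustrative assumptions.

```python
import numpy as np

# Hypothetical discounted MDP: 6 states, 2 controls.
n_states, n_controls = 6, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_controls, n_states))  # P[u, x, y]
g = rng.uniform(0.0, 1.0, size=(n_controls, n_states))             # stage cost g(x, u)
gamma = 0.9

# Hard aggregation: phi maps each state to one of 2 aggregate states.
phi = np.array([0, 0, 0, 1, 1, 1])

# Bias function V (V ≡ 0 recovers classical aggregation) and correction
# levels r, one per aggregate state, assumed solved from the aggregate problem.
V = np.zeros(n_states)
r = np.array([0.5, 1.0])

def one_step_lookahead(x):
    """Minimize over u the one-step lookahead cost using
    J_tilde(y) = V(y) + r[phi(y)] as the cost-to-go approximation."""
    J_tilde = V + r[phi]
    q = g[:, x] + gamma * (P[:, x, :] @ J_tilde)  # Q-factor per control
    return int(np.argmin(q))

# The resulting one-step lookahead policy over all states.
mu = [one_step_lookahead(x) for x in range(n_states)]
print(mu)
```

With a single aggregate state and V = J_μ, the correction r is a scalar and the lookahead above reduces to the rollout policy based on μ; with multiple aggregate states, the per-region corrections give the stronger form of policy improvement described in the abstract.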
