Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

04/12/2018
by Dimitri P. Bertsekas, et al.

In this paper we discuss policy iteration methods for the approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function, or of the cost functions of policies, of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.
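The aggregation scheme summarized above can be illustrated on a small tabular problem. The Python sketch below is not the paper's implementation. It assumes a known model with transition probabilities P[u, i, j] and costs G[u, i, j], hard aggregation (each aggregate state is one distinct feature value), and uniform disaggregation probabilities within each group. It solves the aggregate Bellman equation r(x) = sum_i d_xi min_u sum_j p_ij(u) (g(i,u,j) + alpha r(F(j))) by value iteration, uses J~(j) = r(F(j)) for one-step lookahead policy improvement, and wraps this in an approximate policy iteration loop. The function hypothetical_features is an illustrative stand-in for the deep neural network feature extractor: it merely bins the evaluated policy costs.

```python
import numpy as np


def solve_aggregate_problem(P, G, alpha, features, tol=1e-10, max_iter=100_000):
    """Value iteration for a hard-aggregation problem (illustrative sketch).

    P[u, i, j]: probability of moving from i to j under control u.
    G[u, i, j]: cost of that transition.
    features[i]: feature value of state i; states sharing a value form one
    aggregate state. Disaggregation is ASSUMED uniform within each group.
    Returns r (cost of each aggregate state) and Phi (n x K membership matrix).
    """
    _, agg = np.unique(features, return_inverse=True)  # relabel groups 0..K-1
    n, K = P.shape[1], agg.max() + 1
    Phi = np.zeros((n, K))
    Phi[np.arange(n), agg] = 1.0               # aggregation probs phi_jy in {0, 1}
    D = Phi.T / Phi.sum(axis=0)[:, None]       # uniform disaggregation probs d_xi
    gbar = (P * G).sum(axis=2)                 # expected stage cost, shape (m, n)
    r = np.zeros(K)
    for _ in range(max_iter):
        Q = gbar + alpha * (P @ (Phi @ r))     # Q-factors using J~ = Phi r
        r_new = D @ Q.min(axis=0)              # average minimized Q over each group
        if np.max(np.abs(r_new - r)) < tol:
            return r_new, Phi
        r = r_new
    return r, Phi


def greedy_policy(P, G, alpha, J):
    """Policy improvement: one-step lookahead with cost approximation J."""
    gbar = (P * G).sum(axis=2)
    return (gbar + alpha * (P @ J)).argmin(axis=0)


def evaluate_policy(P, G, alpha, mu):
    """Exact cost of stationary policy mu: solve (I - alpha P_mu) J = g_mu."""
    n = P.shape[1]
    P_mu = P[mu, np.arange(n)]                 # row i is P[mu[i], i, :]
    g_mu = (P_mu * G[mu, np.arange(n)]).sum(axis=1)
    return np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)


def hypothetical_features(J_mu, num_bins):
    """HYPOTHETICAL stand-in for DNN-derived features: bin each state's
    evaluated cost into num_bins equal-width intervals."""
    edges = np.linspace(J_mu.min(), J_mu.max(), num_bins + 1)[1:-1]
    return np.digitize(J_mu, edges)


def approximate_policy_iteration(P, G, alpha, num_bins, iters=10):
    """Evaluate mu, extract features, solve the aggregate problem, improve."""
    n = P.shape[1]
    mu = np.zeros(n, dtype=int)
    for _ in range(iters):
        J_mu = evaluate_policy(P, G, alpha, mu)
        feats = hypothetical_features(J_mu, num_bins)
        r, Phi = solve_aggregate_problem(P, G, alpha, feats)
        mu = greedy_policy(P, G, alpha, Phi @ r)  # J~(j) = r(F(j))
    return mu


if __name__ == "__main__":
    # Random 2-control, 6-state problem (illustrative data only).
    rng = np.random.default_rng(0)
    m, n, alpha = 2, 6, 0.9
    P = rng.random((m, n, n))
    P /= P.sum(axis=2, keepdims=True)
    G = rng.random((m, n, n))
    mu = approximate_policy_iteration(P, G, alpha, num_bins=3)
    print(mu, evaluate_policy(P, G, alpha, mu))
```

Because each aggregate value is an average of minimized Q-factors rather than a weighted sum of feature values, J~ is piecewise constant, and hence nonlinear, in the features. With one aggregate state per original state, the scheme reduces to ordinary value iteration on the original problem.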


research
10/06/2019

Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

We propose a new aggregation framework for approximate dynamic programmi...
research
10/02/2004

Applying Policy Iteration for Training Recurrent Neural Networks

Recurrent neural networks are often used for learning time-series data. ...
research
10/15/2020

Optimal Dispatch in Emergency Service System via Reinforcement Learning

In the United States, medical responses by fire departments over the las...
research
05/24/2019

RL4health: Crowdsourcing Reinforcement Learning for Knee Replacement Pathway Optimization

Joint replacement is the most common inpatient surgical treatment in the...
research
02/01/2023

Deep reinforcement learning for the olfactory search POMDP: a quantitative benchmark

The olfactory search POMDP (partially observable Markov decision process...
research
06/26/2017

Learning Local Feature Aggregation Functions with Backpropagation

This paper introduces a family of local feature aggregation functions an...
research
11/06/2018

State Aggregation Learning from Markov Transition Data

State aggregation is a model reduction method rooted in control theory a...
