Approximate information state for approximate planning and reinforcement learning in partially observed systems

10/17/2020
by Jayakumar Subramanian, et al.

We propose a theoretical framework for approximate planning and learning in partially observed systems. Our framework is based on the fundamental notion of information state. We provide two equivalent definitions of an information state: (i) a function of history that is sufficient to compute the expected reward and predict its own next value; (ii) a function of history that can be recursively updated and is sufficient to compute the expected reward and predict the next observation. An information state always leads to a dynamic programming decomposition. Our key result is to show that if a function of the history, called an approximate information state (AIS), approximately satisfies the properties of an information state, then there is a corresponding approximate dynamic program. We show that the policy computed from this approximate dynamic program is approximately optimal, with a bounded loss of optimality. We also show that several approximations of state, observation, and action spaces in the literature can be viewed as instances of AIS, and in some of these cases we obtain tighter bounds. A salient feature of an AIS is that it can be learnt from data. We present AIS-based multi-time-scale policy gradient algorithms and detailed numerical experiments on low-, moderate-, and high-dimensional environments.
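For concreteness, the two defining properties can be written as follows. This is a hedged sketch: the notation $Z_t = \sigma(H_t)$ for the compressed history, $Y_t$ for the observation, $A_t$ for the action, and $R_t$ for the reward is introduced here for illustration and need not match the paper's.

$$\mathbb{E}[R_t \mid H_t, A_t] = \mathbb{E}[R_t \mid Z_t, A_t], \qquad \mathbb{P}(Z_{t+1} \mid H_t, A_t) = \mathbb{P}(Z_{t+1} \mid Z_t, A_t),$$

or, in the second (equivalent) form, a recursive update $Z_{t+1} = \phi(Z_t, Y_{t+1}, A_t)$ together with $\mathbb{P}(Y_{t+1} \mid H_t, A_t) = \mathbb{P}(Y_{t+1} \mid Z_t, A_t)$. An AIS is a compression for which these equalities hold only approximately, and the abstract's claim is that the resulting dynamic program is then approximately optimal, with a loss bounded in terms of the approximation errors.

Because an AIS can be learnt from data, the kind of architecture the abstract points to can be sketched roughly as below. This is a minimal illustration under assumed choices (PyTorch, a GRU encoder, discrete actions, one-hot observations, squared-error and cross-entropy surrogates for the two AIS conditions); it is not the paper's exact algorithm, and all names and hyperparameters are hypothetical.

```python
# Minimal sketch of an AIS-based agent (assumptions: PyTorch, discrete actions,
# one-hot observations; names and hyperparameters are illustrative, not the paper's).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical


class AISAgent(nn.Module):
    def __init__(self, obs_dim, act_dim, ais_dim=64):
        super().__init__()
        # Recursive update z_t = phi(z_{t-1}, y_t, a_{t-1}): the second definition above.
        self.encoder = nn.GRUCell(obs_dim + act_dim, ais_dim)
        # Heads enforcing the AIS conditions: reward prediction and next-observation prediction.
        self.reward_head = nn.Linear(ais_dim + act_dim, 1)
        self.obs_head = nn.Linear(ais_dim + act_dim, obs_dim)
        # Actor and critic operate on the AIS instead of the full history.
        self.actor = nn.Linear(ais_dim, act_dim)
        self.critic = nn.Linear(ais_dim, 1)

    def update_ais(self, z, obs, prev_action):
        """Update the approximate information state from the latest observation and action."""
        return self.encoder(torch.cat([obs, prev_action], dim=-1), z)

    def act(self, z):
        """Sample an action from the AIS-based policy."""
        dist = Categorical(logits=self.actor(z))
        action = dist.sample()
        return action, dist.log_prob(action)


def ais_loss(agent, z, action_onehot, reward, next_obs_onehot):
    """Surrogate for the two AIS conditions: predict the reward and the next observation."""
    inp = torch.cat([z, action_onehot], dim=-1)
    reward_loss = (agent.reward_head(inp).squeeze(-1) - reward).pow(2).mean()
    obs_loss = F.cross_entropy(agent.obs_head(inp), next_obs_onehot.argmax(dim=-1))
    return reward_loss + obs_loss


# Multi-time-scale updates: the AIS generator (and critic) learn on a faster
# time scale than the actor; the learning rates below are illustrative only.
agent = AISAgent(obs_dim=10, act_dim=4)
ais_params = (list(agent.encoder.parameters())
              + list(agent.reward_head.parameters())
              + list(agent.obs_head.parameters()))
opt_ais = torch.optim.Adam(ais_params, lr=1e-3)                   # fast time scale
opt_critic = torch.optim.Adam(agent.critic.parameters(), lr=5e-4)
opt_actor = torch.optim.Adam(agent.actor.parameters(), lr=1e-4)   # slow time scale
```

In a training loop one would roll out the policy, minimize ais_loss to keep the compressed state close to a true information state, and apply a policy gradient step (e.g. REINFORCE with the critic as baseline) to the actor on the slower time scale.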
