Predictive State Temporal Difference Learning

10/30/2010
by Byron Boots, et al.

We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem.
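The two stages the abstract describes can be sketched in a few lines of NumPy. This is a hypothetical, minimal illustration of the general idea (not the paper's exact PSTD algorithm): first, an SSID-style step takes the SVD of the empirical covariance between current and next-step features to obtain a linear compression operator that keeps the most predictive directions; second, a standard least-squares temporal difference (LSTD) Bellman recursion is run in the compressed feature space. All names, dimensions, and the synthetic data are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trajectory: a small linear hidden state observed through a
# large, redundant feature set -- the setting PSTD targets, where each
# feature alone is only marginally useful.
T, d_feat, k, gamma = 500, 50, 5, 0.9
hidden = np.zeros((T, k))
hidden[0] = rng.normal(size=k)
for t in range(1, T):
    hidden[t] = 0.8 * hidden[t - 1] + 0.1 * rng.normal(size=k)
W_obs = rng.normal(size=(k, d_feat))
phi = hidden @ W_obs + 0.05 * rng.normal(size=(T, d_feat))  # features of state
rewards = hidden[:, 0] + 0.01 * rng.normal(size=T)

# Stage 1 (SSID-style compression): SVD of the covariance between current
# and next-step features; the top-k left singular vectors span the
# directions most informative about the future.
past, future = phi[:-1], phi[1:]
cov = past.T @ future / (T - 1)
U, _, _ = np.linalg.svd(cov)
U_k = U[:, :k]                      # linear compression operator

# Stage 2 (TD learning): LSTD Bellman recursion on the compressed features.
x = phi @ U_k                       # projected (small) feature set
x_t, x_tp1 = x[:-1], x[1:]
A = x_t.T @ (x_t - gamma * x_tp1)
b = x_t.T @ rewards[:-1]
w = np.linalg.solve(A, b)

# Approximate value function: V(s) ~= w . (U_k^T phi(s))
values = x @ w
```

In the compressed space the LSTD system is only k-by-k, so the Bellman solve stays cheap and well-conditioned even when the raw feature set is very large.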

