A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning

04/28/2021
by   Andrew Patterson, et al.

Many reinforcement learning algorithms rely on value estimation. However, the most widely used algorithms – namely temporal difference algorithms – can diverge under the combination of off-policy sampling and nonlinear function approximation. Many algorithms for off-policy value estimation have been developed that are sound under linear function approximation, based on the linear mean-squared projected Bellman error (PBE). Extending these methods to the nonlinear case has been largely unsuccessful. Recently, several methods have been introduced that approximate a different objective, called the mean-squared Bellman error (BE), which naturally facilitates nonlinear approximation. In this work, we build on these insights and introduce a new generalized PBE that extends the linear PBE to the nonlinear setting. We show how this generalized objective unifies previous work, including previous theory, and we obtain new bounds on the value error of the solutions of the generalized objective. We derive an easy-to-use, but sound, algorithm to minimize the generalized objective that is more stable across runs, is less sensitive to hyperparameters, and performs favorably across four control domains with neural network function approximation.
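To make the setting concrete, the following is a minimal sketch of the classic off-policy linear TD(0) update that the abstract refers to – the baseline whose semi-gradient update can diverge off-policy. The function name and parameters are illustrative; this is not the paper's proposed algorithm.

```python
import numpy as np

def td0_update(w, x, r, x_next, rho, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step with an importance-sampling correction.

    w      : weight vector of the linear value estimate v(s) = w . x(s)
    x      : feature vector of the current state
    r      : observed reward
    x_next : feature vector of the next state
    rho    : pi(a|s) / b(a|s), correcting for off-policy sampling
    """
    # TD error: bootstrapped target minus current estimate
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)
    # Semi-gradient update; with rho != 1 and linear features this
    # iteration is not guaranteed to converge (e.g. Baird's counterexample)
    return w + alpha * rho * delta * x

w = np.zeros(4)
w = td0_update(w, np.array([1., 0., 0., 0.]), 1.0,
               np.array([0., 1., 0., 0.]), rho=1.0)
```

Gradient-TD methods avoid this instability by instead descending the linear PBE; the paper's generalized PBE extends that objective beyond linear features.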

Related research

- 05/17/2022: Robust Losses for Learning Value Functions
- 07/01/2020: Gradient Temporal-Difference Learning with Regularized Corrections
- 04/20/2022: Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation
- 06/15/2017: Reinforcement Learning under Model Mismatch
- 06/16/2021: Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation
- 10/11/2019: Zap Q-Learning With Nonlinear Function Approximation
- 12/09/2020: Optimal oracle inequalities for solving projected fixed-point equations
