Toward Efficient Gradient-Based Value Estimation

01/31/2023
by   Arsalan Sharifnassab, et al.
0

Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has large condition-number. To resolve the adverse effect of poor conditioning of MSBE on gradient based methods, we propose a low complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/21/2019

Improving Neural Network Classifier using Gradient-based Floating Centroid Method

Floating centroid method (FCM) offers an efficient way to solve a fixed-...
research
10/10/2022

Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

Gradient-based methods have been widely used for system design and optim...
research
08/15/2021

Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach

We propose a unified framework to study policy evaluation (PE) and the a...
research
10/04/2015

Implicit stochastic approximation

The need to carry out parameter estimation from massive data has reinvig...
research
05/10/2021

Parameter-free Gradient Temporal Difference Learning

Reinforcement learning lies at the intersection of several challenges. M...
research
01/06/2022

Well-Conditioned Linear Minimum Mean Square Error Estimation

Linear minimum mean square error (LMMSE) estimation is often ill-conditi...
research
05/25/2019

A Kernel Loss for Solving the Bellman Equation

Value function learning plays a central role in many state-of-the-art re...

Please sign up or login with your details

Forgot password? Click here to reset