A Kernel Loss for Solving the Bellman Equation

05/25/2019
by   Yihao Feng, et al.
0

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of Bellman operator that is not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods without risking divergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms like residual gradient. Our approach may be combined with general function classes such as neural networks, on either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/13/2015

Policy Gradient Methods for Off-policy Control

Off-policy learning refers to the problem of learning the value function...
research
03/21/2019

Towards Characterizing Divergence in Deep Q-Learning

Deep Q-Learning (DQL), a family of temporal difference algorithms for co...
research
08/03/2021

Variational Actor-Critic Algorithms

We introduce a class of variational actor-critic algorithms based on a v...
research
12/22/2016

Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

This paper investigates a type of instability that is linked to the gree...
research
02/24/2023

From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning

We study differentially private (DP) machine learning algorithms as inst...
research
08/30/2010

Fixed-point and coordinate descent algorithms for regularized kernel methods

In this paper, we study two general classes of optimization algorithms f...
research
01/31/2023

Toward Efficient Gradient-Based Value Estimation

Gradient-based methods for value estimation in reinforcement learning ha...

Please sign up or login with your details

Forgot password? Click here to reset