Manifold Regularization for Kernelized LSTD

10/15/2017
by   Xinyan Yan, et al.
0

Policy evaluation or value function or Q-function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/21/2023

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

We consider the problem of control in the setting of reinforcement learn...
research
01/31/2012

Learning RoboCup-Keepaway with Kernels

We apply kernel-based methods to solve the difficult reinforcement learn...
research
07/22/2020

Approximation Benefits of Policy Gradient Methods with Aggregated States

Folklore suggests that policy gradient can be more robust to misspecific...
research
01/08/2020

A Nonparametric Offpolicy Policy Gradient

Reinforcement learning (RL) algorithms still suffer from high sample com...
research
01/31/2012

Feature Selection for Value Function Approximation Using Bayesian Model Selection

Feature selection in reinforcement learning (RL), i.e. choosing basis fu...
research
07/06/2021

A Unified Off-Policy Evaluation Approach for General Value Function

General Value Function (GVF) is a powerful tool to represent both the pr...
research
02/17/2020

Kalman meets Bellman: Improving Policy Evaluation through Value Tracking

Policy evaluation is a key process in Reinforcement Learning (RL). It as...

Please sign up or login with your details

Forgot password? Click here to reset