Stochastic Variance Reduction Methods for Policy Evaluation

02/25/2017
by Simon S. Du, et al.

Policy evaluation is a crucial step in many reinforcement-learning procedures; it estimates a value function that predicts the long-term value of states under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy-evaluation problem into a (quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods, for solving it. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem is only strongly concave in the dual variables, with no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
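
As a rough illustration of the approach the abstract describes, the sketch below applies an SVRG-style variance-reduced correction to the primal-dual gradients of a standard saddle-point form of policy evaluation with linear features, $\min_w \max_u\, u^\top(b - Aw) - \tfrac{1}{2}u^\top C u$ with per-sample statistics $A_t = \phi_t(\phi_t - \gamma\phi'_t)^\top$, $b_t = r_t\phi_t$, $C_t = \phi_t\phi_t^\top$. This is a minimal sketch under those assumptions; the function name, step sizes, and epoch counts are illustrative, not the paper's implementation.

```python
import numpy as np

def svrg_policy_eval(phi, phi_next, r, gamma=0.95,
                     sigma_w=0.01, sigma_u=0.01, epochs=20, seed=0):
    """Hedged SVRG-style primal-dual sketch for saddle-point policy evaluation.

    phi:      (n, d) array of state features phi_t
    phi_next: (n, d) array of next-state features phi'_t
    r:        (n,)   array of rewards r_t
    """
    rng = np.random.default_rng(seed)
    n, d = phi.shape
    diff = phi - gamma * phi_next          # rows are (phi_t - gamma * phi'_t)^T
    w = np.zeros(d)                        # primal: value-function weights
    u = np.zeros(d)                        # dual variable
    for _ in range(epochs):
        w0, u0 = w.copy(), u.copy()        # snapshot point
        # Full (batch) saddle gradients at the snapshot:
        #   grad_w L = -A^T u,   grad_u L = b - A w - C u
        gw_full = -(diff.T @ (phi @ u0)) / n
        gu_full = (phi.T @ (r - diff @ w0 - phi @ u0)) / n
        for _ in range(n):                 # inner loop over random samples
            t = rng.integers(n)
            p, q = phi[t], diff[t]
            # per-sample gradients at the current point and at the snapshot
            gw, gw0 = -q * (p @ u), -q * (p @ u0)
            gu = p * (r[t] - q @ w - p @ u)
            gu0 = p * (r[t] - q @ w0 - p @ u0)
            # SVRG correction: stochastic gradient minus its snapshot value,
            # plus the full snapshot gradient
            vw = gw - gw0 + gw_full
            vu = gu - gu0 + gu_full
            w -= sigma_w * vw              # primal descent
            u += sigma_u * vu              # dual ascent
    return w
```

On a batch of transitions collected under the target policy, the returned w plays the role of the value-function weights; the variance-reduced gradients are what allow linear convergence despite the lack of strong convexity in w.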

Related research

02/05/2018
Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity
We consider the convex-concave saddle point problem $\min_x \max_y\, f(x) + y^\top A x - g(y)$...

06/09/2019
SVRG for Policy Evaluation with Fewer Gradient Evaluations
Stochastic variance-reduced gradient (SVRG) is an optimization method or...

06/25/2019
Expected Sarsa(λ) with Control Variate for Variance Reduction
Off-policy learning is powerful for reinforcement learning. However, the...

01/31/2022
A framework for bilevel optimization that enables stochastic and global variance reduction algorithms
Bilevel optimization, the problem of minimizing a value function which i...

10/17/2021
Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules
We thank the opportunity offered by editors for this discussion and the ...

08/23/2020
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning
Temporal-Difference (TD) learning with nonlinear smooth function approxi...

10/20/2022
Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces
We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for p...
