SVRG for Policy Evaluation with Fewer Gradient Evaluations

06/09/2019
by   Zilun Peng, et al.
0

Stochastic variance-reduced gradient (SVRG) is an optimization method originally designed for tackling machine learning problems with a finite sum structure. SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy. SVRG makes use of gradient estimates at two scales. At the slower scale, SVRG computes a full gradient over the whole dataset, which could lead to prohibitive computation costs. In this work, we show that two variants of SVRG for policy evaluation could significantly diminish the number of gradient calculations while preserving a linear convergence speed. More importantly, our theoretical result implies that one does not need to use the entire dataset in every epoch of SVRG when it is applied to policy evaluation with linear function approximation. Our experiments demonstrate large computational savings provided by the proposed methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/25/2017

Stochastic Variance Reduction Methods for Policy Evaluation

Policy evaluation is a crucial step in many reinforcement-learning proce...
research
09/14/2020

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement...
research
04/09/2020

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcem...
research
01/08/2021

Average-Reward Off-Policy Policy Evaluation with Function Approximation

We consider off-policy policy evaluation with function approximation (FA...
research
11/29/2022

Closing the gap between SVRG and TD-SVRG with Gradient Splitting

Temporal difference (TD) learning is a simple algorithm for policy evalu...
research
01/31/2022

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

Bilevel optimization, the problem of minimizing a value function which i...
research
11/19/2014

Compress and Control

This paper describes a new information-theoretic policy evaluation techn...

Please sign up or login with your details

Forgot password? Click here to reset