Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

07/14/2019
by Harsh Gupta, et al.

We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC. We present finite-time performance bounds for the case where the learning rate is fixed. The key idea in obtaining these bounds is to use a Lyapunov function motivated by singular perturbation theory for linear differential equations. We use the bound to design an adaptive learning rate scheme that, in our experiments, significantly improves the convergence rate over the known optimal polynomial decay rule, and that can potentially improve the performance of any other schedule in which the learning rate is changed at pre-determined time instants.
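The two time-scale structure with a fixed learning rate can be illustrated with a minimal sketch. The abstract does not spell out the iteration, so the form below is an assumption: a generic coupled linear recursion with a slow iterate `theta` (step size `alpha`) and a fast iterate `w` (step size `beta`, chosen larger than `alpha`), driven by i.i.d. Gaussian noise. The function name, the matrices, and the noise model are all hypothetical, and this is not the adaptive scheme from the paper, only a fixed-step baseline of the kind the bounds refer to.

```python
import numpy as np


def two_time_scale_sa(A11, A12, A21, A22, b1, b2,
                      alpha, beta, n_steps, noise_std=0.1, seed=0):
    """Fixed-step two time-scale linear stochastic approximation (sketch).

    Assumed (hypothetical) recursion:
        theta <- theta + alpha * (b1 - A11 theta - A12 w + noise)   # slow
        w     <- w     + beta  * (b2 - A21 theta - A22 w + noise)   # fast
    """
    rng = np.random.default_rng(seed)
    A11, A12, A21, A22 = (np.asarray(M, dtype=float) for M in (A11, A12, A21, A22))
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)

    theta = np.zeros_like(b1)
    w = np.zeros_like(b2)
    for _ in range(n_steps):
        # Noisy observations of the two linear mean fields.
        g_theta = b1 - A11 @ theta - A12 @ w + noise_std * rng.standard_normal(b1.shape)
        g_w = b2 - A21 @ theta - A22 @ w + noise_std * rng.standard_normal(b2.shape)
        # Both iterates are updated from the same previous values.
        theta = theta + alpha * g_theta
        w = w + beta * g_w
    return theta, w


if __name__ == "__main__":
    # Toy scalar problem (assumed for illustration); its fixed point is
    # theta = w = 1/3, the solution of 2*theta + w = 1 and theta + 2*w = 1.
    theta, w = two_time_scale_sa(
        A11=[[2.0]], A12=[[1.0]], A21=[[1.0]], A22=[[2.0]],
        b1=[1.0], b2=[1.0],
        alpha=0.01, beta=0.1, n_steps=5000,
    )
    print("theta:", theta, "w:", w)
```

With a fixed step size the iterates do not converge exactly when noise is present; they settle into a neighborhood of the fixed point whose size shrinks with the step size, which is the trade-off that motivates lowering the learning rate over time.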

research
08/25/2022

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

Learning rate is one of the most important hyper-parameters that has a s...
research
11/06/2019

Improving reinforcement learning algorithms: towards optimal learning rate policies

This paper investigates to what extent we can improve reinforcement lear...
research
04/27/2019

Forget the Learning Rate, Decay Loss

In the usual deep neural network optimization process, the learning rate...
research
04/29/2019

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure

There is a stark disparity between the step size schedules used in pract...
research
02/24/2022

An optimal scheduled learning rate for a randomized Kaczmarz algorithm

We study how the learning rate affects the performance of a relaxed rand...
research
10/09/2019

On the adequacy of untuned warmup for adaptive optimization

Adaptive optimization algorithms such as Adam (Kingma & Ba, 2014) are ...
research
11/03/2020

Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and Finite-Time Performance

Two-time-scale stochastic approximation, a generalized version of the po...
