
Improving reinforcement learning algorithms: towards optimal learning rate policies

by Othmane Mounjid, et al.

This paper investigates to what extent we can improve reinforcement learning algorithms. Our study is split into three parts. First, our analysis shows that the classical asymptotic convergence rate O(1/√N) is pessimistic and can be replaced by O((log(N)/N)^β), with 1/2 ≤ β ≤ 1 and N the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate (γ_k)_{k≥0} used in stochastic algorithms. We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence (γ^o_k)_{k≥0}, constructs a new sequence (γ^i_k)_{k≥0} whose error decreases faster. In the outer level, we propose an optimal methodology for the selection of the predefined sequence (γ^o_k)_{k≥0}. Third, we show empirically that our learning rate selection methodology significantly outperforms standard algorithms used in reinforcement learning (RL) in the following three applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
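
To make the idea of building a new learning rate sequence from a predefined one concrete, here is a minimal Python sketch on a toy drift (mean) estimation problem. It does not reproduce the PASS algorithm, whose details the abstract does not give. Instead it uses a classical sign-based rule in the style of Kesten: the index into the predefined sequence advances only when two successive increments have opposite signs. The function names, the base sequence 1/(k+1), and the Gaussian toy data are all assumptions made for illustration.

```python
import random


def base_rate(k):
    """Hypothetical predefined sequence: gamma^o_k = 1 / (k + 1)."""
    return 1.0 / (k + 1)


def estimate_drift(samples, rate, use_sign_rule):
    """Stochastic-approximation estimate of the mean of `samples`.

    With use_sign_rule=False the index advances at every step (plain
    Robbins-Monro). With use_sign_rule=True the index advances only when
    two successive increments have opposite signs (Kesten-style rule,
    an assumption here, not the paper's PASS algorithm).
    """
    estimate = 0.0
    index = 0                  # position in the predefined sequence
    previous_increment = 0.0
    for x in samples:
        increment = x - estimate
        estimate += rate(index) * increment
        # Decide whether to move on to the next (smaller) learning rate.
        if not use_sign_rule or increment * previous_increment < 0:
            index += 1
        previous_increment = increment
    return estimate


# Toy data: noisy observations of an assumed drift of 0.5.
rng = random.Random(0)
samples = [rng.gauss(0.5, 1.0) for _ in range(1000)]

plain = estimate_drift(samples, base_rate, use_sign_rule=False)
signed = estimate_drift(samples, base_rate, use_sign_rule=True)
print("plain Robbins-Monro:", plain)
print("sign-based rule:    ", signed)
```

With the base sequence 1/(k+1), the plain variant reduces to the running sample mean, which gives a simple sanity check. The sign-based variant keeps the learning rate larger while the increments agree in sign, which is the general intuition behind sign-driven schedules; whether it helps on a given problem depends on the noise level and the starting point.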




Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

We study two time-scale linear stochastic approximation algorithms, whic...

Tutoring Reinforcement Learning via Feedback Control

We introduce a control-tutored reinforcement learning (CTRL) algorithm. ...

Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL

Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL a...

Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning

Multi-objective reinforcement learning (MORL) is a relatively new field ...

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

Despite the wide applications of Adam in reinforcement learning (RL), th...

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

In real-world applications of reinforcement learning (RL), noise from in...

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic

This paper proposes SplitSGD, a new stochastic optimization algorithm wi...