Smoothed functional-based gradient algorithms for off-policy reinforcement learning

01/06/2021
by   Nithia Vijayan, et al.

We consider the problem of control in an off-policy reinforcement learning (RL) context. We propose a policy gradient algorithm that incorporates a smoothed functional (SF) based gradient estimation scheme. We provide an asymptotic convergence guarantee for the proposed algorithm using the ordinary differential equation (ODE) approach. Further, we derive a non-asymptotic bound that quantifies the rate of convergence of the proposed algorithm.
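The core idea of smoothed functional (SF) gradient estimation is to perturb the policy parameters with random Gaussian directions and estimate the gradient from function evaluations alone, without analytic derivatives. The sketch below illustrates a generic two-sided SF estimator; the function `sf_gradient`, its parameters, and the quadratic test objective are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def sf_gradient(f, theta, delta=0.1, n_samples=100, rng=None):
    """Two-sided smoothed-functional gradient estimate of f at theta.

    Averages Delta * (f(theta + delta*Delta) - f(theta - delta*Delta)) / (2*delta)
    over standard Gaussian perturbation vectors Delta. This is a generic
    illustration of the SF idea, not the paper's off-policy estimator.
    """
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(n_samples):
        d = rng.standard_normal(theta.shape)
        grad += d * (f(theta + delta * d) - f(theta - delta * d)) / (2 * delta)
    return grad / n_samples

# Example: for f(theta) = -||theta||^2 the true gradient is -2*theta.
theta = np.array([1.0, -2.0])
g = sf_gradient(lambda t: -np.sum(t**2), theta, delta=0.05, n_samples=5000, rng=0)
```

In an actual off-policy policy gradient setting, `f` would be an importance-sampling-corrected return estimated from trajectories generated by the behavior policy, and the estimate would drive a stochastic approximation update on the policy parameters.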


