Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

05/20/2023
by   Naman Saxena, et al.

The average reward criterion has received comparatively little attention, as most existing work in the reinforcement learning literature considers the discounted reward criterion. A few recent works present on-policy average reward actor-critic algorithms, but off-policy average reward actor-critic methods remain comparatively unexplored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) algorithm. We first provide an asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite-time analysis of the resulting stochastic approximation scheme with a linear function approximator and obtain an ϵ-optimal stationary policy with a sample complexity of Ω(ϵ^{-2.5}). We compare the average reward performance of the proposed ARO-DDPG algorithm against state-of-the-art on-policy average reward actor-critic algorithms on MuJoCo-based environments and observe better empirical performance.
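
For readers less familiar with the distinction drawn above, the two criteria can be written side by side using their standard textbook definitions. This is only a sketch: the symbols (ρ for the average reward, μ for a deterministic policy, γ for the discount factor, Q^μ for the differential action-value function) are assumed notation and may differ from the paper's own, and the limit is assumed to exist and to be independent of the start state, as it is under the usual ergodicity assumptions.

```latex
% Average reward criterion for a deterministic policy \mu (assumed notation)
\rho(\mu) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\left[ \sum_{t=0}^{T-1} r(s_t, a_t) \,\middle|\, a_t = \mu(s_t) \right]

% Discounted reward criterion, shown for contrast
J_\gamma(\mu) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)
  \,\middle|\, a_t = \mu(s_t) \right], \qquad 0 \le \gamma < 1

% Differential action-value function commonly used by average reward critics
Q^{\mu}(s, a) = \mathbb{E}\left[ \sum_{t=0}^{\infty}
  \bigl( r(s_t, a_t) - \rho(\mu) \bigr)
  \,\middle|\, s_0 = s,\ a_0 = a,\ a_t = \mu(s_t) \text{ for } t \ge 1 \right]
```

The average reward criterion weights all time steps equally rather than down-weighting later rewards by γ^t, which is why the critic measures reward relative to ρ(μ) instead of accumulating a discounted sum.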


research
05/04/2020

A Finite Time Analysis of Two Time-Scale Actor Critic Methods

Actor-critic (AC) methods have exhibited great empirical success compare...
research
07/18/2016

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

We develop an off-policy actor-critic algorithm for learning an optimal ...
research
04/25/2023

Fulfilling Formal Specifications ASAP by Model-free Reinforcement Learning

We propose a model-free reinforcement learning solution, namely the ASAP...
research
07/20/2019

Potential-Based Advice for Stochastic Policy Learning

This paper augments the reward received by a reinforcement learning agen...
research
09/09/2019

Transfer Reward Learning for Policy Gradient-Based Text Generation

Task-specific scores are often used to optimize for and evaluate the per...
research
11/04/2021

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

In this paper, we establish the global optimality and convergence rate o...
research
10/26/2018

Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

In this paper, we present a new intrinsically motivated actor-critic alg...
