A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward

07/18/2016
by S. A. Murphy, et al.

We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.

research
05/20/2023

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

The average reward criterion is relatively less studied as most existing...
research
05/29/2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Multi-step learning applies lookahead over multiple time steps and has p...
research
08/21/2022

Robust Tests in Online Decision-Making

Bandit algorithms are widely used in sequential decision problems to max...
research
10/21/2021

Actor-critic is implicitly biased towards high entropy optimal policies

We show that the simplest actor-critic method – a linear softmax policy ...
research
10/26/2018

Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

In this paper, we present a new intrinsically motivated actor-critic alg...
research
08/16/2021

Optimal Actor-Critic Policy with Optimized Training Datasets

Actor-critic (AC) algorithms are known for their efficacy and high perfo...
research
03/11/2018

Soft-Robust Actor-Critic Policy-Gradient

Robust Reinforcement Learning aims to derive an optimal behavior that ac...
