Average-reward model-free reinforcement learning: a systematic review and literature mapping

10/18/2020
by Vektor Dewanto, et al.

Model-free reinforcement learning (RL) has been an active area of research and provides a fundamental framework for agent-based learning and decision-making in artificial intelligence. In this paper, we review a specific subset of this literature, namely work that uses optimization criteria based on average rewards in the infinite-horizon setting. Average-reward RL has the advantage of being the most selective criterion in recurrent (ergodic) Markov decision processes. Compared with the widely used discounted-reward criterion, it also requires no discount factor, which is a critical hyperparameter, and it properly aligns the optimization and performance metrics. Motivated by the solo survey by Mahadevan (1996a), we provide an updated review of work in this area and extend it to cover policy-iteration and function-approximation methods (in addition to the value-iteration and tabular counterparts). We also identify and discuss opportunities for future work.
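To make the contrast with the discounted criterion concrete, the sketch below computes the average reward (gain) of a fixed policy on a toy two-state ergodic chain and compares it with the discounted value scaled by (1 - gamma). The chain `P`, the rewards `r`, and the function names are illustrative assumptions, not taken from the paper; the code is model-based and serves only to illustrate the criterion, not any model-free algorithm from the review.

```python
def stationary_distribution(P, iters=1000):
    """Power iteration for the stationary distribution of an ergodic chain."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


def average_reward(P, r):
    """Gain of the chain: expected reward under the stationary distribution."""
    dist = stationary_distribution(P)
    return sum(d * reward for d, reward in zip(dist, r))


def discounted_value(P, r, gamma, iters=5000):
    """Iterative policy evaluation for the discounted criterion."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


# Hypothetical two-state chain induced by some fixed policy (assumed values).
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 0.0]

gain = average_reward(P, r)            # needs no discount factor
gamma = 0.99                           # the discounted criterion needs this choice
scaled = [(1 - gamma) * v for v in discounted_value(P, r, gamma)]

print("gain:", gain)
print("(1 - gamma) * v:", scaled)
```

In this toy chain the gain is a single state-independent number, whereas the scaled discounted values still depend on the start state and on the chosen gamma, approaching the gain only as gamma tends to 1.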


research
06/28/2023

Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

We develop several provably efficient model-free reinforcement learning ...
research
10/15/2019

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation ...
research
07/03/2021

Examining average and discounted reward optimality criteria in reinforcement learning

In reinforcement learning (RL), the goal is to obtain an optimal policy,...
research
05/17/2023

Model-Free Robust Average-Reward Reinforcement Learning

Robust Markov decision processes (MDPs) address the challenge of model u...
research
02/27/2023

Reinforcement Learning with Depreciating Assets

A basic assumption of traditional reinforcement learning is that the val...
research
03/16/2023

Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP

Continuous-time Markov decision processes (CTMDPs) are canonical models ...
research
11/02/2020

Optimal Policies for the Homogeneous Selective Labels Problem

Selective labels are a common feature of consequential decision-making a...
