Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

04/20/2015
by   Reinaldo Uribe Muriel, et al.

This paper describes a novel method to solve average-reward semi-Markov decision processes by reducing them to a minimal sequence of cumulative-reward problems. The usual solution methods for this type of problem update the gain (optimal average reward) immediately after observing the result of taking an action. The alternative introduced here, optimal nudging, instead sets the gain to some fixed value, which temporarily turns the problem into a cumulative-reward task, solves that task with any standard reinforcement learning method, and only then updates the gain in a way that minimizes uncertainty in a minimax sense. The rule for the optimal gain update is derived by exploiting the geometric features of the w-l space, a simple mapping of the space of policies. The total number of cumulative-reward tasks that need to be solved is shown to be small. Some experiments are presented to explore the features of the algorithm and to compare its performance with other approaches.
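The outer loop described in the abstract (fix the gain, solve the resulting cumulative-reward task, then update the gain) can be illustrated with a small sketch. Everything here is an assumption made for illustration: each policy is reduced to a hypothetical pair (w, l) of expected reward and expected duration per episode with l > 0, the "solver" is a plain maximum over a finite list rather than a reinforcement learning method, and the gain update is a simple Dinkelbach-style ratio update (new gain = w / l of the policy just found), not the minimax-optimal nudging rule derived in the paper.

```python
from typing import List, Tuple

# Hypothetical representation: a policy is summarized by (w, l), its expected
# reward and expected duration per episode. Assumes l > 0 for every policy.
Policy = Tuple[float, float]


def solve_cumulative_task(policies: List[Policy], rho: float) -> Policy:
    """Stand-in for any standard RL solver.

    With the gain fixed at rho, the task is cumulative: maximize w - rho * l.
    Here the maximum is taken by brute force over a finite list.
    """
    return max(policies, key=lambda p: p[0] - rho * p[1])


def gain_search(policies: List[Policy], rho0: float = 0.0,
                tol: float = 1e-12, max_iters: int = 100) -> Tuple[float, int]:
    """Solve a sequence of fixed-gain cumulative tasks.

    Uses a Dinkelbach-style ratio update (an assumption of this sketch),
    NOT the optimal nudging update from the paper.
    Returns the gain estimate and the number of cumulative tasks solved.
    """
    rho = rho0
    for k in range(1, max_iters + 1):
        w, l = solve_cumulative_task(policies, rho)
        nudged_value = w - rho * l   # value of the best policy at this gain
        new_rho = w / l              # average reward of that policy
        if abs(nudged_value) <= tol:
            # The best nudged value is zero, so rho equals the best ratio w / l.
            return new_rho, k
        rho = new_rho
    return rho, max_iters
```

For example, calling `gain_search([(1.0, 2.0), (3.0, 4.0), (5.0, 10.0), (2.0, 2.5)])` searches for the largest ratio w / l among the four toy policies. The ratio update is used only because it is short and easy to check by hand; the paper's contribution is a different update rule that bounds the number of cumulative tasks in a minimax sense.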


research
11/15/2017

Quantile Markov Decision Process

In this paper, we consider the problem of optimizing the quantiles of th...
research
06/03/2021

A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes

This paper presents the first model-free, simulator-free reinforcement l...
research
02/27/2020

Learning in Markov Decision Processes under Constraints

We consider reinforcement learning (RL) in Markov Decision Processes (MD...
research
04/29/2011

Mean-Variance Optimization in Markov Decision Processes

We consider finite horizon Markov decision processes under performance m...
research
04/07/2023

Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

We extend the provably convergent Full Gradient DQN algorithm for discou...
research
04/17/2002

Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures

The problem of making sequential decisions in unknown probabilistic envi...
research
10/04/2021

A Markov process approach to untangling intention versus execution in tennis

Value functions are used in sports applications to determine the optimal...
