Rethinking Expected Cumulative Reward Formalism of Reinforcement Learning: A Micro-Objective Perspective

05/24/2019
by Changjian Li, et al.

The standard reinforcement learning (RL) formulation considers the expectation of the (discounted) cumulative reward. This is limiting in applications where we are concerned with not only the expected performance but also the distribution of the performance. In this paper, we introduce micro-objective reinforcement learning, an alternative RL formalism that overcomes this issue. In this new formulation, an RL task is specified by a set of micro-objectives, which are constructs that specify the desirability or undesirability of events. In addition, micro-objectives allow prior knowledge in the form of temporal abstraction to be incorporated into the global RL objective. The generality of this formalism and its relations to single/multi-objective RL and hierarchical RL are discussed.
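To make the limitation of the expectation-only objective concrete, here is a minimal sketch. It is not taken from the paper: the two policies, their return distributions, and the helper names `expected_return` and `prob_below` are hypothetical and chosen purely for illustration. Both policies have the same expected cumulative reward, yet only one of them carries a risk of a strongly negative outcome, which is the kind of event-level distinction the abstract says the standard formulation cannot express.

```python
# Hypothetical return distributions: lists of (cumulative reward, probability).
# The numbers are invented for illustration and do not come from the paper.
policy_a = [(10.0, 1.0)]                  # always returns 10
policy_b = [(-80.0, 0.1), (20.0, 0.9)]    # usually 20, occasionally -80


def expected_return(dist):
    """Expectation of the cumulative reward under a discrete distribution."""
    return sum(g * p for g, p in dist)


def prob_below(dist, threshold):
    """Probability of the event 'cumulative reward falls below threshold'."""
    return sum(p for g, p in dist if g < threshold)


if __name__ == "__main__":
    for name, dist in (("A", policy_a), ("B", policy_b)):
        print(name, expected_return(dist), prob_below(dist, 0.0))
```

Under the expected cumulative reward criterion the two policies are indistinguishable, whereas an objective stated over events (here, "the return drops below zero") separates them.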

research
06/05/2023

A General Perspective on Objectives of Reinforcement Learning

In this lecture, we present a general perspective on reinforcement learn...
research
10/08/2020

Maximum Reward Formulation In Reinforcement Learning

Reinforcement learning (RL) algorithms typically deal with maximizing th...
research
09/07/2016

Unifying task specification in reinforcement learning

Reinforcement learning tasks are typically specified as Markov decision ...
research
06/16/2021

Mungojerrie: Reinforcement Learning of Linear-Time Objectives

Reinforcement learning synthesizes controllers without prior knowledge o...
research
06/15/2021

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Many advances that have improved the robustness and efficiency of deep r...
research
04/24/2017

Reinforcement Learning Based Dynamic Selection of Auxiliary Objectives with Preserving of the Best Found Solution

Efficiency of single-objective optimization can be improved by introduci...
research
04/08/2022

Multi-objective evolution for Generalizable Policy Gradient Algorithms

Performance, generalizability, and stability are three Reinforcement Lea...
