Verifiable Planning in Expected Reward Multichain MDPs

12/03/2020
by George K. Atia, et al.

The planning domain has seen growing interest in the formal synthesis of decision-making policies. Such synthesis typically entails finding a policy that satisfies formal specifications expressed in a well-defined logic, such as Linear Temporal Logic (LTL) or Computation Tree Logic (CTL). While these logics are powerful and expressive in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies that must satisfy certain types of asymptotic behavior. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time the agent spends in each state as it interacts with its environment over an indefinite period of time. This is sometimes called the average or expected behavior of the agent. In this paper, we explore the steady-state planning problem: deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. We propose a linear programming solution for the general case of multichain Markov Decision Processes (MDPs) and prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
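
To make the notion of steady-state behavior concrete, the following is a minimal sketch rather than the linear programs proposed in the paper. It assumes a toy two-state, two-action MDP whose transition table `T`, policy `policy`, and 0.15 threshold are all invented for illustration, and it assumes the chain induced by the policy is ergodic (a single aperiodic recurrent class), which is a much simpler setting than the multichain case the paper addresses. Under those assumptions, simple power iteration recovers the long-run proportion of time spent in each state, which can then be checked against a constraint.

```python
# Hypothetical toy MDP: T[s][a][t] = probability of moving from state s to
# state t when taking action a. All numbers are illustrative assumptions.
T = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0, action 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1: action 0, action 1
]

# Stationary policy: policy[s][a] = probability of choosing action a in state s.
policy = [
    [1.0, 0.0],  # always take action 0 in state 0
    [1.0, 0.0],  # always take action 0 in state 1
]


def induced_chain(T, policy):
    """Markov chain obtained by following `policy` in the MDP `T`."""
    n = len(T)
    return [
        [sum(policy[s][a] * T[s][a][t] for a in range(len(T[s]))) for t in range(n)]
        for s in range(n)
    ]


def steady_state(P, iters=10_000, tol=1e-12):
    """Power iteration; assumes P is ergodic so the limit is unique."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(dist[s] * P[s][t] for s in range(n)) for t in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


P = induced_chain(T, policy)
dist = steady_state(P)

# Example steady-state constraint (assumed): spend at least 15% of time in state 1.
satisfied = dist[1] >= 0.15
print(dist, satisfied)
```

For this particular chain the balance equation gives a steady-state distribution of (5/6, 1/6), so the assumed constraint holds. In a multichain MDP the long-run distribution generally depends on the initial state, which is one reason a direct check like this does not replace the paper's formulation.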

research
05/31/2021

LTL-Constrained Steady-State Policy Synthesis

Decision-making policies for agents are often synthesized with the const...
research
10/05/2018

Compositional planning in Markov decision processes: Temporal abstraction meets generalized logic composition

In hierarchical planning for Markov decision processes (MDPs), temporal ...
research
10/28/2017

Interpretable Apprenticeship Learning with Temporal Logic Specifications

Recent work has addressed using formulas in linear temporal logic (LTL) ...
research
06/05/2021

Controller Synthesis for Omega-Regular and Steady-State Specifications

Given a Markov decision process (MDP) and a linear-time (ω-regular or LT...
research
09/09/2021

Risk-Averse Decision Making Under Uncertainty

A large class of decision making under uncertainty problems can be descr...
research
07/16/2020

Strengthening Deterministic Policies for POMDPs

The synthesis problem for partially observable Markov decision processes...
research
03/14/2022

Optimal Aggregation Strategies for Social Learning over Graphs

Adaptive social learning is a useful tool for studying distributed decis...
