Verifiable Planning in Expected Reward Multichain MDPs

by George K. Atia, et al.

The planning domain has seen growing interest in the formal synthesis of decision-making policies. Formal synthesis typically entails finding a policy that satisfies specifications expressed in a well-defined logic, such as Linear Temporal Logic (LTL) or Computation Tree Logic (CTL). While such logics are powerful and expressive in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies that must satisfy certain types of asymptotic behavior. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time the agent spends in each state as it interacts with its environment for an indefinite period of time. This is sometimes called the average or expected behavior of the agent. In this paper, we explore the steady-state planning problem: deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. We propose a linear programming solution for the general case of multichain Markov Decision Processes (MDPs) and prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
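
To make the notion of steady-state behavior concrete, the following is a minimal sketch that evaluates a fixed stationary policy on a toy two-state MDP and checks a bound on the long-run fraction of time spent in one state. It is not the linear program proposed in the paper: it only evaluates a given policy rather than synthesizing one, and it assumes the chain induced by the policy has a single recurrent class and is aperiodic, so that the steady-state distribution is unique and power iteration converges. In the multichain case the distribution generally depends on the initial state. The transition numbers, the policy, the bound, and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: steady-state evaluation of a FIXED policy on a toy MDP.
# Assumes the induced Markov chain has one recurrent class and is aperiodic.

def induced_chain(P, policy):
    """Return T[s][t] = sum_a policy[s][a] * P[s][a][t]."""
    n = len(P)
    return [
        [sum(policy[s][a] * P[s][a][t] for a in range(len(P[s]))) for t in range(n)]
        for s in range(n)
    ]

def steady_state(T, iters=10_000, tol=1e-12):
    """Approximate the stationary distribution x = x T by power iteration."""
    n = len(T)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(x[s] * T[s][t] for s in range(n)) for t in range(n)]
        if max(abs(a - b) for a, b in zip(x, nxt)) < tol:
            return nxt
        x = nxt
    return x

def satisfies(x, bounds):
    """Check steady-state constraints of the form lo <= x[s] <= hi."""
    return all(lo <= x[s] <= hi for s, (lo, hi) in bounds.items())

# Hypothetical MDP: P[s][a] is the next-state distribution for action a in state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1: actions 0 and 1
]
# Hypothetical stationary policy: always take action 0.
policy = [[1.0, 0.0], [1.0, 0.0]]

T = induced_chain(P, policy)
x = steady_state(T)
# Hypothetical specification: spend between 10% and 30% of the time in state 1.
ok = satisfies(x, {1: (0.1, 0.3)})
print(x, ok)
```

For this policy the induced chain is [[0.9, 0.1], [0.5, 0.5]], whose balance equation 0.1 x0 = 0.5 x1 gives the stationary distribution (5/6, 1/6), so the bound on state 1 holds. A synthesis method would instead search over policies so that such bounds are met, which is the problem the abstract describes.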
