Constraint Satisfaction Propagation: Non-stationary Policy Synthesis for Temporal Logic Planning

01/29/2019
by Thomas J. Ringstrom, et al.

Problems arise when using reward functions to capture dependencies between sequential time-constrained goal states because the state-space must be prohibitively expanded to accommodate a history of successfully achieved sub-goals. Policies and value functions derived with stationarity assumptions are not readily decomposable, leading to a tension between reward maximization and task generalization. We demonstrate a logic-compatible approach using model-based knowledge of environment dynamics and deadline information to directly infer non-stationary policies composed of reusable stationary policies. The policies are constructed to maximize the probability of satisfying time-sensitive goals while respecting time-varying obstacles. Our approach explicitly maintains two different spaces: a high-level logical task specification and the low-level state-space of a Markov decision process, onto which the task variables are grounded. Computing satisfiability at the task level is made possible by a Bellman-like equation which operates on a tensor that links the temporal relationship between the two spaces. The equation solves for a value function that can be explicitly interpreted as the probability of sub-goal satisfaction under the synthesized non-stationary policy, an approach we term Constraint Satisfaction Propagation (CSP).
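
To make the idea of a value function that reads as a satisfaction probability more concrete, the following is a minimal sketch of standard finite-horizon backward induction for reaching a goal by a deadline while avoiding time-varying obstacles. It is not the tensor-based CSP equation from the paper, which the abstract does not spell out. The 1-D chain environment, the slip probability, and the names `solve_deadline_reachability` and `obstacles` are assumptions made purely for illustration.

```python
def solve_deadline_reachability(n_states, goal, deadline, obstacles, slip=0.2):
    """Backward induction on a hypothetical 1-D chain MDP (illustrative only).

    V[t][s] is the maximum probability of reaching `goal` no later than
    `deadline`, starting from state s at time t, without ever occupying a
    state that is blocked at the time it is occupied.
    `obstacles(t)` returns the set of blocked states at time t.
    With probability `slip` an action fails and the agent stays in place.
    """
    actions = {"left": -1, "stay": 0, "right": 1}
    V = [[0.0] * n_states for _ in range(deadline + 1)]
    # policy[t][s] depends on t, so the resulting policy is non-stationary.
    policy = [[None] * n_states for _ in range(deadline)]

    for t in range(deadline, -1, -1):
        blocked = obstacles(t)
        for s in range(n_states):
            if s in blocked:
                V[t][s] = 0.0          # constraint violated
            elif s == goal:
                V[t][s] = 1.0          # goal satisfied in time
            elif t < deadline:
                best_action, best_value = None, -1.0
                for name, delta in actions.items():
                    target = min(max(s + delta, 0), n_states - 1)
                    value = (1 - slip) * V[t + 1][target] + slip * V[t + 1][s]
                    if value > best_value:
                        best_action, best_value = name, value
                V[t][s] = best_value
                policy[t][s] = best_action
            # else: deadline reached away from the goal, V stays 0.0
    return V, policy


# No obstacles: four successful moves are needed within four steps.
V_free, pi_free = solve_deadline_reachability(5, goal=4, deadline=4,
                                              obstacles=lambda t: set())
# State 2 permanently blocked: the goal is unreachable on a chain.
V_blocked, _ = solve_deadline_reachability(5, goal=4, deadline=6,
                                           obstacles=lambda t: {2})
print(V_free[0][0], pi_free[0][0], V_blocked[0][0])
```

Because the value at each time step is a probability rather than an accumulated reward, it can be compared directly against a satisfaction threshold, and the time index on the policy is what allows it to react to deadlines and to obstacles that change over time.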


research
11/29/2012

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

We consider infinite-horizon stationary γ-discounted Markov Decision Pro...
research
01/28/2021

Acting in Delayed Environments with Non-Stationary Markov Policies

The standard Markov Decision Process (MDP) formulation hinges on the ass...
research
03/25/2012

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

We consider infinite-horizon γ-discounted Markov Decision Processes, for...
research
07/22/2020

Secure Control in Partially Observable Environments to Satisfy LTL Specifications

This paper studies the synthesis of control policies for an agent that h...
research
02/19/2021

Probabilistically Guaranteed Satisfaction of Temporal Logic Constraints During Reinforcement Learning

We present a novel reinforcement learning algorithm for finding optimal ...
research
07/29/2023

Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows

We propose an automata-theoretic approach for reinforcement learning (RL...
research
01/21/2023

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

We study the problem of learning goal-conditioned policies in Minecraft,...
