Probabilistically Guaranteed Satisfaction of Temporal Logic Constraints During Reinforcement Learning

02/19/2021
by Derya Aksaray, et al.

We present a novel reinforcement learning algorithm for finding optimal policies in Markov Decision Processes while satisfying temporal logic constraints with a desired probability throughout the learning process. An automata-theoretic approach is proposed to ensure probabilistic satisfaction of the constraint in each episode. This differs from approaches that penalize violations and achieve constraint satisfaction only after a sufficiently large number of episodes. The proposed approach is based on computing a lower bound on the probability of constraint satisfaction and adjusting the exploration behavior as needed. We present theoretical results on the probabilistic constraint satisfaction achieved by the proposed approach. We also numerically demonstrate the proposed idea in a drone scenario, where the constraint is to perform periodically arriving pick-up and delivery tasks and the objective is to fly over high-reward zones to simultaneously perform aerial monitoring.
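The abstract does not say how the lower bound is computed. As a rough illustration only, the Python sketch below shows one way a lower bound on constraint satisfaction could set the exploration rate of an ε-greedy learner. The function names `satisfaction_lower_bound` and `max_safe_epsilon`, their parameters, and the distance-based model behind them are hypothetical assumptions and not the authors' construction. The model assumes deterministic moves, a constraint-directed action that shortens the distance to an accepting automaton state by one, and exploratory actions that lengthen it by at most one.

```python
from math import comb


def satisfaction_lower_bound(distance: int, steps_left: int, epsilon: float) -> float:
    """Lower bound on P(reach an accepting state within `steps_left` steps).

    Hypothetical assumptions (not taken from the paper):
      * transitions are deterministic;
      * the constraint-directed action lowers the distance to the accepting
        set by exactly 1;
      * any exploratory action raises that distance by at most 1;
      * the constraint-directed action is taken with probability 1 - epsilon.
    """
    if distance <= 0:
        return 1.0  # already in an accepting state
    p = 1.0 - epsilon
    bound = 0.0
    for k in range(steps_left + 1):
        # With k directed steps and (steps_left - k) exploratory ones, the
        # worst-case distance is distance - k + (steps_left - k). If that is
        # <= 0, the accepting set must have been reached at some step.
        if 2 * k - steps_left >= distance:
            bound += comb(steps_left, k) * p**k * epsilon ** (steps_left - k)
    return bound


def max_safe_epsilon(distance: int, steps_left: int, p_desired: float,
                     candidates: list[float]) -> float:
    """Largest candidate exploration rate whose bound still meets p_desired.

    Falls back to 0.0 (always take the constraint-directed action) when no
    candidate qualifies.
    """
    safe = [e for e in candidates
            if satisfaction_lower_bound(distance, steps_left, e) >= p_desired]
    return max(safe, default=0.0)
```

For example, with distance 3, three steps left, and ε = 0.1, only the all-directed outcome counts, so the bound is 0.9³ ≈ 0.729. With a desired probability of 0.7, `max_safe_epsilon(3, 3, 0.7, [0.0, 0.05, 0.1, 0.2])` returns 0.1, because ε = 0.2 would lower the bound to 0.512. The bound is deliberately conservative: it counts only outcomes in which even worst-case exploration cannot prevent reaching the accepting set.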

research
07/29/2023

Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows

We propose an automata-theoretic approach for reinforcement learning (RL...
research
07/16/2018

Constraint-Based Visual Generation

In the last few years the systematic adoption of deep learning to visual...
research
05/24/2023

Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

Autonomous systems often have logical constraints arising, for example, ...
research
03/02/2020

Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning

A novel reinforcement learning scheme to synthesize policies for continu...
research
10/19/2020

Chance-Constrained Control with Lexicographic Deep Reinforcement Learning

This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL)...
research
09/17/2022

Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

Recently, self-learning methods based on user satisfaction metrics and c...
research
01/29/2019

Constraint Satisfaction Propagation: Non-stationary Policy Synthesis for Temporal Logic Planning

Problems arise when using reward functions to capture dependencies betwe...