Reinforcement Learning with Temporal Logic Constraints for Partially-Observable Markov Decision Processes

04/04/2021
by Yu Wang, et al.

This paper proposes a reinforcement learning method for controller synthesis of autonomous systems in unknown and partially-observable environments subject to time-dependent safety constraints. Mathematically, we model the system dynamics as a partially-observable Markov decision process (POMDP) with unknown transition and observation probabilities. The time-dependent safety constraint is captured by iLTL, a variant of linear temporal logic for state distributions. Our reinforcement learning method first constructs the belief MDP of the POMDP, which captures the time evolution of the estimated state distributions. Then, by building the product belief MDP from the belief MDP and the limit-deterministic Büchi automaton (LDBA) of the temporal logic constraint, we transform the time-dependent safety constraint on the POMDP into a state-dependent constraint on the product belief MDP. Finally, we learn the optimal policy by value iteration under the state-dependent constraint.
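The three steps above (belief MDP, product with an automaton, constrained value computation) can be illustrated on a toy problem. This is a minimal sketch under several assumptions that do not come from the paper: the transition and observation probabilities are already known (the paper learns them); the 2-state POMDP, its "advance"/"retreat" actions and all numbers are hypothetical; the iLTL-style atomic proposition p is taken to be the linear inequality b[1] ≤ threshold on the belief; and the constraint is the safety formula G p, whose two-state deterministic automaton is a trivial special case of an LDBA. Instead of full value iteration over the continuous belief space, it runs a finite-horizon Bellman recursion over the reachable belief tree and treats the rejecting automaton state as forbidden. The names `belief_update`, `product_step` and `value` are illustrative only.

```python
import numpy as np

# Hypothetical toy POMDP (illustrative numbers only). State 1 is "unsafe".
# T[a, s, s2] = P(s2 | s, a); O[a, s2, o] = P(o | s2, a); R[a, s] = reward.
T = np.array([[[0.70, 0.30], [0.20, 0.80]],   # action 0: "advance"
              [[0.95, 0.05], [0.90, 0.10]]])  # action 1: "retreat"
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[2.0, 0.0],
              [1.0, 0.0]])
ACCEPT, REJECT = 0, 1   # states of the deterministic automaton for G p


def belief_update(b, a, o):
    """Bayes filter: b'(s2) is proportional to O(o | s2, a) * sum_s T(s2 | s, a) b(s).

    Returns the new belief and P(o | b, a); the belief is None if o is impossible.
    """
    unnormalized = (b @ T[a]) * O[a][:, o]
    prob = unnormalized.sum()
    if prob == 0.0:
        return None, 0.0
    return unnormalized / prob, prob


def label(b, threshold):
    """iLTL-style atomic proposition p: a linear inequality on the belief."""
    return b[1] <= threshold


def automaton_step(q, p_holds):
    """Automaton for G p: stay in ACCEPT while p holds; REJECT is an absorbing sink."""
    return ACCEPT if (q == ACCEPT and p_holds) else REJECT


def product_step(b, q, a, o, threshold):
    """One transition (b, q) -> (b', q') of the product belief MDP."""
    b_next, prob = belief_update(b, a, o)
    if b_next is None:
        return None, q, 0.0
    return b_next, automaton_step(q, label(b_next, threshold)), prob


def value(b, q, horizon, threshold=0.5):
    """Finite-horizon Bellman recursion; product states with q == REJECT are forbidden."""
    if q == REJECT:
        return -np.inf
    if horizon == 0:
        return 0.0
    best = -np.inf
    for a in range(T.shape[0]):
        total = float(b @ R[a])
        for o in range(O.shape[2]):
            b_next, q_next, prob = product_step(b, q, a, o, threshold)
            if prob > 0.0:
                total += prob * value(b_next, q_next, horizon - 1, threshold)
        best = max(best, total)
    return best


b0 = np.array([1.0, 0.0])
print(value(b0, ACCEPT, 1))                  # 1.0 (constrained)
print(value(b0, ACCEPT, 1, threshold=1.0))   # 2.0 (p always holds, i.e. unconstrained)
```

From b0 = [1, 0], the higher-reward action 0 is ruled out because one of its observations pushes the estimated probability of the unsafe state to 0.6, above the threshold 0.5; the constrained value is therefore 1.0 rather than 2.0. Treating any reachable violation as infeasible is a deliberately strict (almost-sure) reading of the constraint chosen for this sketch.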

research
09/20/2018

Logically-Constrained Neural Fitted Q-Iteration

This paper proposes a method for efficient training of the Q-function fo...
research
01/11/2020

Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes

Autonomous systems are often required to operate in partially observable...
research
01/14/2020

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Büchi Automata

This letter proposes a novel reinforcement learning method for the synth...
research
01/14/2020

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

This letter proposes a novel reinforcement learning method for the synth...
research
02/15/2019

Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

In real-world scenarios, the observation data for reinforcement learning...
research
04/03/2023

Investigation of risk-aware MDP and POMDP contingency management autonomy for UAS

Unmanned aircraft systems (UAS) are being increasingly adopted for vario...
research
09/09/2013

Technical Report: Distribution Temporal Logic: Combining Correctness with Quality of Estimation

We present a new temporal logic called Distribution Temporal Logic (DTL)...
