Model-Free Learning of Safe yet Effective Controllers

03/26/2021
by Alper Kamil Bozkurt, et al.

In this paper, we study the problem of learning safe control policies that are also effective, i.e., that maximize both the probability of satisfying the linear temporal logic (LTL) specification of the task and the discounted reward capturing the (classic) control performance. We consider unknown environments that can be modeled as Markov decision processes (MDPs). We propose a model-free reinforcement learning (RL) algorithm that learns a policy that first maximizes the probability of ensuring safety, then the probability of satisfying the given LTL specification, and lastly the sum of discounted Quality of Control (QoC) rewards. Finally, we illustrate the applicability of our RL-based approach on a case study.
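
The prioritized ordering of the three objectives (safety first, then LTL satisfaction, then QoC reward) can be illustrated with a generic lexicographic action selection. The sketch below is not the algorithm from the paper; it assumes that per-action value estimates for each objective are already available as plain lists, and the names `lexicographic_argmax`, `q_safety`, `q_ltl`, `q_qoc`, and `tol` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def lexicographic_argmax(
    q_safety: Sequence[float],
    q_ltl: Sequence[float],
    q_qoc: Sequence[float],
    tol: float = 1e-6,
) -> int:
    """Pick an action index by lexicographic priority over three objectives.

    Assumption (not from the paper): each sequence holds one value estimate
    per action for a single state, and `tol` is the slack within which two
    estimates are treated as equally good.
    """
    candidates: List[int] = list(range(len(q_safety)))
    # Filter by each objective in priority order: safety, LTL, then QoC.
    for q in (q_safety, q_ltl, q_qoc):
        best = max(q[a] for a in candidates)
        # Keep only the actions that are (near-)optimal for this objective.
        candidates = [a for a in candidates if q[a] >= best - tol]
    # Remaining ties are broken by the lowest action index.
    return candidates[0]


if __name__ == "__main__":
    # Action 2 has the highest QoC value but is less safe, so it is
    # discarded first; among the safe actions, action 1 has the higher
    # LTL satisfaction estimate.
    print(lexicographic_argmax([0.9, 0.9, 0.5], [0.4, 0.7, 0.99], [10.0, 1.0, 100.0]))
```

In this sketch a lower-priority objective only ever breaks ties among actions that are near-optimal for all higher-priority objectives, which mirrors the "first ..., then ..., and lastly ..." ordering described above.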
