Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

05/02/2023
by Daqian Shao, et al.

Linear Temporal Logic (LTL) is widely used to specify high-level objectives for system policies, and it is highly desirable for autonomous systems to learn the optimal policy with respect to such specifications. However, learning the optimal policy from LTL specifications is not trivial. We present a model-free Reinforcement Learning (RL) approach that efficiently learns an optimal policy for an unknown stochastic system, modelled as a Markov Decision Process (MDP). We propose a novel and more general product MDP, reward structure and discounting mechanism that, when applied in conjunction with off-the-shelf model-free RL algorithms, efficiently learn the optimal policy that maximizes the probability of satisfying a given LTL specification, with optimality guarantees. We also provide improved theoretical results on choosing the key parameters in RL to ensure optimality. To directly evaluate the learned policy, we adopt the probabilistic model checker PRISM to compute the probability that the policy satisfies such specifications. Several experiments on various tabular MDP environments across different LTL tasks demonstrate the improved sample efficiency and optimal policy convergence.
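
To make the general idea of learning on a product of an MDP and an automaton concrete, the following is a minimal sketch under stated assumptions. It is not the product MDP, reward structure, or discounting mechanism proposed in the paper, none of which are detailed in the abstract. It uses a hypothetical four-state chain environment, a hand-written two-state automaton for the simple property "eventually goal", a reward of 1 on first acceptance, and plain tabular Q-learning with a fixed discount factor. All names and parameter values (`SLIP`, `product_step`, `q_learning`, and so on) are illustrative.

```python
import random

N_STATES = 4          # chain states 0..3 (hypothetical toy environment)
GOAL = 3              # the only state labelled "goal"
ACTIONS = (0, 1)      # 0 = left, 1 = right
SLIP = 0.1            # probability that a move goes the opposite way


def label(s):
    """Labelling function: atomic propositions that hold in state s."""
    return {"goal"} if s == GOAL else set()


def automaton_step(q, props):
    """Two-state automaton for 'eventually goal'; q=1 is accepting and absorbing."""
    return 1 if q == 1 or "goal" in props else 0


def product_step(s, q, a, rng):
    """One product transition: move in the MDP, then advance the automaton."""
    direction = 1 if a == 1 else -1
    if rng.random() < SLIP:
        direction = -direction
    s2 = min(max(s + direction, 0), N_STATES - 1)
    q2 = automaton_step(q, label(s2))
    reward = 1.0 if (q == 0 and q2 == 1) else 0.0   # reward on first acceptance
    return s2, q2, reward


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over product states (s, q) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, q, a): 0.0 for s in range(N_STATES) for q in (0, 1) for a in ACTIONS}
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(50):                          # step cap per episode
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[s, q, b] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[s, q, b] == best])
            s2, q2, r = product_step(s, q, a, rng)
            done = q2 == 1                           # episode ends on acceptance
            target = r if done else r + gamma * max(Q[s2, q2, b] for b in ACTIONS)
            Q[s, q, a] += alpha * (target - Q[s, q, a])
            s, q = s2, q2
            if done:
                break
    return Q


Q = q_learning()
# Greedy action in each non-goal state while the automaton is still in q=0.
policy = {s: max(ACTIONS, key=lambda a: Q[s, 0, a]) for s in range(GOAL)}
print(policy)
```

The learner only ever sees the pair (s, q), so the automaton state acts as memory of how much of the specification has been satisfied. For general LTL properties the automaton, acceptance condition, and reward design are considerably more involved than this reachability example.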

