Omega-Regular Objectives in Model-Free Reinforcement Learning

09/26/2018
by Ernst Moritz Hahn, et al.

We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.
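
The abstract does not spell out the reduction, so the following is only a minimal sketch of how a Büchi objective might be turned into a reachability problem and handed to tabular Q-learning. Everything specific here is an assumption made for illustration: the three-state MDP, the objective "visit the goal infinitely often" (for which a one-state Büchi automaton suffices, so no explicit product is built), and the parameter ZETA, under which each accepting transition is redirected to an absorbing target with probability 1 - ZETA and a reward of 1.

```python
import random

# Hypothetical 3-state MDP (illustrative assumption, not taken from the paper).
# TRANSITIONS[state][action] = list of (probability, next_state, accepting).
# "accepting" flags the transitions a Buchi automaton for the objective
# "visit the goal infinitely often" would mark as accepting.
START, GOAL, TRAP = 0, 1, 2
TRANSITIONS = {
    START: {0: [(0.9, GOAL, False), (0.1, TRAP, False)],
            1: [(0.5, GOAL, False), (0.5, TRAP, False)]},
    GOAL: {0: [(1.0, GOAL, True)], 1: [(1.0, GOAL, True)]},
    TRAP: {0: [(1.0, TRAP, False)], 1: [(1.0, TRAP, False)]},
}
ZETA = 0.9  # assumed parameter: chance of continuing after an accepting transition


def step(rng, state, action):
    """Sample one transition; the learner only sees the sampled outcome."""
    r = rng.random()
    cumulative = 0.0
    for prob, nxt, accepting in TRANSITIONS[state][action]:
        cumulative += prob
        if r < cumulative:
            break
    # Reachability reduction (assumed form): an accepting transition jumps to
    # an absorbing target with probability 1 - ZETA and pays reward 1.
    if accepting and rng.random() > ZETA:
        return None, 1.0  # None stands for the absorbing target
    return nxt, 0.0


def q_learning(episodes=20000, alpha=0.02, max_steps=50, seed=0):
    """Undiscounted tabular Q-learning on the reachability problem."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in (0, 1)} for s in TRANSITIONS}
    for _episode in range(episodes):
        state = START
        for _t in range(max_steps):
            # Uniform exploration is enough because Q-learning is off-policy.
            action = rng.choice((0, 1))
            nxt, reward = step(rng, state, action)
            future = 0.0 if nxt is None else max(q[nxt].values())
            q[state][action] += alpha * (reward + future - q[state][action])
            if nxt is None:
                break
            state = nxt
    return q


q = q_learning()
best_action = max(q[START], key=q[START].get)
print("greedy action at START:", best_action)
print("estimated values at START:", q[START])
```

In this toy model the learned value of action 0 at START should approach 0.9, the probability of ever entering the goal loop, while the trap state keeps value 0 because no accepting transition is reachable from it. The step cap per episode is needed only because a model-free learner cannot tell that the trap is inescapable.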


