Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP

03/16/2023
by Amin Falah, et al.

Continuous-time Markov decision processes (CTMDPs) are canonical models for sequential decision-making in dense-time, stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm of choice to compute optimal decision sequences. RL, however, requires the learning objective to be encoded as scalar reward signals. Since performing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logics or automata formalisms) into scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments against learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in the popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) the satisfaction semantics, where the goal of the learner is to maximize the probability of spending positive time in the good states of the automaton, and 2) the expectation semantics, where the goal of the learner is to optimize the long-run expected average time spent in the good states of the automaton. We present an approach enabling a correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating them on some popular CTMDP benchmarks with omega-regular objectives.
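To make the expectation semantics concrete, here is a minimal sketch (not the paper's algorithm) of the quantity being optimized: for a toy two-state CTMDP under a fixed policy, it estimates by Monte-Carlo simulation the long-run expected average fraction of time spent in "good" states. The state names, rates, transition probabilities, and the `GOOD` set are all illustrative assumptions.

```python
import random

# Hypothetical toy CTMDP under a fixed policy (all values are assumptions):
# each state has an exponential sojourn time with the given exit rate, and
# a transition distribution over successor states.
RATES = {"s0": 1.0, "s1": 2.0}            # exit rates
TRANS = {                                  # P(next state | current state)
    "s0": [("s0", 0.3), ("s1", 0.7)],
    "s1": [("s0", 0.6), ("s1", 0.4)],
}
GOOD = {"s1"}                              # "good" states of the product automaton

def long_run_avg_good_time(horizon=100_000.0, seed=0):
    """Monte-Carlo estimate of the long-run average time spent in GOOD states."""
    rng = random.Random(seed)
    state, t, good_time = "s0", 0.0, 0.0
    while t < horizon:
        # exponential sojourn time, clipped to the simulation horizon
        dwell = min(rng.expovariate(RATES[state]), horizon - t)
        if state in GOOD:
            good_time += dwell             # reward accrues with dwell time
        t += dwell
        # sample the successor state under the fixed policy
        u, acc = rng.random(), 0.0
        for nxt, p in TRANS[state]:
            acc += p
            if u <= acc:
                state = nxt
                break
    return good_time / horizon
```

In an RL setting, the dwell time accrued in good states would play the role of the scalar reward signal, and the policy (here fixed) would be the object being learned.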


