Alternating Good-for-MDP Automata

05/06/2022
by   Ernst Moritz Hahn, et al.

When omega-regular objectives were first proposed in model-free reinforcement learning (RL) for controlling MDPs, deterministic Rabin automata were used in an attempt to provide a direct translation from their transitions to scalar values. These translations failed, but it turned out that they can be repaired by using good-for-MDPs (GFM) Büchi automata instead. These are nondeterministic Büchi automata with a restricted type of nondeterminism, albeit not as restricted as in good-for-games automata. Indeed, deterministic Rabin automata have a straightforward translation to such GFM automata, which is bi-linear in the number of states and pairs. Interestingly, the same cannot be said for deterministic Streett automata: a translation to nondeterministic Rabin or Büchi automata comes at an exponential cost, even without requiring the target automaton to be good-for-MDPs. Do we have to pay more than that to obtain a good-for-MDPs automaton? The surprising answer is that we have to pay significantly less when we instead extend the good-for-MDPs property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDPs automata we produce from deterministic Streett automata are bi-linear in the size of the deterministic automaton and its index, and can therefore be exponentially more succinct than minimal nondeterministic Büchi automata.
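To make the bi-linear size claim for Rabin automata concrete, the following is a minimal sketch of the textbook translation from a deterministic Rabin automaton to a nondeterministic Büchi automaton. It is not taken from the paper: the function name `rabin_to_buchi`, the state-based acceptance, and the dictionary encoding are assumptions made for illustration, and the sketch only demonstrates the state count, not the good-for-MDPs property itself.

```python
def rabin_to_buchi(states, delta, init, pairs):
    """Textbook Rabin-to-Buchi translation (illustrative sketch, not the paper's).

    states: iterable of state names
    delta:  dict mapping (state, letter) -> successor state (deterministic)
    pairs:  list of (fin, inf) sets; a run is accepting if, for some pair,
            it visits fin only finitely often and inf infinitely often
            (state-based acceptance is an assumption of this sketch)
    """
    # Copy 0 is the original automaton; copy i commits to Rabin pair i.
    nba_states = {(q, 0) for q in states}
    accepting = set()
    for i, (fin, inf) in enumerate(pairs, start=1):
        for q in states:
            if q not in fin:          # copy i never visits fin states
                nba_states.add((q, i))
                if q in inf:
                    accepting.add((q, i))

    nba_delta = {}                    # (state, letter) -> set of successors
    for (q, a), r in delta.items():
        succ = nba_delta.setdefault(((q, 0), a), set())
        succ.add((r, 0))              # stay in the original copy
        for i, (fin, inf) in enumerate(pairs, start=1):
            if r in fin:
                continue
            succ.add((r, i))          # guess: fin of pair i is avoided from now on
            if q not in fin:
                nba_delta.setdefault(((q, i), a), set()).add((r, i))
    return nba_states, (init, 0), nba_delta, accepting


# Hypothetical two-state example: accept words with only finitely many b's.
states = ["q0", "q1"]
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
pairs = [({"q1"}, {"q0"})]
nba_states, nba_init, nba_delta, accepting = rabin_to_buchi(states, delta, "q0", pairs)
```

With n states and k pairs, the result has at most n(k + 1) states, which is the bi-linear bound the abstract refers to. The only nondeterminism is the single jump from copy 0 into a committed copy, after which the automaton is deterministic.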


research
01/16/2020

Reward Shaping for Reinforcement Learning with Omega-Regular Objectives

Recently, successful approaches have been made to exploit good-for-MDPs ...
research
02/12/2018

Alternating Nonzero Automata

We introduce a new class of automata on infinite trees called alternatin...
research
05/02/2018

One Theorem to Rule Them All: A Unified Translation of LTL into ω-Automata

We present a unified translation of LTL formulas into deterministic Rabi...
research
07/17/2023

Discounted-Sum Automata with Multiple Discount Factors

Discounting the influence of future events is a key paradigm in economic...
research
09/26/2018

Omega-Regular Objectives in Model-Free Reinforcement Learning

We provide the first solution for model-free reinforcement learning of ω...
research
05/07/2023

From Muller to Parity and Rabin Automata: Optimal Transformations Preserving (History-)Determinism

We study transformations of automata and games using Muller conditions i...
research
07/04/2012

Planning in POMDPs Using Multiplicity Automata

Planning and learning in Partially Observable MDPs (POMDPs) are among th...
