Learning Optimal Strategies for Temporal Tasks in Stochastic Games

02/08/2021
by Alper Kamil Bozkurt, et al.

Linear temporal logic (LTL) is widely used to formally specify complex tasks for autonomy. Unlike conventional tasks defined solely by reward functions, LTL tasks are noncumulative and require memory-dependent strategies. In this work, we introduce a method to learn optimal controller strategies that maximize the probability of satisfying the LTL specifications of the desired tasks in stochastic games, which are natural extensions of Markov decision processes (MDPs) to systems with adversarial inputs. Our approach constructs a product game from the deterministic automaton derived from the given LTL task and a reward machine based on the acceptance condition of the automaton, thus allowing a model-free reinforcement learning (RL) algorithm to learn an optimal controller strategy. Since the rewards and the transition probabilities of the reward machine do not depend on the number of sets defining the acceptance condition, our approach is scalable to a wide range of LTL tasks, as we demonstrate on several case studies.
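
The product construction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_product`, the dictionary encodings, and the toy example are all hypothetical, and the sketch assumes the automaton reads the label of the successor state at each step. It covers only the generic synchronous product of a game's transition structure with a deterministic automaton; the reward machine and the split of states between controller and adversary are not reproduced here.

```python
from collections import deque


def build_product(transitions, labels, delta, s0, q0):
    """Synchronous product of a stochastic transition structure with a
    deterministic automaton (illustrative encoding, not from the paper).

    transitions: {(state, action): {next_state: probability}}
    labels:      {state: frozenset of atomic propositions}
    delta:       {(automaton_state, label): next_automaton_state}
    """
    # Index the game transitions by source state for quick lookup.
    by_state = {}
    for (s, a), dist in transitions.items():
        by_state.setdefault(s, []).append((a, dist))

    initial = (s0, q0)
    product = {}
    seen = {initial}
    queue = deque([initial])

    # Breadth-first exploration of the reachable product states.
    while queue:
        s, q = queue.popleft()
        for a, dist in by_state.get(s, []):
            out = {}
            for s_next, p in dist.items():
                # Assumption: the automaton moves on the successor's label.
                q_next = delta[(q, labels[s_next])]
                nxt = (s_next, q_next)
                out[nxt] = out.get(nxt, 0.0) + p
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
            product[((s, q), a)] = out
    return initial, product


# Toy example: reach a state labeled "goal" (the LTL task "eventually goal").
EMPTY = frozenset()
GOAL = frozenset({"goal"})

transitions = {
    ("s0", "go"): {"s0": 0.5, "s1": 0.5},
    ("s1", "stay"): {"s1": 1.0},
}
labels = {"s0": EMPTY, "s1": GOAL}
delta = {
    ("q0", EMPTY): "q0",
    ("q0", GOAL): "q1",   # q1 records that "goal" has been seen
    ("q1", EMPTY): "q1",
    ("q1", GOAL): "q1",
}

initial, product = build_product(transitions, labels, delta, "s0", "q0")
```

Each product state pairs a game state with an automaton state, so the automaton component serves as the memory that the abstract says LTL tasks require. In a stochastic game, each product state would additionally inherit the owning player from its game-state component.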
