Reinforcement Learning for General LTL Objectives Is Intractable

11/24/2021
by Cambridge Yang, et al.

In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved that previous studies have alluded to but, to our knowledge, have not examined in depth. In this paper, we theoretically address the hardness of learning with general LTL objectives. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable only if the formula is in the most limited class in the LTL hierarchy, consisting of only finite-horizon-decidable properties. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for non-finite-horizon-decidable LTL objectives.
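To make the notion of a finite-horizon-decidable property more concrete, the following is a minimal illustrative sketch, not taken from the paper. It assumes a trace is a list of sets of atomic propositions and uses three hypothetical helper functions that return `True`, `False`, or `None` (undecided) for a finite prefix. Under this reading, `X X a` is settled by any prefix of length three, whereas `F a` can never be refuted and `G a` can never be confirmed by a finite prefix alone.

```python
from typing import FrozenSet, Optional, Sequence

# A step is the set of atomic propositions that hold at that time.
Step = FrozenSet[str]


def next_next(prefix: Sequence[Step], ap: str = "a") -> Optional[bool]:
    """X X a: fully decided once the third step has been observed."""
    if len(prefix) < 3:
        return None  # not enough steps observed yet
    return ap in prefix[2]


def eventually(prefix: Sequence[Step], ap: str = "a") -> Optional[bool]:
    """F a: a finite prefix can confirm the property but never refute it."""
    return True if any(ap in step for step in prefix) else None


def always(prefix: Sequence[Step], ap: str = "a") -> Optional[bool]:
    """G a: a finite prefix can refute the property but never confirm it."""
    return False if any(ap not in step for step in prefix) else None


if __name__ == "__main__":
    a: Step = frozenset({"a"})
    empty: Step = frozenset()

    print(next_next([empty, empty, a]))  # True: decided at horizon 3
    print(eventually([empty] * 1000))    # None: 'a' could still occur later
    print(always([a] * 1000))            # None: 'a' could still fail later
    print(always([a, empty]))            # False: violation observed
```

The `None` results are the informal intuition behind the hardness claim: for objectives such as `F a` or `G a`, no fixed number of observed steps is guaranteed to settle whether a trajectory satisfies the objective.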

research
03/09/2023

Computably Continuous Reinforcement-Learning Objectives are PAC-learnable

In reinforcement learning, the classic objectives of maximizing discount...
research
11/01/2021

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Recently there is a surge of interest in understanding the horizon-depen...
research
09/27/2021

Model-Free Reinforcement Learning for Optimal Control of Markov Decision Processes Under Signal Temporal Logic Specifications

We present a model-free reinforcement learning algorithm to find an opti...
research
10/29/2015

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Recently, there has been significant progress in understanding reinforce...
research
06/03/2022

PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP

Markov decision processes (MDP) and continuous-time MDP (CTMDP) are the ...
research
09/05/2020

PAC Reinforcement Learning Algorithm for General-Sum Markov Games

This paper presents a theoretical framework for probably approximately c...
research
04/05/2016

Bounded Optimal Exploration in MDP

Within the framework of probably approximately correct Markov decision p...
