Logically-Correct Reinforcement Learning

01/24/2018
by Mohammadhosein Hasanbeig, et al.

We propose a novel Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP) such that a linear-time property is satisfied. We convert the property into a Limit-Deterministic Büchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP according to the accepting conditions of the LDBA. With this reward function, RL synthesizes a policy that satisfies the property: the policy synthesis procedure is thus "constrained" by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.
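The pipeline described in the abstract (automaton, product MDP, reward on accepting states, RL) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the five-state corridor MDP, the slip probability, the hand-written two-state automaton for the simple property "eventually goal" (a real LDBA would be obtained from an LTL formula by a translation tool), the reward of 1 on first entering an accepting state, and all hyperparameters. Plain tabular Q-learning stands in for the RL step.

```python
import random

# Hypothetical corridor MDP: states 0..4, actions -1 (left) and +1 (right).
# With probability SLIP the chosen move is reversed. State 4 is labelled "goal".
N_STATES, ACTIONS, SLIP = 5, (-1, +1), 0.1

def label(s):
    return "goal" if s == N_STATES - 1 else "none"

def mdp_step(s, a, rng):
    move = -a if rng.random() < SLIP else a
    return min(max(s + move, 0), N_STATES - 1)

# Hand-written automaton for "eventually goal" (an assumption, not a translated LDBA):
# q0 moves to q1 on reading "goal"; q1 is accepting and absorbing.
ACCEPTING = {1}

def automaton_step(q, lab):
    return 1 if (q == 1 or lab == "goal") else 0

def product_step(s, q, a, rng):
    """One step of the product MDP: move in the MDP, then update the automaton."""
    s2 = mdp_step(s, a, rng)
    q2 = automaton_step(q, label(s2))
    # Reward is paid when the automaton component first becomes accepting.
    reward = 1.0 if (q2 in ACCEPTING and q not in ACCEPTING) else 0.0
    return s2, q2, reward

def q_learning(episodes=3000, horizon=30, alpha=0.1, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = {(s, q, a): 0.0 for s in range(N_STATES) for q in (0, 1) for a in ACTIONS}
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(horizon):
            # Epsilon-greedy action selection over product states (s, q).
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(s, q, b)])
            s2, q2, r = product_step(s, q, a, rng)
            best_next = max(Q[(s2, q2, b)] for b in ACTIONS)
            Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
            s, q = s2, q2
            if q in ACCEPTING:  # property satisfied; end the episode
                break
    return Q

def greedy_policy(Q):
    """Greedy action in each non-goal MDP state while the automaton is in q0."""
    return {s: max(ACTIONS, key=lambda b: Q[(s, 0, b)]) for s in range(N_STATES - 1)}

if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The learner only ever sees product states `(s, q)`, so the specification shapes the policy purely through the automaton component and the reward attached to it. A reachability property is used here because its automaton fits in two states; properties with genuinely infinite-horizon acceptance (visiting accepting states infinitely often) need the Büchi condition and a reward scheme that handles repeated visits, which this sketch does not attempt.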


research
02/02/2019

Certified Reinforcement Learning with Logic Guidance

This paper proposes the first model-free Reinforcement Learning (RL) fra...
research
09/11/2019

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

Reinforcement Learning (RL) has emerged as an efficient method of choice...
research
01/14/2020

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

This letter proposes a novel reinforcement learning method for the synth...
research
10/21/2017

Insulin Regimen ML-based control for T2DM patients

We model individual T2DM patient blood glucose level (BGL) by stochasti...
research
01/14/2020

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Büchi Automata

This letter proposes a novel reinforcement learning method for the synth...
research
08/03/2023

Aligning Agent Policy with Externalities: Reward Design via Bilevel RL

In reinforcement learning (RL), a reward function is often assumed at th...
research
11/05/2019

Apprenticeship Learning via Frank-Wolfe

We consider the applications of the Frank-Wolfe (FW) algorithm for Appre...
