Learning and Solving Regular Decision Processes

03/02/2020
by Eden Abadi, et al.

Regular Decision Processes (RDPs) are a recently introduced model that extends Markov decision processes (MDPs) with non-Markovian dynamics and rewards. The non-Markovian behavior is restricted to depend on regular properties of the history, which can be specified using regular expressions or formulas in linear dynamic logic over finite traces. A fully specified RDP can be solved by compiling it into an appropriate MDP. Learning RDPs from data, by contrast, is a challenging problem that has not yet been addressed, and it is the focus of this paper. Our approach rests on a new representation of RDPs as Mealy machines that emit a distribution and an expected reward for each state-action pair. Building on this representation, we combine automata learning techniques with history clustering to learn such a Mealy machine, and we solve it by adapting Monte Carlo tree search (MCTS) to it. We empirically evaluate this approach and demonstrate its feasibility.
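To make the representation concrete, the following is a minimal sketch of a Mealy machine that emits an observation distribution and an expected reward for each (machine state, action) pair. It is an illustration under stated assumptions, not the paper's implementation: the class `MealyRDP`, its fields `output` and `delta`, the choice to advance the machine on (action, observation) pairs, and the toy "parity" domain are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class MealyRDP:
    """Hypothetical Mealy-machine view of an RDP (names are assumptions)."""
    initial: str
    # (machine_state, action) -> ({observation: probability}, expected_reward)
    output: dict
    # (machine_state, action, observation) -> next machine_state
    delta: dict

    def step(self, q, action, rng):
        """Sample an observation, then advance the machine state."""
        dist, reward = self.output[(q, action)]
        observations = list(dist)
        weights = [dist[o] for o in observations]
        obs = rng.choices(observations, weights=weights)[0]
        return self.delta[(q, action, obs)], obs, reward

    def run(self, actions, seed=0):
        """Execute a fixed action sequence; return final state and total reward."""
        rng = random.Random(seed)
        q, total = self.initial, 0.0
        for action in actions:
            q, _, reward = self.step(q, action, rng)
            total += reward
        return q, total


# Toy domain (an assumption for illustration): "press" pays 1 only when the
# number of earlier presses is odd. Parity is a regular property of the
# history, so two machine states suffice even though the reward is
# non-Markovian in the observations alone.
machine = MealyRDP(
    initial="even",
    output={
        ("even", "press"): ({"click": 1.0}, 0.0),
        ("odd", "press"): ({"click": 1.0}, 1.0),
        ("even", "wait"): ({"idle": 1.0}, 0.0),
        ("odd", "wait"): ({"idle": 1.0}, 0.0),
    },
    delta={
        ("even", "press", "click"): "odd",
        ("odd", "press", "click"): "even",
        ("even", "wait", "idle"): "even",
        ("odd", "wait", "idle"): "odd",
    },
)

final_state, total_reward = machine.run(["press", "press", "wait", "press"])
```

Because the machine state summarizes everything about the history that matters, pairing it with the current observation yields a Markovian process, which is what makes planning methods such as MCTS applicable once the machine has been learned.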

research
05/14/2021

Efficient PAC Reinforcement Learning in Regular Decision Processes

Recently regular decision processes have been proposed as a well-behaved...
research
11/05/2021

Regular Decision Processes for Grid Worlds

Markov decision processes are typically used for sequential decision mak...
research
06/25/2017

Specifying Non-Markovian Rewards in MDPs Using LDL on Finite Traces (Preliminary Version)

In Markov Decision Processes (MDPs), the reward obtained in a state depe...
research
12/12/2012

Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards

A popular approach to solving a decision process with non-Markovian rewa...
research
09/23/2020

LTLf Synthesis on Probabilistic Systems

Many systems are naturally modeled as Markov Decision Processes (MDPs), ...
research
10/28/2017

Interpretable Apprenticeship Learning with Temporal Logic Specifications

Recent work has addressed using formulas in linear temporal logic (LTL) ...
research
09/11/2011

Decision-Theoretic Planning with non-Markovian Rewards

A decision process in which rewards depend on history rather than merely...
