Learning Task Automata for Reinforcement Learning using Hidden Markov Models

08/25/2022
by Alessandro Abate, et al.

Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state 'task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating it as a partially observable MDP and using off-the-shelf algorithms for hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results to illustrate our algorithm's performance in different environments and tasks and its ability to incorporate prior domain knowledge to facilitate more efficient learning.
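The abstract gives no implementation details, so the following is only a minimal sketch, not the authors' code, of why a product MDP helps. It shows how pairing the environment state with the state of a task DFA turns a history-dependent reward into one that depends only on the current product state. The `DFA` class, the `product_step` function, and the "get the key, then reach the door" task with its labels are all hypothetical illustrations. In the paper's setting the automaton is unknown and must be learnt, whereas here it is simply given.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable, Tuple

State = Hashable
Label = FrozenSet[str]  # set of atomic propositions true in a state


@dataclass
class DFA:
    """Hypothetical task DFA over labels; missing transitions self-loop."""
    initial: int
    accepting: FrozenSet[int]
    delta: Dict[Tuple[int, Label], int]

    def step(self, q: int, label: Label) -> int:
        return self.delta.get((q, label), q)


def product_step(dfa: DFA, labeller: Callable[[State], Label],
                 q: int, next_env_state: State) -> Tuple[int, float]:
    """Advance the automaton on the label of the newly reached env state.

    Returns the next automaton state and a reward of 1.0 on first entry
    into an accepting state (else 0.0). The reward depends only on the
    product state (env state, automaton state), so it is Markovian there.
    """
    q_next = dfa.step(q, labeller(next_env_state))
    entered = q_next in dfa.accepting and q not in dfa.accepting
    return q_next, 1.0 if entered else 0.0


# Hypothetical task: visit the key room, then the door.
labels = {"key_room": frozenset({"key"}), "door": frozenset({"door"})}
labeller = lambda s: labels.get(s, frozenset())
task = DFA(initial=0, accepting=frozenset({2}),
           delta={(0, frozenset({"key"})): 1, (1, frozenset({"door"})): 2})

q, total = task.initial, 0.0
for s in ["hall", "door", "key_room", "hall", "door"]:
    q, r = product_step(task, labeller, q, s)
    total += r
print(q, total)  # 2 1.0
```

Reaching the door before the key leaves the automaton in state 0 and earns nothing, even though it is the same environment state that later yields the reward; the automaton state absorbs exactly this dependence on history. Each DFA edge (key, then door) also corresponds to one sub-task, which is the kind of decomposition the abstract describes.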

research
05/02/2023

Sample Efficient Model-free Reinforcement Learning from LTL Specifications with Optimality Guarantees

Linear Temporal Logic (LTL) is widely used to specify high-level objecti...
research
01/25/2020

Learning Non-Markovian Reward Models in MDPs

There are situations in which an agent should receive rewards only after...
research
03/24/2023

Learning Reward Machines in Cooperative Multi-Agent Tasks

This paper presents a novel approach to Multi-Agent Reinforcement Learni...
research
01/14/2020

Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Büchi Automata

This letter proposes a novel reinforcement learning method for the synth...
research
05/30/2022

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

We generalise the problem of reward modelling (RM) for reinforcement lea...
research
11/23/2021

Inducing Functions through Reinforcement Learning without Task Specification

We report a bio-inspired framework for training a neural network through...
research
11/29/2019

Induction of Subgoal Automata for Reinforcement Learning

In this work we present ISA, a novel approach for learning and exploitin...