Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior

Many real-world reinforcement learning (RL) problems require learning complex, temporally extended behavior that may receive a reward signal only when the behavior is completed. If the reward-worthy behavior is known, it can be specified as a non-Markovian reward function: a function that depends on aspects of the state-action history rather than on the current state and action alone. Such reward functions yield sparse rewards, so an inordinate number of experiences is needed to find a policy that captures the reward-worthy pattern of behavior. Recent work has leveraged Knowledge Representation (KR) to provide a symbolic abstraction of aspects of the state that summarize reward-relevant properties of the state-action history and support learning a Markovian decomposition of the problem in terms of an automaton over the KR. Providing such a decomposition has been shown to vastly improve learning rates, especially when coupled with algorithms that exploit automaton structure. Nevertheless, such techniques rely on a priori knowledge of the KR. In this work, we explore how to automatically discover useful state abstractions that support learning automata over the state-action history. The result is an end-to-end algorithm that can learn optimal policies with significantly fewer environment samples than state-of-the-art RL on simple non-Markovian domains.
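To make the idea of an automaton over a symbolic abstraction concrete, the following is a minimal sketch, not the algorithm proposed in this work. It assumes a hypothetical grid task ("visit the coffee cell, then the office cell") and a hand-written `label` function; the class name `RewardAutomaton`, the symbols, and the cell coordinates are all illustrative assumptions. In the setting described above, the abstraction would be discovered automatically rather than written by hand.

```python
class RewardAutomaton:
    """A tiny finite automaton that tracks reward-relevant history.

    transitions maps (automaton_state, symbol) -> (next_state, reward).
    Pairs that are not listed self-loop with zero reward.
    """

    def __init__(self, transitions, initial="u0"):
        self.transitions = transitions
        self.initial = initial
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, symbol):
        next_state, reward = self.transitions.get(
            (self.state, symbol), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def label(observation):
    """Hypothetical hand-written abstraction: map a grid cell to a symbol."""
    return {(1, 1): "coffee", (3, 3): "office"}.get(observation, "none")


# Reward is given only for reaching the office AFTER visiting the coffee cell,
# so the reward depends on history, not on the current cell alone.
automaton = RewardAutomaton({
    ("u0", "coffee"): ("u1", 0.0),
    ("u1", "office"): ("u_done", 1.0),
})


def run(trajectory):
    """Return the per-step rewards for a sequence of grid cells."""
    automaton.reset()
    return [automaton.step(label(obs)) for obs in trajectory]


rewards_ok = run([(0, 0), (1, 1), (2, 2), (3, 3)])   # coffee, then office
rewards_bad = run([(0, 0), (3, 3), (2, 2)])          # office without coffee
```

The reward is non-Markovian with respect to the grid cell but Markovian with respect to the pair (grid cell, automaton state), which is the decomposition a learner can exploit.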


research
10/06/2020

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Reinforcement learning (RL) methods usually treat reward functions as bl...
research
11/06/2019

Distributional Reward Decomposition for Reinforcement Learning

Many reinforcement learning (RL) tasks have specific properties that can...
research
11/12/2022

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

Preference-based reinforcement learning (RL) algorithms help avoid the p...
research
08/26/2022

Visual processing in context of reinforcement learning

Although deep reinforcement learning (RL) has recently enjoyed many succ...
research
11/20/2022

Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

Natural and formal languages provide an effective mechanism for humans t...
research
01/29/2021

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Designing reward functions for reinforcement learning is difficult: besi...
research
02/28/2023

Policy Dispersion in Non-Markovian Environment

Markov Decision Process (MDP) presents a mathematical framework to formu...
