Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

11/20/2022
by Andrew C. Li, et al.

Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping from environment state to a symbolic (here, Reward Machine) vocabulary, commonly known as the labelling function, is uncertain from the perspective of the agent. We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions as a special class of POMDP optimization problems, and investigate several methods to address it, building on both existing and new techniques. The new techniques focus on predicting the Reward Machine state rather than on grounding individual symbols. We analyze these methods and evaluate them experimentally under varying degrees of uncertainty in the correct interpretation of the symbolic vocabulary. We verify the strength of our approach and the limitations of existing methods via an empirical investigation on both illustrative toy domains and partially observable deep RL domains.
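
To make the setting concrete, the following is a minimal sketch of a Reward Machine whose state is tracked as a belief rather than a single value, because the labelling function is uncertain. It is an illustration only and not the authors' method: the two-step task, the state names (`u0`, `u1`, `u_done`), the symbols (`a`, `b`, `none`), and the per-step label probabilities are all hypothetical, and the update assumes that label noise is independent across time steps, which may not hold in practice.

```python
from collections import defaultdict

# Hypothetical two-step task: "observe a, then observe b".
# Any (state, symbol) pair not listed is a self-loop with reward 0.
TRANSITIONS = {
    ("u0", "a"): ("u1", 0.0),
    ("u1", "b"): ("u_done", 1.0),
}
INITIAL_STATE = "u0"


def rm_step(state, symbol):
    """Deterministic Reward Machine step: returns (next_state, reward)."""
    return TRANSITIONS.get((state, symbol), (state, 0.0))


def belief_update(belief, symbol_probs):
    """Propagate a belief over RM states through one uncertain label.

    belief: dict mapping RM state -> probability
    symbol_probs: dict mapping symbol -> probability that it is the true
        label at this step (assumed independent of earlier steps)
    """
    new_belief = defaultdict(float)
    for state, p_state in belief.items():
        for symbol, p_symbol in symbol_probs.items():
            next_state, _ = rm_step(state, symbol)
            new_belief[next_state] += p_state * p_symbol
    return dict(new_belief)


belief = {INITIAL_STATE: 1.0}
# Assumed outputs of a noisy symbol detector at two consecutive steps.
belief = belief_update(belief, {"a": 0.8, "none": 0.2})
belief = belief_update(belief, {"b": 0.5, "none": 0.5})
print(belief)
```

With these assumed probabilities, the agent ends up unsure whether the task is finished: the belief is split across all three Reward Machine states instead of collapsing onto one. A policy conditioned on such a belief, rather than on a single guessed state, is one way to read the abstract's emphasis on predicting the Reward Machine state.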

research
12/17/2021

Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

Reinforcement learning (RL) is a central problem in artificial intellige...
research
04/20/2022

A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

A misspecified reward can degrade sample efficiency and induce undesired...
research
01/08/2023

Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior

Many real-world reinforcement learning (RL) problems necessitate learnin...
research
08/14/2023

Omega-Regular Reward Machines

Reinforcement learning (RL) is a powerful approach for training agents t...
research
12/10/2020

Understanding Learned Reward Functions

In many real-world tasks, it is not possible to procedurally specify an ...
research
12/14/2021

How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy

Alchemy is a new meta-learning environment rich enough to contain intere...
research
05/25/2018

Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

We propose a learning approach for mapping context-dependent sequential ...
