Learning Reward Machines through Preference Queries over Sequences

08/18/2023
by Eric Hsiung, et al.

Reward machines have shown great promise at capturing non-Markovian reward functions for learning tasks that involve complex action sequencing. However, no algorithm currently exists for learning reward machines from realistic weak feedback in the form of preferences. We contribute REMAP, a novel algorithm for learning reward machines from preferences, with correctness and termination guarantees. REMAP introduces preference queries in place of membership queries in the L* algorithm, and leverages a symbolic observation table along with unification and constraint solving to narrow the hypothesis reward machine search space. In addition to the proofs of correctness and termination for REMAP, we present empirical evidence measuring correctness: how frequently the resulting reward machine is isomorphic to the ground truth under a consistent yet inexact teacher, and the regret between the ground truth and learned reward machines.
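To make the notion of a preference query over sequences concrete, the following is a minimal illustrative sketch, not REMAP itself. It assumes a reward machine represented as a transition table that emits a reward on each input symbol, and a teacher that compares two sequences by their cumulative reward. The names `RewardMachine`, `total_reward`, and `preference_query`, the example task, and the choice of cumulative reward as the comparison criterion are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class RewardMachine:
    """Hypothetical reward machine: (state, symbol) -> (next_state, reward)."""
    initial: str
    delta: Dict[Tuple[str, str], Tuple[str, float]]

    def total_reward(self, seq: Sequence[str]) -> float:
        # Run the machine over the sequence and sum the emitted rewards.
        state, total = self.initial, 0.0
        for symbol in seq:
            state, reward = self.delta[(state, symbol)]
            total += reward
        return total


def preference_query(rm: RewardMachine, s1: Sequence[str], s2: Sequence[str]) -> int:
    """Assumed teacher: returns 1 if s1 is preferred, -1 if s2 is, 0 if tied.

    The teacher reveals only the ordering, never the numeric rewards.
    """
    r1, r2 = rm.total_reward(s1), rm.total_reward(s2)
    return (r1 > r2) - (r1 < r2)


# Illustrative non-Markovian task: 'b' is rewarded only if 'a' was seen first.
rm = RewardMachine(
    initial="u0",
    delta={
        ("u0", "a"): ("u1", 0.0),
        ("u0", "b"): ("u0", 0.0),
        ("u1", "a"): ("u1", 0.0),
        ("u1", "b"): ("u2", 1.0),
        ("u2", "a"): ("u2", 0.0),
        ("u2", "b"): ("u2", 0.0),
    },
)

print(preference_query(rm, ["a", "b"], ["b", "a"]))
```

In this sketch the learner never observes a reward value, only which of two sequences the teacher prefers. That is the kind of weak feedback that a preference query provides and a membership query does not.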
