Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics

04/20/2022
by Christos Verginis, et al.

We study the problem of reinforcement learning for a task encoded by a reward machine. The task is defined over a set of properties of the environment, called atomic propositions, which are represented by Boolean variables. A common but unrealistic assumption in the literature is that the truth values of these propositions are accurately known. In practice, however, these truth values are uncertain because they are obtained from imperfect sensors. At the same time, reward machines can be difficult to model explicitly, especially when they encode complicated tasks. We develop a reinforcement-learning algorithm that infers a reward machine encoding the underlying task while learning how to execute it, despite the uncertainty in the propositions' truth values. To address this uncertainty, the algorithm maintains a probabilistic estimate of the atomic propositions' truth values and updates it with new sensory measurements gathered while exploring the environment. Additionally, the algorithm maintains a hypothesis reward machine, which acts as an estimate of the reward machine that encodes the task to be learned. As the agent explores the environment, the algorithm updates the hypothesis reward machine according to the obtained rewards and the estimate of the atomic propositions' truth values. Finally, the algorithm uses a Q-learning procedure over the states of the hypothesis reward machine to learn a policy that accomplishes the task. We prove that the algorithm successfully infers the reward machine and asymptotically learns a policy that accomplishes the respective task.
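To make the moving parts concrete, below is a minimal Python sketch of the kind of loop the abstract describes: a per-state Bayesian belief over an uncertain atomic proposition, a hypothesis reward machine whose transitions fire on confident labels, and Q-learning over pairs of environment and reward-machine states. Everything here is an illustrative assumption (a toy 1-D grid, a single proposition, a fixed two-state hypothesis machine rather than an inferred one); it is not the authors' implementation.

```python
import random
from collections import defaultdict

SENSOR_ACC = 0.9                 # assumed P(sensor reads True | proposition is True)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
N_CELLS, TRUE_GOAL = 7, 5        # toy 1-D grid; "goal" proposition holds at one cell
ACTIONS = (-1, +1)               # move left / right

belief = [0.5] * N_CELLS         # P("goal" holds in cell i), uninformative prior
Q = defaultdict(float)           # Q[(cell, rm_state, action)]

def bayes_update(p, reading):
    """Posterior P(proposition is True) after one noisy Boolean sensor reading."""
    like_true = SENSOR_ACC if reading else 1 - SENSOR_ACC
    like_false = 1 - SENSOR_ACC if reading else SENSOR_ACC
    return like_true * p / (like_true * p + like_false * (1 - p))

def rm_step(u, p_goal, threshold=0.9):
    """Two-state hypothesis reward machine: accept once the label is believed."""
    if u == 0 and p_goal > threshold:
        return 1, 1.0            # accepting state reached, reward 1
    return u, 0.0

def greedy(s, u):
    return max(ACTIONS, key=lambda a: Q[(s, u, a)])

for episode in range(3000):
    s, u = 0, 0
    for _ in range(40):
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s, u)
        s2 = min(max(s + a, 0), N_CELLS - 1)

        # Noisy sensor reading of the proposition at the new cell, then belief update.
        truth = (s2 == TRUE_GOAL)
        reading = truth if random.random() < SENSOR_ACC else not truth
        belief[s2] = bayes_update(belief[s2], reading)

        # Advance the hypothesis reward machine and do a Q-learning update
        # on the product of environment and reward-machine states.
        u2, r = rm_step(u, belief[s2])
        td_target = r + GAMMA * max(Q[(s2, u2, b)] for b in ACTIONS)
        Q[(s, u, a)] += ALPHA * (td_target - Q[(s, u, a)])
        s, u = s2, u2
        if u == 1:               # accepting reward-machine state: episode done
            break
```

In the paper's setting the hypothesis reward machine itself is revised from observed rewards and label estimates as exploration proceeds; the sketch omits that inference step and keeps only the belief update and the product-state Q-learning.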



