Reinforcement Learning with Trajectory Feedback

08/13/2020
by Yonathan Efroni, et al.

The computational model of reinforcement learning is based upon the ability to query a score of every visited state-action pair, i.e., to observe a per-state-action reward signal. In practice, however, such a score is often not readily available to the algorithm designer. In this work, we relax this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward of every visited state-action pair, we assume that we only receive a score that represents the quality of the whole trajectory observed by the agent. We study natural extensions of reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and analyze the performance of these algorithms through their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a computationally efficient algorithm.
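
The least-squares idea mentioned above can be illustrated with a small sketch. It rests on assumptions that are not stated in the abstract: a tabular setting, and a trajectory score equal to the sum of the per-step rewards plus zero-mean noise. Under that assumption the score is linear in the trajectory's state-action visitation counts, so the unknown reward vector can be recovered by ridge (regularized least-squares) regression. The function names `visitation_vector`, `solve_linear`, and `estimate_rewards`, the parameter `reg`, and the demo rewards are all hypothetical; this is not the authors' implementation.

```python
import random


def visitation_vector(trajectory, num_states, num_actions):
    """Count visits to each (state, action) pair; index = state * num_actions + action."""
    counts = [0.0] * (num_states * num_actions)
    for state, action in trajectory:
        counts[state * num_actions + action] += 1.0
    return counts


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def estimate_rewards(trajectories, scores, num_states, num_actions, reg=1.0):
    """Ridge-regression estimate of per-pair rewards from whole-trajectory scores.

    Solves (reg * I + sum_k q_k q_k^T) r = sum_k q_k * score_k, where q_k is the
    visitation-count vector of trajectory k (hypothetical setup, see lead-in).
    """
    dim = num_states * num_actions
    gram = [[reg if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    target = [0.0] * dim
    for trajectory, score in zip(trajectories, scores):
        q = visitation_vector(trajectory, num_states, num_actions)
        for i in range(dim):
            target[i] += q[i] * score
            for j in range(dim):
                gram[i][j] += q[i] * q[j]
    return solve_linear(gram, target)


if __name__ == "__main__":
    random.seed(0)
    num_states, num_actions, horizon = 2, 2, 5
    true_reward = [0.1, 0.5, 0.9, 0.3]  # hypothetical ground truth
    trajectories, scores = [], []
    for _ in range(2000):
        # Uniformly random state-action pairs stand in for an agent's rollouts.
        traj = [(random.randrange(num_states), random.randrange(num_actions))
                for _ in range(horizon)]
        # Only the noisy total is observed, never the individual rewards.
        total = sum(true_reward[s * num_actions + a] for s, a in traj)
        trajectories.append(traj)
        scores.append(total + random.gauss(0.0, 0.1))
    estimate = estimate_rewards(trajectories, scores, num_states, num_actions)
    print([round(r, 2) for r in estimate])
```

With enough varied trajectories, the printed estimates should lie close to the hypothetical `true_reward` values. The regularizer keeps the linear system solvable even before every pair has been visited; individual rewards become identifiable only because visitation counts vary across trajectories.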
