On the Theory of Reinforcement Learning with Once-per-Episode Feedback

05/29/2021
by   Niladri S. Chatterji, et al.

We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once, at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of real-world applications than the traditional requirement in RL practice that the learner receive feedback at every time step. Indeed, in many real-world applications of RL, such as self-driving cars and robotics, it is easier to evaluate whether a learner's complete trajectory was "good" or "bad" than to provide a reward signal at each step. To show that learning is possible in this more challenging setting, we study the case where trajectory labels are generated by an unknown parametric model, and provide a statistically and computationally efficient algorithm that achieves sub-linear regret.
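To make the feedback setting concrete, the following is a minimal sketch of how a single binary label per episode might be generated from a parametric model. The abstract does not say which parametric family is used, so the logistic form here is an assumption, as are the names `trajectory_features`, `label_probability`, `sample_label`, the count-based feature map, and the parameter vector `w_star`. It illustrates only the label-generation side, not the learning algorithm.

```python
import math
import random


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def trajectory_features(trajectory, dim):
    """Hypothetical feature map: count how often each (state, action)
    index appears over the whole episode."""
    phi = [0.0] * dim
    for index in trajectory:
        phi[index] += 1.0
    return phi


def label_probability(w, phi):
    """Assumed logistic model: P(label = 1 | trajectory) = sigmoid(<w, phi>)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, phi)))


def sample_label(w, phi, rng):
    """Draw the single binary label revealed at the end of the episode."""
    return 1 if rng.random() < label_probability(w, phi) else 0


rng = random.Random(0)
w_star = [1.5, -2.0, 0.5]  # unknown to the learner; fixed here for illustration

# Each trajectory is a list of (state, action) indices visited in one episode.
good = trajectory_features([0, 0, 2], dim=3)
bad = trajectory_features([1, 1, 2], dim=3)

p_good = label_probability(w_star, good)
p_bad = label_probability(w_star, bad)
label = sample_label(w_star, good, rng)  # the only feedback for this episode
```

Under these assumptions the learner observes only `label` for each episode and never a per-step reward, so any estimate of `w_star` has to come from whole-trajectory features paired with one bit of feedback.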

research
08/13/2020

Reinforcement Learning with Trajectory Feedback

The computational model of reinforcement learning is based upon the abil...
research
09/07/2021

Learning to Bid in Contextual First Price Auctions

In this paper, we investigate the problem about how to bid in repeated c...
research
05/23/2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation

We study human-in-the-loop reinforcement learning (RL) with trajectory p...
research
02/20/2015

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner...
research
04/04/2021

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

Many real-world applications such as robotics provide hard constraints o...
research
11/17/2020

Explaining Conditions for Reinforcement Learning Behaviors from Real and Imagined Data

The deployment of reinforcement learning (RL) in the real world comes wi...
research
05/23/2018

Discovering Blind Spots in Reinforcement Learning

Agents trained in simulation may make errors in the real world due to mi...