Quantile Reinforcement Learning

11/03/2016
by   Hugo Gilbert, et al.

In reinforcement learning, the standard criterion for evaluating a policy in a state is the expected (discounted) sum of rewards. However, this criterion may not always be suitable; we consider an alternative criterion based on the notion of quantiles. For episodic reinforcement learning problems, we propose an algorithm based on two-timescale stochastic approximation. We evaluate our proposal on a simple model of the TV show Who Wants to Be a Millionaire.
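The quantile criterion mentioned above can be estimated online with a classic stochastic-approximation update: nudge the estimate up by tau when the observed return exceeds it, and down by (1 - tau) otherwise. The sketch below illustrates only this quantile-tracking step on a fixed return distribution; it is not the paper's full two-timescale algorithm, and all names and parameters are illustrative assumptions.

```python
import random

def quantile_sa(sample_return, tau=0.5, steps=20000, lr=0.01, seed=0):
    """Track the tau-quantile of a return distribution with the
    stochastic-approximation update  q <- q + lr * (tau - 1{G < q}),
    where G is a sampled episodic return.
    (Illustrative sketch; not the paper's two-timescale algorithm.)
    """
    rng = random.Random(seed)
    q = 0.0
    for _ in range(steps):
        g = sample_return(rng)
        # Indicator update: the fixed point is the tau-quantile of G.
        q += lr * (tau - (1.0 if g < q else 0.0))
    return q

# Toy episodic return, uniform on [0, 10]; the true median is 5.
est = quantile_sa(lambda rng: rng.uniform(0.0, 10.0), tau=0.5)
```

In a two-timescale scheme, an update of this kind would run on the fast timescale while the policy parameters change slowly enough to see an approximately stationary quantile estimate.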


research
01/27/2022

Quantile-Based Policy Optimization for Reinforcement Learning

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
02/23/2022

Learning Relative Return Policies With Upside-Down Reinforcement Learning

Lately, there has been a resurgence of interest in using supervised lear...
research
09/06/2019

Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning

Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sa...
research
11/30/2020

Soft-Robust Algorithms for Handling Model Misspecification

In reinforcement learning, robust policies for high-stakes decision-maki...
research
02/11/2019

Stochastic Reinforcement Learning

In reinforcement learning episodes, the rewards and punishments are ofte...
research
05/06/2019

Deep Ordinal Reinforcement Learning

Reinforcement learning usually makes use of numerical rewards, which hav...
research
10/15/2020

Blending Search and Discovery: Tag-Based Query Refinement with Contextual Reinforcement Learning

We tackle tag-based query refinement as a mobile-friendly alternative to...
