The Gambler's Problem and Beyond

12/31/2019
by Baoxiang Wang, et al.

We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler repeatedly bets and either doubles or loses the stake until the target capital is reached. It is an early example in the reinforcement learning textbook by Sutton and Barto (2018), where the authors note an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but do not investigate it further. We provide the exact formula for the optimal value function in both the discrete and the continuous cases. Simple as the problem might seem, the value function is pathological: it is fractal and self-similar, its derivative is either zero or infinite wherever it exists, it is not smooth on any interval, and it cannot be written in terms of elementary functions. It is in fact one of the generalized Cantor functions, whose complexity has so far been uncharted. Our analysis could yield insights into improving value function approximation, gradient-based algorithms, and Q-learning in real applications and implementations.
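The abstract refers to the textbook treatment of the problem, where the optimal value function is computed by value iteration. Below is a minimal sketch of that computation, assuming the standard setup from Sutton and Barto (target capital 100, heads probability 0.4, reward 1 only upon reaching the target); the parameter values and variable names are illustrative and not taken from the paper. Plotting the resulting values reproduces the non-smooth, self-similar pattern the paper analyzes.

```python
import numpy as np

# Value iteration for the Gambler's problem (Sutton & Barto, Example 4.3).
# Illustrative parameters; the paper's exact setup may differ.
GOAL = 100       # target capital
P_HEADS = 0.4    # probability the coin lands heads (the gambler wins the stake)
THETA = 1e-12    # convergence threshold

# V[s] = probability of eventually reaching GOAL from capital s under the optimal policy.
V = np.zeros(GOAL + 1)
V[GOAL] = 1.0    # reaching the target yields reward 1

while True:
    delta = 0.0
    for s in range(1, GOAL):
        # A stake cannot exceed the current capital or the amount still needed.
        stakes = range(1, min(s, GOAL - s) + 1)
        returns = [P_HEADS * V[s + a] + (1 - P_HEADS) * V[s - a] for a in stakes]
        best = max(returns)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break

# V now exhibits the repeating non-smooth points discussed in the paper,
# with visible breakpoints at capitals such as 25, 50, and 75.
print(V[25], V[50], V[75])
```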
