ISL: Optimal Policy Learning With Optimal Exploration-Exploitation Trade-Off

09/13/2019
by Lucas Cassano, et al.

Traditionally, off-policy learning algorithms (such as Q-learning) and exploration schemes have been derived separately, and the exploration-exploitation dilemma is often addressed through heuristics. In this article we show that both the learning equations and the exploration-exploitation strategy can be derived in tandem as the solution to a unique and well-posed optimization problem whose minimization leads to the optimal value function. We present a new algorithm following this idea. The algorithm is of the gradient type (and therefore has good convergence properties even when used in conjunction with function approximators such as neural networks); it is off-policy; and it specifies both the update equations and the strategy to address the exploration-exploitation dilemma. To the best of our knowledge, this is the first algorithm that has these properties.
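To make the "derived separately" point concrete, the following is a minimal sketch of the traditional setup the abstract contrasts against: a tabular Q-learning update paired with an epsilon-greedy exploration heuristic. It is not the ISL algorithm, which the abstract does not specify. The function names (`epsilon_greedy`, `q_learning_update`, `run_chain`), the toy chain environment, and all hyperparameter values are hypothetical choices made for illustration only.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Heuristic exploration: act randomly with probability epsilon.

    Otherwise pick a greedy action, breaking ties at random.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy Q-learning update; it does not depend on how `a` was chosen."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def run_chain(episodes=200, n_states=4, epsilon=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on episode length
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            q_learning_update(q, s, a, r, s_next, alpha, gamma)
            s = s_next
            if s == n_states - 1:
                break
    return q


if __name__ == "__main__":
    for state, row in enumerate(run_chain()):
        print(state, row)
```

Note that `epsilon_greedy` and `q_learning_update` share no derivation: the exploration rule can be swapped for any other heuristic without touching the update. That separation is what the abstract proposes to replace with a single optimization problem.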

research
09/12/2022

Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning

We propose Deterministic Sequencing of Exploration and Exploitation (DSE...
research
03/07/2019

Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces

We present a new algorithm ASEBO for conducting optimization of high-dim...
research
04/06/2019

Randomised Bayesian Least-Squares Policy Iteration

We introduce Bayesian least-squares policy iteration (BLSPI), an off-pol...
research
02/02/2016

Better safe than sorry: Risky function exploitation through safe optimization

Exploration-exploitation of functions, that is learning and optimizing a...
research
07/21/2011

Centric selection: a way to tune the exploration/exploitation trade-off

In this paper, we study the exploration / exploitation trade-off in cell...
research
05/31/2023

Representation-Driven Reinforcement Learning

We present a representation-driven framework for reinforcement learning....
research
09/06/2016

Q-Learning with Basic Emotions

Q-learning is a simple and powerful tool in solving dynamic problems whe...
