Reinforcement Learning Algorithm Selection

01/30/2017
by Romain Laroche, et al.

This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. ESBAS is first empirically evaluated on a dialogue task, where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting, in which algorithms update their policies after each transition; this variant is called SSBAS. SSBAS is evaluated on a fruit collection task, where it is shown to adapt the stepsize parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves the performance by a wide margin.
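The principle described in the abstract (policies frozen for the duration of an epoch, with a freshly reset stochastic bandit choosing which algorithm controls each episode) can be illustrated with a minimal sketch. Several details below are assumptions made for illustration and are not stated in the abstract: the epoch length doubles at each epoch, the bandit is UCB1, episode returns are scaled to [0, 1], and every algorithm learns from the pooled trajectories of all algorithms (which the off-policy requirement makes possible). The names `UCB1`, `esbas_sketch`, `algorithms`, and `run_episode` are hypothetical.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


class UCB1:
    """Standard UCB1 stochastic bandit; assumes rewards lie in [0, 1]."""

    def __init__(self, n_arms: int) -> None:
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self) -> int:
        # Play every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)

        def upper_bound(arm: int) -> float:
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.sums[arm] += reward


def esbas_sketch(
    algorithms: Sequence[Callable[[List[Any]], Any]],
    run_episode: Callable[[Any], Tuple[Any, float]],
    n_epochs: int,
) -> List[int]:
    """Return the index of the algorithm selected for each episode.

    algorithms: each maps the trajectories collected so far to a policy
        (hypothetical interface).
    run_episode: plays one episode with a policy and returns
        (trajectory, episode_return) (hypothetical interface).
    """
    trajectories: List[Any] = []
    selections: List[int] = []
    for epoch in range(n_epochs):
        # Freeze: policies are computed once and held fixed for the epoch.
        policies = [algorithm(list(trajectories)) for algorithm in algorithms]
        # Reboot: a new bandit, with no memory of earlier epochs.
        bandit = UCB1(len(algorithms))
        # Assumed schedule: epoch lengths 1, 2, 4, 8, ...
        for _ in range(2 ** epoch):
            chosen = bandit.select()
            trajectory, episode_return = run_episode(policies[chosen])
            bandit.update(chosen, episode_return)
            # Pooled data: every algorithm sees this trajectory next epoch.
            trajectories.append(trajectory)
            selections.append(chosen)
    return selections
```

Because the policies do not change within an epoch, each arm of the bandit has a stationary return distribution during that epoch, which is the setting a stochastic bandit is designed for. Resetting the bandit at every epoch boundary discards statistics that refer to outdated policies.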


research
09/30/2019

Meta-Q-Learning

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm ...
research
01/19/2023

A Survey of Meta-Reinforcement Learning

While deep reinforcement learning (RL) has fueled multiple high-profile ...
research
10/30/2021

Context Meta-Reinforcement Learning via Neuromodulation

Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt ...
research
01/01/2020

Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies

We propose and address a novel few-shot RL problem, where a task is char...
research
06/16/2020

Model-based Adversarial Meta-Reinforcement Learning

Meta-reinforcement learning (meta-RL) aims to learn from multiple traini...
research
05/12/2014

Structural Return Maximization for Reinforcement Learning

Batch Reinforcement Learning (RL) algorithms attempt to choose a policy ...
research
12/01/2021

On the Practical Consistency of Meta-Reinforcement Learning Algorithms

Consistency is the theoretical property of a meta learning algorithm tha...
