When should agents explore?

08/26/2021
by Miruna Pislar, et al.

Exploration remains a central challenge for reinforcement learning (RL). Virtually all existing methods share the feature of a monolithic behaviour policy that changes only gradually (at best). In contrast, the exploratory behaviours of animals and humans exhibit a rich diversity, including forms of switching between modes. This paper presents an initial study of mode-switching, non-monolithic exploration for RL. We investigate different modes to switch between, at what timescales it makes sense to switch, and what signals make for good switching triggers. We also propose practical algorithmic components that make the switching mechanism adaptive and robust, which enables flexibility without an accompanying hyper-parameter-tuning burden. Finally, we report a promising and detailed analysis on Atari, using two-mode exploration and switching at sub-episodic time-scales.
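
To make the idea of two-mode, sub-episodic switching concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class name `TwoModeSwitcher`, the parameters `start_prob` and `explore_steps`, the choice of uniform-random actions as the explore mode, and the fixed-probability trigger are all hypothetical choices made for this example.

```python
import random


class TwoModeSwitcher:
    """Hypothetical controller that switches between 'exploit' and 'explore' within an episode."""

    def __init__(self, start_prob, explore_steps, seed=None):
        self.start_prob = start_prob        # assumed: chance per step of starting an explore period
        self.explore_steps = explore_steps  # assumed: fixed length of each explore period
        self.remaining = 0                  # explore steps left in the current period
        self.rng = random.Random(seed)

    def act(self, greedy_action, num_actions):
        """Return (action, mode) for one environment step."""
        # Trigger: while exploiting, start an explore period with fixed probability.
        if self.remaining == 0 and self.rng.random() < self.start_prob:
            self.remaining = self.explore_steps
        if self.remaining > 0:
            # Explore mode: ignore the greedy action and act uniformly at random.
            self.remaining -= 1
            return self.rng.randrange(num_actions), "explore"
        # Exploit mode: follow the greedy action supplied by the agent.
        return greedy_action, "exploit"


# Usage: count how many of 1000 steps end up in explore mode.
switcher = TwoModeSwitcher(start_prob=0.05, explore_steps=10, seed=0)
modes = [switcher.act(greedy_action=0, num_actions=4)[1] for _ in range(1000)]
explore_fraction = modes.count("explore") / len(modes)
```

The trigger here ignores the agent's state entirely; a signal-driven trigger would replace the fixed-probability test with a condition on whatever signal the agent tracks.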


