Bandit-Based Policy Invariant Explicit Shaping for Incorporating External Advice in Reinforcement Learning

04/14/2023
by Yash Satsangi et al.

A key challenge for a reinforcement learning (RL) agent is to incorporate external/expert advice into its learning. The desired goals of an algorithm that shapes the learning of an RL agent with external advice include (a) maintaining policy invariance, (b) accelerating the learning of the agent, and (c) learning from arbitrary advice [3]. To address this challenge, this paper formulates the problem of incorporating external advice in RL as a multi-armed bandit called shaping-bandits. The reward of each arm of the shaping-bandit corresponds to the return obtained either by following the expert or by following a default RL algorithm that learns on the true environment reward. We show that directly applying existing bandit and shaping algorithms that do not reason about the non-stationary nature of the underlying returns can lead to poor results. We therefore propose three shaping algorithms, UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES (LPIES), each built on different assumptions, that reason about the long-term consequences of following the expert policy or the default RL algorithm. Our experiments in four different settings show that these algorithms achieve the above-mentioned goals, whereas the other algorithms fail to do so.
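The shaping-bandit framing can be made concrete with a small simulation. The sketch below is a minimal illustration under our own assumptions. It uses textbook UCB1, which is one of the stationary baselines the abstract warns against and not UPIES, RPIES, or LPIES (the abstract does not give their details). It also uses a hypothetical fixed expert return and a hypothetical learner whose return grows only on episodes where it acts. The names `UCB1ShapingBandit` and `run_shaping_bandit` are illustrative and do not come from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# Two arms of the (hypothetical) shaping bandit: follow the expert's policy for an
# episode, or follow the default RL learner trained on the true environment reward.
ARMS = ("expert", "rl")


class UCB1ShapingBandit:
    """Textbook UCB1 over the two arms, treating episode returns as arm rewards.

    This is NOT UPIES/RPIES/LPIES: it assumes stationary returns, which is the
    assumption the abstract says leads to poor results.
    """

    def __init__(self, c: float = 2.0) -> None:
        self.c = c  # exploration constant; c = 2 gives the standard UCB1 bonus
        self.t = 0  # total number of episodes played so far
        self.counts: Dict[str, int] = {a: 0 for a in ARMS}
        self.means: Dict[str, float] = {a: 0.0 for a in ARMS}

    def select(self) -> str:
        # Play each arm once before relying on confidence bounds.
        for arm in ARMS:
            if self.counts[arm] == 0:
                return arm

        def upper_bound(arm: str) -> float:
            bonus = math.sqrt(self.c * math.log(self.t) / self.counts[arm])
            return self.means[arm] + bonus

        return max(ARMS, key=upper_bound)

    def update(self, arm: str, episode_return: float) -> None:
        self.t += 1
        self.counts[arm] += 1
        # Incremental sample mean over every return this arm has produced.
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]


def run_shaping_bandit(
    episodes: int,
    expert_return: float,
    learner_return: Callable[[int], float],
) -> Tuple[List[str], UCB1ShapingBandit]:
    """Simulate the bandit with hypothetical, deterministic episode returns.

    expert_return: fixed return of an episode that follows the expert.
    learner_return(k): return of the learner's k-th episode (k = 0, 1, ...);
    here the learner is assumed to improve only on episodes where it acts.
    """
    bandit = UCB1ShapingBandit()
    learner_episodes = 0
    history: List[str] = []
    for _ in range(episodes):
        arm = bandit.select()
        if arm == "expert":
            ret = expert_return
        else:
            ret = learner_return(learner_episodes)
            learner_episodes += 1
        bandit.update(arm, ret)
        history.append(arm)
    return history, bandit


if __name__ == "__main__":
    # Hypothetical setting: the expert is mediocre but consistent, while the
    # learner starts at 0 and improves by 0.1 per episode until it reaches 1.0.
    history, bandit = run_shaping_bandit(50, 0.5, lambda k: min(1.0, 0.1 * k))
    print(bandit.counts, bandit.means)
```

Because `update` averages every return an arm has produced, the `rl` estimate in this setting trails the learner's current return: its early returns near 0 keep pulling the mean down. This is one concrete way a stationary bandit can misread the non-stationary returns the abstract describes.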
