Context-aware Active Multi-Step Reinforcement Learning

11/11/2019
by Gang Chen, et al.

Reinforcement learning has attracted great attention recently, especially policy gradient algorithms, which have been demonstrated on challenging decision-making and control tasks. In this paper, we propose an active multi-step TD algorithm with adaptive step sizes to learn the actor and critic. Specifically, our model consists of two components: active step-size learning and an adaptive multi-step TD algorithm. First, we divide the time horizon into chunks and actively select states and actions inside each chunk. Then, given the selected samples, we propose the adaptive multi-step TD, which generalizes TD(λ) but adaptively switches on or off the backups from future returns of different steps. In particular, the adaptive multi-step TD introduces a context-aware mechanism, here a binary classifier, which decides whether or not to turn on its future backups based on context changes. Thus, our model is a combination of active learning and a multi-step TD algorithm, and it has the capacity to learn off-policy without the need for importance sampling. We evaluate our approach on both discrete and continuous space tasks in an off-policy setting, and demonstrate competitive results compared to other reinforcement learning baselines.
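To make the idea of switching future backups on or off concrete, here is a minimal sketch, not the authors' implementation. It assumes the gated target follows the standard recursive λ-return, G_t = r_t + γ[(1 − λ)V(s_{t+1}) + λ G_{t+1}], with the constant λ replaced by a per-step binary gate. The function name `gated_returns`, its arguments, and the way the gates are supplied are all hypothetical; in the paper the gate would come from the context-aware binary classifier, which is not modeled here.

```python
from typing import List, Sequence


def gated_returns(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gates: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Compute multi-step TD targets with per-step on/off gates (illustrative).

    rewards[t]      reward received at step t of the chunk
    next_values[t]  critic estimate V(s_{t+1})
    gates[t]        1 to back up from the future return after step t,
                    0 to bootstrap from V(s_{t+1}) instead (assumed to be
                    produced elsewhere, e.g. by a binary classifier)
    """
    n = len(rewards)
    if len(next_values) != n or len(gates) != n:
        raise ValueError("rewards, next_values and gates must have equal length")

    returns = [0.0] * n
    # No return exists beyond the chunk, so the last step always bootstraps.
    future = next_values[-1] if n else 0.0
    for t in reversed(range(n)):
        # Gate = 1 uses the longer backup; gate = 0 cuts it at V(s_{t+1}).
        target = (1.0 - gates[t]) * next_values[t] + gates[t] * future
        returns[t] = rewards[t] + gamma * target
        future = returns[t]
    return returns
```

Under these assumptions, setting every gate to 0 recovers the one-step TD target, setting every gate to 1 gives the full multi-step return bootstrapped at the end of the chunk, and a constant fractional gate reduces to the usual λ-return.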

research
10/30/2018

Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning

Off-policy learning is more unstable compared to on-policy learning in r...
research
05/29/2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

Multi-step learning applies lookahead over multiple time steps and has p...
research
05/23/2023

L-SA: Learning Under-Explored Targets in Multi-Target Reinforcement Learning

Tasks that involve interaction with various targets are called multi-tar...
research
08/01/2022

Off-Policy Correction for Actor-Critic Algorithms in Deep Reinforcement Learning

Compared to on-policy policy gradient techniques, off-policy model-free ...
research
11/07/2010

Reinforcement Learning Based on Active Learning Method

In this paper, a new reinforcement learning approach is proposed which i...
research
04/13/2021

Bi-level Off-policy Reinforcement Learning for Volt/VAR Control Involving Continuous and Discrete Devices

In Volt/Var control (VVC) of active distribution networks(ADNs), both sl...
research
04/28/2023

Active Reinforcement Learning for Personalized Stress Monitoring in Everyday Settings

Most existing sensor-based monitoring frameworks presume that a large av...
