Efficient Reinforcement Learning (ERL): Targeted Exploration Through Action Saturation

11/30/2022
by   Loris Di Natale, et al.
0

Reinforcement Learning (RL) generally suffers from poor sample complexity, mostly due to the need to exhaustively explore the state space to find good policies. On the other hand, we postulate that expert knowledge of the system to control often allows us to design simple rules we expect good policies to follow at all times. In this work, we hence propose a simple yet effective modification of continuous actor-critic RL frameworks to incorporate such prior knowledge in the learned policies and constrain them to regions of the state space that are deemed interesting, thereby significantly accelerating their convergence. Concretely, we saturate the actions chosen by the agent if they do not comply with our intuition and, critically, modify the gradient update step of the policy to ensure the learning process does not suffer from the saturation step. On a room temperature control simulation case study, these modifications allow agents to converge to well-performing policies up to one order of magnitude faster than classical RL agents while retaining good final performance.

READ FULL TEXT

page 7

page 9

research
01/29/2022

Zeroth-Order Actor-Critic

Zeroth-order optimization methods and policy gradient based first-order ...
research
10/09/2020

Deep RL With Information Constrained Policies: Generalization in Continuous Control

Biological agents learn and act intelligently in spite of a highly limit...
research
07/01/2017

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Deep reinforcement learning (RL) methods have significant potential for ...
research
12/03/2019

Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

In many environments, only a relatively small subset of the complete sta...
research
10/21/2021

Is High Variance Unavoidable in RL? A Case Study in Continuous Control

Reinforcement learning (RL) experiments have notoriously high variance, ...
research
12/14/2022

Efficient Exploration in Resource-Restricted Reinforcement Learning

In many real-world applications of reinforcement learning (RL), performi...
research
10/21/2019

Combining Benefits from Trajectory Optimization and Deep Reinforcement Learning

Recent breakthroughs both in reinforcement learning and trajectory optim...

Please sign up or login with your details

Forgot password? Click here to reset