Constrained Exploration in Reinforcement Learning with Optimality Preservation

04/05/2023
by Peter C. Y. Chen, et al.

We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space in search of an optimal policy while adhering to some restriction on its behavior. Such a restriction may prevent the agent from visiting some state-action pairs, possibly leading the agent to find only a sub-optimal policy. To address this problem, we introduce the concept of constrained exploration with optimality preservation, whereby the exploration behavior of the agent is constrained to meet a specification while the optimality of the (original) unconstrained learning process is preserved. We first establish a feedback-control structure that models the dynamics of the unconstrained learning process. We then extend this structure by adding a supervisor to ensure that the behavior of the agent meets the specification, and establish (for a class of reinforcement-learning problems with a known deterministic environment) a necessary and sufficient condition under which optimality is preserved. This work demonstrates the utility and the promise of studying reinforcement-learning problems in the context of the theories of discrete-event systems, automata and formal languages.
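The idea of a supervisor that disables some state-action pairs can be illustrated with a toy example. The following is a minimal sketch, not the paper's construction: the four-state deterministic graph, the rewards, the discount factor, and the names `TRANSITIONS`, `learn_q`, `start_value`, and `preserves_optimality` are all assumptions made for illustration. The "supervisor" is reduced to a set of forbidden state-action pairs, and optimality preservation is approximated by comparing the best value at the start state with and without the restriction, which is a simplification of the condition the abstract refers to.

```python
from typing import Dict, Set, Tuple

State = int
Action = str
Pair = Tuple[State, Action]

# Hypothetical known deterministic environment:
# (state, action) -> (next_state, reward). State 3 is terminal.
TRANSITIONS: Dict[Pair, Tuple[State, float]] = {
    (0, "a"): (1, -1.0),
    (0, "b"): (2, -1.0),
    (1, "a"): (3, 10.0),
    (1, "b"): (2, -1.0),
    (2, "a"): (3, 5.0),
}
GAMMA = 0.9  # assumed discount factor


def learn_q(forbidden: Set[Pair], sweeps: int = 50) -> Dict[Pair, float]:
    """Tabular Q-learning with learning rate 1, sweeping every permitted pair.

    Pairs in `forbidden` are never explored, mimicking a supervisor that
    disables them. In a deterministic environment this update converges
    to the optimal Q-values of the restricted problem.
    """
    q = {sa: 0.0 for sa in TRANSITIONS if sa not in forbidden}
    for _ in range(sweeps):
        for (s, a) in q:
            nxt, reward = TRANSITIONS[(s, a)]
            # Best permitted continuation from the successor (0 if none).
            future = max(
                (v for (s2, _), v in q.items() if s2 == nxt), default=0.0
            )
            q[(s, a)] = reward + GAMMA * future
    return q


def start_value(q: Dict[Pair, float], start: State = 0) -> float:
    """Value of the greedy policy at the start state."""
    return max(v for (s, _), v in q.items() if s == start)


def preserves_optimality(forbidden: Set[Pair], tol: float = 1e-9) -> bool:
    """Simplified check: is the start-state value unchanged by the restriction?"""
    unconstrained = start_value(learn_q(set()))
    constrained = start_value(learn_q(forbidden))
    return abs(unconstrained - constrained) < tol
```

In this toy graph, forbidding `(1, "b")` leaves the best path from state 0 intact, so `preserves_optimality({(1, "b")})` holds, whereas forbidding `(0, "a")` cuts off the highest-reward route and the agent can only learn a sub-optimal policy.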

