Revisiting Exploration-Conscious Reinforcement Learning

12/13/2018
by   Lior Shani, et al.

The objective of Reinforcement Learning is to learn an optimal policy by performing actions and observing their long-term consequences. Unfortunately, acquiring such a policy can be a hard task. Worse, since one cannot tell whether a policy is optimal, there is a constant need for exploration. This is known as the Exploration-Exploitation trade-off. In practice, this trade-off is resolved by using some inherent exploration mechanism, such as ϵ-greedy exploration, while still trying to learn the optimal policy. In this work, we take a different approach. We define a surrogate optimality objective: an optimal policy with respect to the exploration scheme. As we show throughout the paper, although solving this criterion does not necessarily lead to an optimal policy, the problem becomes easier to solve. We then analyze this notion of optimality, devise algorithms derived from this approach that reveal connections to existing work, and test them empirically on tabular and deep Reinforcement Learning domains.
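
To make the surrogate objective concrete, here is a minimal tabular sketch of one way to read "optimal with respect to the exploration scheme". It assumes ϵ-greedy exploration with a uniform random exploration action, and it backs up the value of the mixed behaviour (greedy with probability 1 - ϵ, uniform with probability ϵ) instead of the plain maximum. The function names, the toy two-state MDP, and the fixed iteration count are illustrative assumptions and are not taken from the paper.

```python
def exploration_conscious_q(P, R, gamma, eps, iters=2000):
    """Value iteration where the backup accounts for eps-greedy exploration.

    P[s][a] is a list of (probability, next_state) pairs; R[s][a] is a reward.
    Assumption: the exploration action is drawn uniformly at random.
    """
    n_states = len(R)
    Q = [[0.0 for _ in R[s]] for s in range(n_states)]
    for _ in range(iters):
        # State value under the mixture: greedy w.p. (1 - eps), uniform w.p. eps.
        V = [(1 - eps) * max(q) + eps * sum(q) / len(q) for q in Q]
        Q = [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
              for a in range(len(R[s]))]
             for s in range(n_states)]
    return Q


def greedy(Q):
    """Index of the highest-valued action in each state."""
    return [max(range(len(q)), key=q.__getitem__) for q in Q]


# Hypothetical toy MDP. State 0 is safe; state 1 pays more per step,
# but an exploratory slip there (action 1) costs -10.
P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: action 0 stays, action 1 moves to state 1
     [[(1.0, 1)], [(1.0, 0)]]]   # state 1: action 0 stays, action 1 falls back to state 0
R = [[1.0, 0.0],
     [2.0, -10.0]]

plain = greedy(exploration_conscious_q(P, R, gamma=0.9, eps=0.0))
conscious = greedy(exploration_conscious_q(P, R, gamma=0.9, eps=0.5))
print(plain[0], conscious[0])
```

With ϵ = 0 the backup reduces to ordinary value iteration, and the greedy choice in state 0 is to move to the higher-paying state. With ϵ = 0.5 the expected cost of exploratory slips in state 1 outweighs its higher reward, so the greedy choice in state 0 becomes staying put. This illustrates how a policy that is optimal with respect to the exploration scheme can differ from the optimal policy.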

research
10/11/2022

The Role of Exploration for Task Transfer in Reinforcement Learning

The exploration–exploitation trade-off in reinforcement learning (RL) is...
research
07/10/2019

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

What is a good exploration strategy for an agent that interacts with an ...
research
12/16/2021

Unsupervised Reinforcement Learning in Multiple Environments

Several recent works have been dedicated to unsupervised reinforcement l...
research
04/14/2022

Reinforcement Learning Policy Recommendation for Interbank Network Stability

In this paper we analyze the effect of a policy recommendation on the pe...
research
04/05/2023

Constrained Exploration in Reinforcement Learning with Optimality Preservation

We consider a class of reinforcement-learning systems in which the agent...
research
06/01/2021

An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning

Policy-based reinforcement learning methods suffer from the policy colla...
research
07/15/2023

Evaluation of Deep Reinforcement Learning Algorithms for Portfolio Optimisation

We evaluate benchmark deep reinforcement learning (DRL) algorithms on th...
