Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

05/28/2019
by Ruihan Yang, et al.

A fundamental issue in reinforcement learning is the balance between exploring the environment and exploiting information the agent has already obtained. Exploration in particular plays a critical role in both the efficiency and the efficacy of the learning process. However, existing exploration methods rely on task-agnostic designs: a strategy that performs well in one environment may be ill-suited to another. To learn an effective and efficient exploration policy in an automated manner, we formalize a feasible metric for measuring the utility of exploration based on counterfactual reasoning. Building on this metric, we propose an end-to-end algorithm that learns an exploration policy through meta-learning. We demonstrate that our method achieves good results compared to previous work on high-dimensional control tasks in the MuJoCo simulator.

research
12/29/2018

Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies

In this paper, we propose a novel meta-learning method in a reinforcemen...
research
11/11/2019

MAME : Model-Agnostic Meta-Exploration

Meta-Reinforcement learning approaches aim to develop learning procedure...
research
06/08/2018

Fidelity-based Probabilistic Q-learning for Control of Quantum Systems

The balance between exploration and exploitation is a key problem for re...
research
10/25/2021

Multitask Adaptation by Retrospective Exploration with Learned World Models

Model-based reinforcement learning (MBRL) allows solving complex tasks i...
research
01/27/2022

Exploration With a Finite Brain

Equipping artificial agents with useful exploration mechanisms remains a...
research
10/18/2019

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Trading off exploration and exploitation in an unknown environment is ke...
research
09/17/2022

Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation

Deep reinforcement learning (DRL) has been proven its efficiency in capt...
