Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework

10/14/2021
by Khanh Nguyen, et al.

Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.
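The two-level decision structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the corridor environment, the `Assistant` class, the action labels, and the hand-written rule in `interaction_policy` are all hypothetical stand-ins chosen for illustration. In the framework itself the interaction policy is learned with reinforcement learning and can choose among several types of requests; here it is a fixed rule with a single request type.

```python
from dataclasses import dataclass

# Hypothetical high-level action labels (not taken from the paper).
REQUEST = "request_info"
ACT = "act"


@dataclass
class Assistant:
    """Hypothetical stand-in for the human assistant; it knows the goal cell."""
    goal: int

    def describe_goal(self) -> int:
        return self.goal


def interaction_policy(belief_goal):
    """High level: ask when the agent has no goal estimate, otherwise act.

    A fixed rule for illustration only; the framework learns this decision.
    """
    return REQUEST if belief_goal is None else ACT


def navigation_policy(position, belief_goal):
    """Low level: take one step along the corridor toward the believed goal."""
    if belief_goal > position:
        return 1
    if belief_goal < position:
        return -1
    return 0


def run_episode(start, assistant, max_steps=20):
    """Run one episode on a 1-D corridor and report success and request usage."""
    position, belief_goal = start, None
    requests = moves = 0
    for _ in range(max_steps):
        if interaction_policy(belief_goal) == REQUEST:
            # Requesting information counts as an action in the episode.
            belief_goal = assistant.describe_goal()
            requests += 1
            continue
        if position == belief_goal:
            break
        position += navigation_policy(position, belief_goal)
        moves += 1
    total = requests + moves
    return {
        "success": position == assistant.goal,
        "requests": requests,
        "actions": total,
        "request_ratio": requests / total if total else 0.0,
    }


result = run_episode(start=0, assistant=Assistant(goal=3))
print(result)
```

The `request_ratio` field mirrors the efficiency measure mentioned in the abstract (the share of all actions that are requests for information). Its value in this toy run depends entirely on the chosen start and goal and does not reproduce any result from the paper.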


research
02/08/2016

Data-Efficient Reinforcement Learning in Continuous-State POMDPs

We present a data-efficient reinforcement learning algorithm resistant t...
research
07/30/2018

Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration

To achieve scenario intelligence, humans must transfer knowledge to robo...
research
08/22/2019

Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

Off-policy evaluation (OPE) in reinforcement learning allows one to eval...
research
03/10/2016

Hierarchical Linearly-Solvable Markov Decision Problems

We present a hierarchical reinforcement learning framework that formulat...
research
06/03/2021

Hierarchical Representation Learning for Markov Decision Processes

In this paper we present a novel method for learning hierarchical repres...
research
05/04/2022

Multi-subgoal Robot Navigation in Crowds with History Information and Interactions

Robot navigation in dynamic environments shared with humans is an import...
research
09/08/2022

Double Q-Learning for Citizen Relocation During Natural Hazards

Natural disasters can cause substantial negative socio-economic impacts ...
