DOP: Deep Optimistic Planning with Approximate Value Function Evaluation

03/22/2018
by   Francesco Riccio, et al.

Research on reinforcement learning has demonstrated promising results across many applications and domains. Still, efficiently learning effective robot behaviors remains difficult due to unstructured scenarios, high uncertainty, and large state dimensionality (e.g., multi-agent systems or hyper-redundant robots). To alleviate this problem, we present DOP, a deep model-based reinforcement learning algorithm that exploits action values both to (1) guide the exploration of the state space and (2) plan effective policies. Specifically, we use deep neural networks to learn Q-functions that mitigate the curse of dimensionality during a Monte-Carlo tree search: the algorithm constructs upper confidence bounds on the learned value function and selects actions optimistically. We implement and evaluate DOP in three scenarios: (1) a cooperative navigation problem, (2) a fetching task for a 7-DOF KUKA robot, and (3) a human-robot handover with a humanoid robot (both in simulation and on the real robot). The results show the effectiveness of DOP in these applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
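The abstract does not spell out the exact bound used, but the core idea of biasing tree-search action selection with a learned Q-function can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes a UCB1-style exploration bonus, and the names OptimisticNode and q_net are hypothetical stand-ins (q_net would be a trained deep Q-network in practice).

```python
import math
from collections import defaultdict

# Hypothetical stand-in for the learned action-value network: any callable
# q_net(state, action) -> float can be plugged in here.
def q_net(state, action):
    return 0.0  # placeholder prior; a trained deep Q-network would go here

class OptimisticNode:
    """One node of a Monte-Carlo tree search whose action selection is
    biased by a learned Q-function through an upper confidence bound."""

    def __init__(self, state, actions):
        self.state = state
        self.actions = actions
        self.visits = defaultdict(int)      # N(s, a)
        self.returns = defaultdict(float)   # cumulative sampled return per action
        self.total_visits = 0               # N(s)

    def select_action(self, c=1.0):
        """Pick the action maximizing value estimate + exploration bonus.

        Unvisited actions fall back to the network's Q-value, so the search
        explores optimistically around what the network already considers
        promising (this seeding scheme is an assumption, not the paper's
        exact rule)."""
        best_action, best_score = None, -float("inf")
        for a in self.actions:
            n = self.visits[a]
            if n == 0:
                value = q_net(self.state, a)        # network prior only
            else:
                value = self.returns[a] / n         # empirical average return
            bonus = c * math.sqrt(math.log(self.total_visits + 1) / (n + 1))
            score = value + bonus
            if score > best_score:
                best_action, best_score = a, score
        return best_action

    def update(self, action, ret):
        """Back up a sampled return for the selected action."""
        self.visits[action] += 1
        self.total_visits += 1
        self.returns[action] += ret
```

Seeding unvisited actions with the learned Q-value is what lets the value function prune the effective branching factor: actions the network rates poorly are only tried once the bonus term outweighs the prior, which is how the planning cost can be reduced without discarding them outright.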


Related research:
Q-CP: Learning Action Values for Cooperative Planning (03/01/2018)
Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning (03/16/2023)
The challenge of redundancy on multi-agent value factorisation (03/28/2023)
Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees (01/12/2022)
Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation (12/16/2021)
Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration (01/09/2023)
Static and Dynamic Values of Computation in MCTS (02/11/2020)
