Motion Planning as Online Learning: A Multi-Armed Bandit Approach to Kinodynamic Sampling-Based Planning

08/26/2023
by Marco Faroni, et al.

Kinodynamic motion planners allow robots to perform complex manipulation tasks under dynamics constraints or with black-box models. However, they struggle to find high-quality solutions, especially when a steering function is unavailable. This paper presents a novel approach that adaptively biases the sampling distribution to improve the planner's performance. The key contribution is to formulate the sampling-bias problem as a non-stationary multi-armed bandit problem, in which the arms of the bandit correspond to sets of possible transitions. High-reward regions are identified by clustering transitions from sequential runs of kinodynamic RRT, and a bandit algorithm decides which region to sample at each timestep. The paper demonstrates the approach on several simulated examples and on a 7-degree-of-freedom manipulation task with dynamics uncertainty; the results suggest that the approach finds better solutions faster and leads to a higher success rate in execution.
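To make the bandit framing concrete, here is a minimal sketch of a non-stationary bandit choosing which region to sample. The abstract does not say which bandit algorithm the paper uses, so this example substitutes a generic epsilon-greedy rule with a constant step size (an exponential recency-weighted average), a standard way to track rewards that drift over time. The class name `RecencyWeightedBandit`, the function `run_demo`, the three arms standing in for transition clusters, and the synthetic Bernoulli rewards are all assumptions made for illustration, not the authors' implementation.

```python
import random


class RecencyWeightedBandit:
    """Epsilon-greedy bandit with a constant step size (hypothetical sketch).

    A constant step size weights recent rewards more heavily than old ones,
    so the estimates can follow reward distributions that change over time.
    """

    def __init__(self, n_arms, step_size=0.1, epsilon=0.1, seed=0):
        self.values = [0.0] * n_arms  # running reward estimate per arm
        self.step_size = step_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore a random arm with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Move the estimate a fixed fraction of the way toward the new reward.
        self.values[arm] += self.step_size * (reward - self.values[arm])


def run_demo(n_steps=2000, switch_at=1000, seed=0):
    """Simulate three arms whose mean rewards change at step `switch_at`.

    Each arm stands in for a cluster of transitions; the reward is a
    synthetic Bernoulli draw, not a real planner cost.
    """
    rng = random.Random(seed)
    bandit = RecencyWeightedBandit(n_arms=3, seed=seed)
    means_before = [0.2, 0.8, 0.4]
    means_after = [0.9, 0.1, 0.4]
    pulls_late = [0, 0, 0]  # pulls per arm over the final 500 steps
    for t in range(n_steps):
        means = means_before if t < switch_at else means_after
        arm = bandit.select_arm()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        bandit.update(arm, reward)
        if t >= n_steps - 500:
            pulls_late[arm] += 1
    return bandit.values, pulls_late


if __name__ == "__main__":
    values, pulls_late = run_demo()
    print("final estimates:", values)
    print("pulls in last 500 steps:", pulls_late)
```

In a planner, the selected arm would determine which cluster of transitions the next sample is drawn from, and the reward would come from some measure of how much that sample improved the solution. Both of those pieces are left out here because the abstract does not define them.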


research
07/25/2013

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Learning from prior tasks and transferring that experience to improve fu...
research
03/29/2017

Bandit-Based Model Selection for Deformable Object Manipulation

We present a novel approach to deformable object manipulation that does ...
research
08/08/2018

Nonparametric Gaussian mixture models for the multi-armed contextual bandit

The multi-armed bandit is a sequential allocation task where an agent mu...
research
07/31/2021

Debiasing Samples from Online Learning Using Bootstrap

It has been recently shown in the literature that the sample averages fr...
research
06/23/2016

Adaptive Task Assignment in Online Learning Environments

With the increasing popularity of online learning, intelligent tutoring ...
research
10/23/2019

Diversifying Database Activity Monitoring with Bandits

Database activity monitoring (DAM) systems are commonly used by organiza...
research
05/15/2018

Graph Signal Sampling via Reinforcement Learning

We formulate the problem of sampling and recovering clustered graph sign...
