Variational Policy Search using Sparse Gaussian Process Priors for Learning Multimodal Optimal Actions

06/14/2021
by Hikaru Sasaki, et al.

Policy search reinforcement learning has attracted much attention as a method for learning robot control policies. In particular, policy search with non-parametric policies such as Gaussian process regression can learn optimal actions from high-dimensional and redundant sensor inputs. However, previous methods implicitly assume that the optimal action is unique for each state. This assumption can severely limit practical applications such as robot manipulation, since designing a reward function that yields only one optimal action is difficult for complex tasks. Under such rewards, previous methods can suffer critical performance deterioration because typical non-parametric policies are unimodal and therefore cannot capture multiple optimal actions. We propose novel approaches to non-parametric policy search with multiple optimal actions and offer two algorithms, both based on a sparse Gaussian process prior and variational Bayesian inference. The key ideas are 1) multimodality, for capturing multiple optimal actions, and 2) mode-seeking, for capturing one optimal action while ignoring the others. First, we propose a multimodal sparse Gaussian process policy search that uses multiple overlapped GPs as a prior. Second, we propose a mode-seeking sparse Gaussian process policy search that uses the Student-t distribution as the likelihood function. The effectiveness of these algorithms is demonstrated in simulation on object manipulation tasks with multiple optimal actions.
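
To illustrate why a unimodal Gaussian fit struggles with multiple optimal actions, and why a heavy-tailed likelihood tends to be mode-seeking, the following minimal sketch compares two location estimates on toy data. It is not the sparse Gaussian process or variational algorithm described in the abstract: the data, the function names, and the fixed `nu` and `scale` values are all assumptions chosen for illustration, and the Student-t estimate uses the standard iteratively reweighted (EM-style) update for a location parameter.

```python
def gaussian_location(actions):
    """Maximum-likelihood mean under a Gaussian likelihood: the plain average."""
    return sum(actions) / len(actions)


def student_t_location(actions, nu=1.0, scale=0.3, n_iter=50):
    """Location estimate under a Student-t likelihood with fixed nu and scale.

    Uses the iteratively reweighted (EM-style) update: samples far from the
    current estimate receive small weights, so a distant mode is down-weighted.
    nu and scale are illustrative assumptions, not values from the paper.
    """
    mu = gaussian_location(actions)  # start from the Gaussian answer
    for _ in range(n_iter):
        weights = [(nu + 1.0) / (nu + ((a - mu) / scale) ** 2) for a in actions]
        mu = sum(w * a for w, a in zip(weights, actions)) / sum(weights)
    return mu


# Hypothetical high-reward actions observed in a single state: two optimal
# actions, +1.0 (six samples) and -1.0 (four samples).
actions = [1.0] * 6 + [-1.0] * 4

mu_gauss = gaussian_location(actions)
mu_t = student_t_location(actions)
print(f"Gaussian estimate:  {mu_gauss:+.3f}")
print(f"Student-t estimate: {mu_t:+.3f}")
```

On this toy data the Gaussian estimate is the average of the two modes (0.2), an action that is optimal under neither mode, while the Student-t estimate settles close to the better-supported mode at +1. Note that with perfectly balanced modes this simple update would stay at the midpoint, so the example relies on the slight imbalance in the samples.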


