Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

06/25/2018
by Rituraj Kaushik, et al.

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame policy search as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return, and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX solves sparse reward scenarios (with a simulated robotic arm) with much less interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.
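
To make the Pareto-based step more concrete, the following is a minimal sketch of non-dominated filtering over three objectives, the basic operation that Pareto-based optimizers rely on. It is not the Multi-DEX implementation: the function names, the score tuples, and the choice to encode model accuracy as negated uncertainty (so that all three objectives are maximized) are assumptions made purely for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical per-policy scores, invented for this example:
# (novelty, expected return, negated model uncertainty).
candidates = [
    (0.9, 0.0, -0.5),  # very novel, no reward yet
    (0.2, 1.0, -0.1),  # high return, model is confident
    (0.1, 0.5, -0.3),  # worse than the second candidate on every objective
    (0.9, 0.0, -0.7),  # same as the first, but less certain
]

front = pareto_front(candidates)
print(front)
```

In this toy data the first two candidates are kept because each is best on a different objective, which is the trade-off a Pareto-based method preserves rather than collapsing the objectives into one weighted sum.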

research
09/20/2017

Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

The most data-efficient algorithms for reinforcement learning in robotic...
research
10/05/2021

Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks

Model-based reinforcement learning is a promising learning strategy for ...
research
04/11/2022

Pareto Conditioned Networks

In multi-objective optimization, learning all the policies that reach Pa...
research
05/26/2022

Deep Reinforcement Learning with Adaptive Hierarchical Reward for Multi-Phase Multi-Objective Dexterous Manipulation

Dexterous manipulation tasks usually have multiple objectives, and the p...
research
03/01/2023

The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

We propose a novel approach to addressing two fundamental challenges in ...
research
04/13/2018

Smooth and Efficient Policy Exploration for Robot Trajectory Learning

Many policy search algorithms have been proposed for robot learning and ...
research
04/13/2022

Modularity benefits reinforcement learning agents with competing homeostatic drives

The problem of balancing conflicting needs is fundamental to intelligenc...
