Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

by   Rituraj Kaushik, et al.

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, it is very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.


Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

The most data-efficient algorithms for reinforcement learning in robotic...

Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks

Model-based reinforcement learning is a promising learning strategy for ...

Pareto Conditioned Networks

In multi-objective optimization, learning all the policies that reach Pa...

Deep Reinforcement Learning with Adaptive Hierarchical Reward for MultiMulti-Phase Multi Multi-Objective Dexterous Manipulation

Dexterous manipulation tasks usually have multiple objectives, and the p...

The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

We propose a novel approach to addressing two fundamental challenges in ...

Smooth and Efficient Policy Exploration for Robot Trajectory Learning

Many policy search algorithms have been proposed for robot learning and ...

Modularity benefits reinforcement learning agents with competing homeostatic drives

The problem of balancing conflicting needs is fundamental to intelligenc...

Please sign up or login with your details

Forgot password? Click here to reset