Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning

03/07/2023
by Mostafa Kotb, et al.

Model-based reinforcement learning (MBRL) with real-time planning has shown great potential in locomotion and manipulation control tasks. However, existing planning methods, such as the Cross-Entropy Method (CEM), do not scale well to complex high-dimensional environments. A key reason for this underperformance is the lack of exploration: these planning methods only aim to maximize the cumulative extrinsic reward over the planning horizon. Furthermore, planning inside a compact latent space, in the absence of observations, makes it challenging to use curiosity-based intrinsic motivation. We propose Curiosity CEM (CCEM), an improved version of the CEM algorithm that encourages exploration via curiosity. Our proposed method maximizes the sum of state-action Q-values over the planning horizon, where these Q-values estimate both the future extrinsic and intrinsic rewards, thereby encouraging the agent to reach novel observations. In addition, our model uses contrastive representation learning to efficiently learn latent representations. Experiments on image-based continuous control tasks from the DeepMind Control Suite show that CCEM is more sample-efficient than previous MBRL algorithms by a large margin and compares favorably with the best model-free RL methods.
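The planning loop sketched in the abstract — sample candidate action sequences, score them by summed Q-values, and refit the sampling distribution to the elites — can be illustrated with a minimal NumPy version. Here `q_value` is a hypothetical stand-in for the learned latent-space critic (which in CCEM estimates extrinsic plus intrinsic return); the hyperparameters are illustrative, not the paper's.

```python
import numpy as np

def curiosity_cem_plan(q_value, horizon=5, action_dim=2, n_samples=64,
                       n_elites=8, n_iters=6, seed=0):
    """Cross-Entropy Method planner: score each candidate action sequence
    with a critic (in CCEM, Q-values covering extrinsic + intrinsic reward)
    and iteratively refit a Gaussian to the best candidates."""
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, action_dim))    # mean of the action distribution
    sigma = np.ones((horizon, action_dim))  # std of the action distribution
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = rng.normal(mu, sigma, size=(n_samples, horizon, action_dim))
        samples = np.clip(samples, -1.0, 1.0)
        # Score each sequence with the (extrinsic + intrinsic) critic.
        scores = np.array([q_value(s) for s in samples])
        # Refit the Gaussian to the top-scoring elite sequences.
        elites = samples[np.argsort(scores)[-n_elites:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu[0]  # execute only the first action (receding-horizon control)
```

In practice the critic is evaluated on latent rollouts from the learned dynamics model rather than on raw action sequences, and planning is rerun at every environment step.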


Related research

08/14/2020  Sample-efficient Cross-Entropy Method for Real-time Planning
  Trajectory optimizers for model-based reinforcement learning, such as th...

02/18/2021  State Entropy Maximization with Random Encoders for Efficient Exploration
  Recent exploration methods have proven to be a recipe for improving samp...

12/08/2022  PALMER: Perception-Action Loop with Memory for Long-Horizon Planning
  To achieve autonomy in a priori unknown real-world scenarios, agents sho...

11/15/2021  Learning Representations for Pixel-based Control: What Matters and Why?
  Learning representations for pixel-based control has garnered significan...

08/22/2022  Efficient Planning in a Compact Latent Action Space
  While planning-based sequence modelling methods have shown great potenti...

06/15/2023  Simplified Temporal Consistency Reinforcement Learning
  Reinforcement learning is able to solve complex sequential decision-maki...

09/13/2022  A Dual-Arm Collaborative Framework for Dexterous Manipulation in Unstructured Environments with Contrastive Planning
  Most object manipulation strategies for robots are based on the assumpti...
