Efficient Planning in a Compact Latent Action Space

08/22/2022
by   Zhengyao Jiang, et al.

While planning-based sequence modelling methods have shown great potential in continuous control, scaling them to high-dimensional state-action sequences remains an open challenge, due both to the computational cost and to the inherent difficulty of planning in high-dimensional spaces. We propose the Trajectory Autoencoding Planner (TAP), a planning-based sequence-modelling RL method that scales to high state-action dimensionalities. Using a state-conditional Vector-Quantized Variational Autoencoder (VQ-VAE), TAP models the conditional distribution of trajectories given the current state. When deployed as an RL agent, TAP avoids planning step by step in the high-dimensional continuous action space and instead searches for optimal latent code sequences via beam search. Whereas the planning complexity of the Trajectory Transformer (TT) grows as O(D^3) in the state-action dimensionality D, TAP's planning cost is a constant O(C) with respect to D. Our empirical evaluation also shows that TAP's performance advantage grows with dimensionality. On Adroit robotic hand manipulation tasks, which have high state and action dimensionality, TAP surpasses existing model-based methods, including TT, by a large margin, and also beats strong model-free actor-critic baselines.
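The key planning idea above, searching over short sequences of discrete latent codes instead of raw continuous actions, can be sketched as a beam search. In the sketch below, the codebook size, horizon, beam width, and the `score` function are all toy stand-ins (assumptions for illustration); in TAP, `score` would be the estimated return of the trajectory decoded from the latent codes by the learned VQ-VAE decoder.

```python
import itertools

# Toy stand-ins: in TAP these would come from the learned VQ-VAE.
CODEBOOK_SIZE = 4   # K discrete codes available at each latent step
HORIZON = 3         # number of latent steps to plan over
BEAM_WIDTH = 2      # candidate sequences kept after each expansion

def score(seq):
    """Hypothetical return estimate for a latent code sequence.

    A real implementation would decode `seq` into a trajectory and
    estimate its return; here we just reward alternating codes.
    """
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b) + seq[0] * 0.1

def beam_search_plan():
    beams = [()]  # start from the empty latent sequence
    for _ in range(HORIZON):
        # Expand every beam by every codebook entry, then keep the best.
        candidates = [b + (c,) for b in beams for c in range(CODEBOOK_SIZE)]
        candidates.sort(key=score, reverse=True)
        beams = candidates[:BEAM_WIDTH]
    return beams[0]

plan = beam_search_plan()
```

Note that each expansion step considers at most `BEAM_WIDTH * CODEBOOK_SIZE` candidates regardless of how many state-action dimensions a decoded step has, which is the source of the constant planning cost with respect to D claimed in the abstract.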

Related research:

- Model-Based Planning in Discrete Action Spaces (05/19/2017). Planning actions using learned and differentiable forward models of the ...
- Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation (10/29/2019). The fundamental challenge of planning for multi-step manipulation is to ...
- Sample-efficient Real-time Planning with Curiosity Cross-Entropy Method and Contrastive Learning (03/07/2023). Model-based reinforcement learning (MBRL) with real-time planning has sh...
- Solving Continuous Control via Q-learning (10/22/2022). While there has been substantial success in applying actor-critic method...
- Distributional Policy Optimization: An Alternative Approach for Continuous Control (05/23/2019). We identify a fundamental problem in policy gradient-based methods in co...
- CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning (12/14/2021). Current state-of-the-art model-based reinforcement learning algorithms u...
- Generalizable Episodic Memory for Deep Reinforcement Learning (03/11/2021). Episodic memory-based methods can rapidly latch onto past successful str...
