ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

09/12/2023
by Chenxiao Gao, et al.

Decision Transformer (DT), which uses expressive sequence modeling techniques to generate actions, has emerged as a promising approach to offline policy optimization. However, DT generates actions conditioned on a desired future return, which is known to have weaknesses such as susceptibility to environmental stochasticity. To overcome these weaknesses, we propose to empower DT with dynamic programming. Our method comprises three steps. First, we employ in-sample value iteration, which performs dynamic programming over the MDP structure, to obtain approximate value functions. Second, we evaluate the quality of each action in context using estimated advantages. We introduce two types of advantage estimators, IAE and GAE, which are suited to different tasks. Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages. Finally, at test time, ACT generates actions conditioned on a desired advantage. Our evaluation results show that, by leveraging dynamic programming, ACT achieves effective trajectory stitching and robust action generation despite environmental stochasticity, outperforming baseline methods across various benchmarks. Additionally, we analyze ACT's design choices in depth through ablation studies.
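To make the advantage-estimation step more concrete, here is a minimal sketch, assuming that "GAE" refers to the standard generalized advantage estimator and that per-step value estimates V(s) are already available (for example, from the in-sample value iteration step). The expectile loss is included only as one common way to fit a value function in-sample (as in IQL); it is an illustrative assumption, not necessarily ACT's exact objective. All function names are hypothetical.

```python
import numpy as np


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff^2, averaged.

    diff is assumed to be Q(s, a) - V(s) on dataset actions only; with
    tau > 0.5 the fitted V(s) leans toward the upper expectile of Q.
    """
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def td_residuals(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    return rewards + gamma * next_values * (1.0 - dones) - values


def generalized_advantages(rewards, values, next_values, dones,
                           gamma=0.99, lam=0.95):
    """Generalized advantage estimation over a single trajectory.

    A_t = sum_l (gamma * lam)^l * delta_{t+l}, computed backwards in time.
    lam = 0 reduces to the one-step TD advantage; lam = 1 gives the
    Monte Carlo return minus V(s_t).
    """
    deltas = td_residuals(rewards, values, next_values, dones, gamma)
    advantages = np.zeros_like(deltas, dtype=float)
    running = 0.0
    for t in reversed(range(len(deltas))):
        # Stop propagating across episode boundaries.
        running = deltas[t] + gamma * lam * (1.0 - dones[t]) * running
        advantages[t] = running
    return advantages


# Toy three-step trajectory that terminates at the last step.
rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.5, 0.5])
next_values = np.array([0.5, 0.5, 0.0])
dones = np.array([0.0, 0.0, 1.0])

adv = generalized_advantages(rewards, values, next_values, dones,
                             gamma=1.0, lam=1.0)
print(adv)  # [1.5 0.5 0.5]
```

In a sketch like this, the resulting per-step advantages would replace the return-to-go tokens that DT conditions on; the parameter `lam` trades the bias of one-step bootstrapping against the variance of longer Monte Carlo tails.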

research
10/30/2017

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value it...
research
09/08/2022

Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

Recent works have shown that tackling offline reinforcement learning (RL...
research
10/24/2022

Dichotomy of Control: Separating What You Can Control from What You Cannot

Future- or return-conditioned supervised learning is an emerging paradig...
research
03/15/2022

ActFormer: A GAN Transformer Framework towards General Action-Conditioned 3D Human Motion Generation

We present a GAN Transformer framework for general action-conditioned 3D...
research
05/26/2023

Future-conditioned Unsupervised Pretraining for Decision Transformer

Recent research in offline reinforcement learning (RL) has demonstrated ...
research
04/18/2023

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

The success of transformer models trained with a language modeling objec...
