Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL

09/08/2022
by Taku Yamagata, et al.

Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results by converting the RL task into a supervised learning task. Decision Transformer (DT) combines the conditional policy approach with the Transformer architecture and shows competitive performance on several benchmarks. However, DT lacks stitching ability, one of the critical abilities for offline RL, which allows the optimal policy to be learned from sub-optimal trajectories. The issue becomes significant when the offline dataset contains only sub-optimal trajectories. Conventional RL approaches based on Dynamic Programming (such as Q-learning) do not suffer from this issue; however, they suffer from unstable learning behaviours, especially when they employ function approximation in an off-policy learning setting. In this paper, we propose Q-learning Decision Transformer (QDT), which addresses the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). QDT utilises the Dynamic Programming (Q-learning) results to relabel the return-to-go in the training data. We then train the DT with the relabelled data. Our approach efficiently exploits the benefits of both approaches, with each compensating for the other's shortcomings, to achieve better performance. We demonstrate the issue with DT and the advantage of QDT in a simple environment. We also evaluate QDT on the more complex D4RL benchmark, where it shows good performance gains.
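
The relabelling idea can be illustrated with a minimal Python sketch. The abstract does not give the exact rule, so everything below is an assumption made for illustration: the function names `returns_to_go` and `relabel_returns_to_go` are hypothetical, the value estimates are taken as already computed by some Q-learning method, and the rule used here (walk backwards through a trajectory and replace the return-to-go with the value estimate whenever the estimate is larger) is one plausible scheme, not necessarily the paper's procedure.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: sum of rewards from step t to the end."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def relabel_returns_to_go(
    rewards: Sequence[float], value_estimates: Sequence[float]
) -> List[float]:
    """Relabel return-to-go using learned state-value estimates.

    ASSUMPTION (illustrative rule, not taken from the abstract):
    moving backwards in time, the relabelled return at step t is the
    larger of (a) the reward at t plus the relabelled return at t + 1
    and (b) the value estimate for the state at t.
    """
    if len(rewards) != len(value_estimates):
        raise ValueError("rewards and value_estimates must have equal length")
    relabelled = [0.0] * len(rewards)
    future = 0.0  # relabelled return after the final step
    for t in range(len(rewards) - 1, -1, -1):
        candidate = rewards[t] + future
        relabelled[t] = max(candidate, value_estimates[t])
        future = relabelled[t]
    return relabelled


if __name__ == "__main__":
    # Hypothetical sub-optimal trajectory: it only collects a reward of 1,
    # but the value estimate says a return of 5 is reachable from step 1.
    rewards = [0.0, 0.0, 1.0]
    value_estimates = [0.0, 5.0, 0.0]
    print(returns_to_go(rewards))
    print(relabel_returns_to_go(rewards, value_estimates))
```

In this toy case the original return-to-go understates what is achievable from the second state, while the relabelled sequence propagates the higher value estimate back to earlier steps. A conditional policy trained on the relabelled targets would then be conditioned on returns that reflect stitched behaviour rather than only the returns observed in the data.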


research
06/02/2021

Decision Transformer: Reinforcement Learning via Sequence Modeling

We present a framework that abstracts Reinforcement Learning (RL) as a s...
research
06/17/2022

Bootstrapped Transformer for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims at learning policies from previ...
research
09/12/2023

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

Decision Transformer (DT), which employs expressive sequence modeling te...
research
11/28/2022

Is Conditional Generative Modeling all you need for Decision-Making?

Recent improvements in conditional generative modeling have made it poss...
research
06/02/2022

When does return-conditioned supervised learning work for offline reinforcement learning?

Several recent works have proposed a class of algorithms for the offline...
research
08/27/2019

Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming

Unmanned aircraft systems can perform some more dangerous and difficult ...
research
06/24/2023

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Despite the recent advancements in offline reinforcement learning via su...
