Multi-Objective Decision Transformers for Offline Reinforcement Learning

08/31/2023
by Abdelghani Ghanem, et al.

Offline Reinforcement Learning (RL) aims to derive policies from static trajectory data without requiring real-time interaction with the environment. Recent studies have shown that offline RL can be framed as a sequence modeling task in which the sole objective is to predict actions from the prior context using the transformer architecture. A limitation of this single-task approach is that it can undermine the transformer's attention mechanism, which should ideally allocate varying attention weights across the tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem in which prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which can introduce inaccuracies when modeling the state and return distributions. This flaw stems from the non-smoothness of the action distribution within the trajectory, which is dictated by the behavioral policy. To mitigate this issue, we introduce action space regions into the trajectory representation. Our experiments on D4RL benchmark locomotion tasks show that these proposals allow more effective use of the transformer's attention mechanism, resulting in performance that matches or outperforms current state-of-the-art methods.
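To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python. It is illustrative only and not the authors' implementation: the abstract does not say how action space regions are defined or how the objectives are combined. The sketch assumes uniform binning of each action dimension over [-1, 1] and a fixed weighted sum of mean-squared errors. The names `action_region`, `multi_objective_loss`, `n_bins`, and `weights` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def action_region(action: Sequence[float], n_bins: int = 4,
                  low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each action dimension to a coarse region index in [0, n_bins - 1].

    Assumption: uniform bins over [low, high]; the paper may define regions differently.
    """
    regions = []
    for a in action:
        clipped = min(max(a, low), high)                   # keep the value inside the range
        idx = int((clipped - low) / (high - low) * n_bins)
        regions.append(min(idx, n_bins - 1))               # a == high falls in the last bin
    return regions


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_objective_loss(pred: Dict[str, Sequence[float]],
                         target: Dict[str, Sequence[float]],
                         weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
                         ) -> Tuple[float, Dict[str, float]]:
    """Combine action, state, and return prediction errors into one scalar.

    Assumption: a fixed linear scalarization with hypothetical weights.
    """
    keys = ("action", "state", "return")
    terms = {k: mse(pred[k], target[k]) for k in keys}
    total = sum(w * terms[k] for w, k in zip(weights, keys))
    return total, terms
```

In a sequence model, the region indices would be added as extra tokens alongside returns, states, and actions, while the combined loss trains the prediction heads for all three targets rather than the action head alone.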

