Accessing Higher-level Representations in Sequential Transformers with Feedback Memory

02/21/2020
by Angela Fan, et al.

Transformers are feedforward networks that can process input tokens in parallel. While this parallelization makes them computationally efficient, it restricts the model from fully exploiting the sequential nature of the input: the representation at a given layer can only access representations from lower layers, rather than the higher-level representations already built at previous time steps. In this work, we propose the Feedback Transformer architecture, which exposes all previous representations to all future representations, meaning the lowest representation of the current time step is formed from the highest-level abstract representations of the past. We demonstrate on a variety of benchmarks in language modeling, neural machine translation, summarization, and reinforcement learning that this increased representational capacity improves over Transformer baselines.
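
To make the feedback-memory idea concrete, below is a minimal PyTorch sketch of one way such a mechanism could be wired up. The class and parameter names are illustrative, inputs are assumed to be already-embedded token vectors, and the use of nn.TransformerDecoderLayer as the attention block together with learned softmax pooling weights are assumptions for exposition, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedbackMemorySketch(nn.Module):
    """Illustrative sketch (not the authors' implementation): every layer at
    time t attends over a single shared memory built by pooling *all* layer
    states of previous time steps, so low layers can read high-level
    abstractions of the past."""

    def __init__(self, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(
                d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
            )
            for _ in range(n_layers)
        )
        # learned softmax weights that mix the per-layer states into one memory slot
        self.layer_mix = nn.Parameter(torch.zeros(n_layers + 1))

    def forward(self, x):                # x: (batch, seq, d_model), pre-embedded tokens
        memory, outputs = [], []
        for t in range(x.size(1)):
            h = x[:, t : t + 1, :]       # process one time step at a time
            # shared feedback memory of all earlier time steps (fall back to self at t = 0)
            mem = torch.cat(memory, dim=1) if memory else h
            states = [h]
            for layer in self.layers:
                h = layer(h, mem)        # every layer attends to the same feedback memory
                states.append(h)
            outputs.append(h)
            # pool this time step's layer states into a single memory vector
            weights = F.softmax(self.layer_mix, dim=0)
            memory.append(sum(w * s for w, s in zip(weights, states)))
        return torch.cat(outputs, dim=1)


# usage: feed already-embedded tokens through the sketch
model = FeedbackMemorySketch()
out = model(torch.randn(2, 10, 64))      # -> (2, 10, 64)
```

Because each time step's memory slot depends on the pooled states of earlier steps, the outer loop is inherently sequential; this is the token-level parallelism the abstract notes standard Transformers enjoy, traded away so that lower layers can access higher-level representations of the past.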

research
09/03/2019

The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

We seek to understand how the representations of individual tokens and t...
research
06/04/2021

Scalable Transformers for Neural Machine Translation

Transformer has been widely adopted in Neural Machine Translation (NMT) ...
research
09/28/2020

Deep Transformers with Latent Depth

The Transformer model has achieved state-of-the-art performance in many ...
research
03/27/2023

Accelerating Trajectory Generation for Quadrotors Using Transformers

In this work, we address the problem of computation time for trajectory ...
research
11/23/2022

TorchScale: Transformers at Scale

Large Transformers have achieved state-of-the-art performance across man...
research
12/30/2016

Feedback Networks

Currently, the most successful learning models in computer vision are ba...
research
05/30/2022

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

Recurrent neural networks have a strong inductive bias towards learning ...
