Transformers with Competitive Ensembles of Independent Mechanisms

02/27/2021
by Alex Lamb, et al.

An important development in deep learning from the earliest MLPs has been a move towards architectures with structural inductive biases which enable the model to keep distinct sources of information and routes of processing well-separated. This structure is linked to the notion of independent mechanisms from the causality literature, in which a mechanism is able to retain the same processing as irrelevant aspects of the world are changed. For example, convnets enable separation over positions, while attention-based architectures (especially Transformers) learn which combination of positions to process dynamically. In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which is applied over the entire hidden representation. This potentially throws unrelated sources of information together, and limits the Transformer's ability to capture independent mechanisms. To address this, we propose Transformers with Independent Mechanisms (TIM), a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention. Additionally, we propose a competition mechanism which encourages these mechanisms to specialize over time steps, and thus be more independent. We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.
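As a rough illustration of the two ingredients the abstract names, the sketch below shows a hidden state split into mechanisms with separate feed-forward parameters and a softmax competition that gates how strongly each mechanism updates its slice at each position. This is a minimal, hypothetical PyTorch sketch, not the authors' released implementation: the class name, dimensions, and gating details are assumptions, and the inter-mechanism attention step described in the abstract is omitted for brevity.

```python
# Minimal sketch of a TIM-style "mechanisms + competition" update.
# Assumptions: d_model splits evenly across n_mech mechanisms, each with
# its own small FFN; a per-position softmax across mechanisms gates updates.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MechanismCompetitionLayer(nn.Module):
    def __init__(self, d_model: int, n_mech: int, d_ff: int = 256):
        super().__init__()
        assert d_model % n_mech == 0, "hidden size must split evenly"
        self.n_mech = n_mech
        self.d_mech = d_model // n_mech

        # Separate feed-forward parameters per mechanism.
        self.ffn_in = nn.Parameter(torch.randn(n_mech, self.d_mech, d_ff) * 0.02)
        self.ffn_out = nn.Parameter(torch.randn(n_mech, d_ff, self.d_mech) * 0.02)

        # Per-mechanism scoring used for the competition softmax.
        self.score = nn.Parameter(torch.randn(n_mech, self.d_mech, 1) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> (batch, seq, n_mech, d_mech)
        b, t, _ = x.shape
        h = x.view(b, t, self.n_mech, self.d_mech)

        # Competition: one scalar score per mechanism per position,
        # softmax across mechanisms decides how much each one writes.
        scores = torch.einsum("btmd,mdo->btmo", h, self.score)
        gate = F.softmax(scores, dim=2)

        # Independent per-mechanism feed-forward updates.
        mid = F.relu(torch.einsum("btmd,mdf->btmf", h, self.ffn_in))
        upd = torch.einsum("btmf,mfd->btmd", mid, self.ffn_out)

        # Gated residual: mechanisms that lose the competition leave
        # their slice of the hidden state mostly unchanged.
        h = h + gate * upd
        return h.reshape(b, t, -1)


if __name__ == "__main__":
    layer = MechanismCompetitionLayer(d_model=512, n_mech=8)
    out = layer(torch.randn(2, 16, 512))
    print(out.shape)  # torch.Size([2, 16, 512])
```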

Related research

05/11/2021 · Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments
At present, attention mechanism has been widely applied to the fields of...

05/30/2022 · Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Recurrent neural networks have a strong inductive bias towards learning ...

02/16/2021 · Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
Recent years have seen a proliferation of attention mechanisms and the r...

11/15/2022 · Adaptive Multi-Neighborhood Attention based Transformer for Graph Representation Learning
By incorporating the graph structural information into Transformers, gra...

02/13/2022 · Flowformer: Linearizing Transformers with Conservation Flows
Transformers based on the attention mechanism have achieved impressive s...

04/19/2023 · Beyond Transformers for Function Learning
The ability to learn and predict simple functions is a key aspect of hum...

03/01/2021 · Coordination Among Neural Modules Through a Shared Global Workspace
Deep learning has seen a movement away from representing examples with a...
