Adaptive Transformers for Learning Multimodal Representations

05/15/2020
by Prajjwal Bhargava, et al.

The use of transformers has grown from learning language semantics to forming meaningful visiolinguistic representations. These architectures are often over-parametrized, requiring large amounts of computation. In this work, we extend adaptive approaches to learn more about model interpretability and computational efficiency. Specifically, we study adaptive attention spans, sparse attention, and structured dropout methods to understand how the attention mechanism extends to vision-and-language tasks. We further show that these approaches can help us learn how the network perceives the complexity of input sequences, the sparsity preferences of different modalities, and other related phenomena.
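The adaptive attention span studied in this work follows the mechanism of Sukhbaatar et al. (2019): each head learns a span parameter, and a soft ramp mask zeroes out attention to positions beyond that span so the effective context length adapts per head. A minimal NumPy sketch of that masking step is below; the function names and the `ramp` width are illustrative assumptions, not taken from the paper's codebase.

```python
import numpy as np

def adaptive_span_mask(distances, span, ramp=32):
    # Soft mask: 1 for positions within the learned span, then a linear
    # decay to 0 over `ramp` positions beyond it (illustrative ramp width).
    return np.clip((ramp + span - distances) / ramp, 0.0, 1.0)

def masked_attention(scores, span, ramp=32):
    # Apply the span mask to one query's raw attention scores over past
    # positions, then renormalize so the weights sum to 1.
    seq_len = scores.shape[-1]
    # Distance of each key position from the current (last) query position.
    distances = np.arange(seq_len)[::-1].astype(float)
    mask = adaptive_span_mask(distances, span, ramp)
    weights = np.exp(scores - scores.max()) * mask
    return weights / weights.sum()
```

Because the mask is differentiable in `span`, the span can be trained jointly with the rest of the model, which is what allows the network to reveal how much context it actually needs per modality.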

