DoT: An efficient Double Transformer for NLP tasks with tables

06/01/2021
by Syrine Krichene, et al.

Transformer-based approaches have been successfully used to obtain state-of-the-art accuracy on natural language processing (NLP) tasks with semi-structured tables. These model architectures are typically deep, resulting in slow training and inference, especially for long inputs. To improve efficiency while maintaining high accuracy, we propose a new architecture, DoT, a double transformer model that decomposes the problem into two sub-tasks: a shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes those K tokens as input. Additionally, we modify the task-specific attention to incorporate the pruning scores. The two transformers are jointly trained by optimizing the task-specific loss. We run experiments on three benchmarks, including entailment and question answering. We show that, for a small drop in accuracy, DoT improves training and inference time by at least 50%. We also show that the pruning transformer effectively selects relevant tokens, enabling the end-to-end model to maintain accuracy similar to slower baseline models. Finally, we analyse the pruning and give some insight into its impact on the task model.
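As a rough illustration of the two-stage design described in the abstract, the PyTorch sketch below wires a shallow pruning encoder that scores tokens, keeps the top-K, and feeds only those tokens to a deeper task encoder. The layer counts, dimensions, sigmoid reweighting of the kept tokens, and mean-pooled classification head are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class DoTSketch(nn.Module):
    # Minimal sketch of the double-transformer idea: a shallow pruning
    # encoder scores every token, the top-K tokens are kept, and a deeper
    # task encoder sees only those K tokens. Hyper-parameters and the way
    # pruning scores are folded back in are assumptions for illustration.
    def __init__(self, vocab_size=30522, d_model=256, k=128,
                 pruning_layers=2, task_layers=12, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pruning_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=pruning_layers)
        self.scorer = nn.Linear(d_model, 1)  # one pruning score per token
        self.task_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=task_layers)
        self.classifier = nn.Linear(d_model, num_classes)
        self.k = k

    def forward(self, token_ids):
        x = self.embed(token_ids)                                   # (B, L, d)
        scores = self.scorer(self.pruning_encoder(x)).squeeze(-1)   # (B, L)
        top = torch.topk(scores, k=min(self.k, scores.size(1)), dim=1)
        idx = top.indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
        kept = torch.gather(x, 1, idx)                              # (B, K, d)
        # Scale the kept tokens by their (soft) pruning scores so the task
        # loss also trains the pruning encoder -- a stand-in for the paper's
        # modified task-specific attention that incorporates these scores.
        kept = kept * torch.sigmoid(top.values).unsqueeze(-1)
        h = self.task_encoder(kept)
        return self.classifier(h.mean(dim=1))  # task loss trains both parts

Usage, for a batch of four length-512 inputs: logits = DoTSketch()(torch.randint(0, 30522, (4, 512))). Because the kept embeddings are scaled by their pruning scores, gradients from the task loss flow back into the pruning encoder, in the spirit of the joint training described above.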

