Not All Attention Is All You Need

04/10/2021
by Hongqiu Wu, et al.

Self-attention based models have achieved remarkable success in natural language processing. However, recent studies have questioned the self-attention network design as suboptimal, owing to its veiled validity and high redundancy. In this paper, we focus on pre-trained language models with a self-pruning training design for task-specific tuning. We demonstrate that lighter state-of-the-art models, with nearly 80% of their self-attention layers pruned, can achieve even better results on multiple tasks, including natural language understanding, document classification, named entity recognition and POS tagging, while running inference nearly twice as fast.
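The abstract does not spell out the pruning procedure, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a BERT-like encoder in which individual self-attention sublayers can be switched off during task-specific tuning, leaving only the feed-forward sublayers in the pruned layers; the names (PrunableEncoderLayer, prune_attention, keep_ratio) and the keep-the-first-layers selection rule are hypothetical.

```python
# Sketch: dropping self-attention sublayers from a Transformer encoder.
# Illustrative only; the selection criterion and gating are assumptions,
# not details taken from the paper.
import torch
import torch.nn as nn


class PrunableEncoderLayer(nn.Module):
    """Encoder layer whose self-attention sublayer can be skipped entirely."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.keep_attention = True  # set to False when this sublayer is pruned

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.keep_attention:
            attn_out, _ = self.attn(x, x, x, need_weights=False)
            x = self.norm1(x + attn_out)  # residual + norm around attention
        # Pruned layers fall through here: only the feed-forward sublayer runs.
        x = self.norm2(x + self.ffn(x))
        return x


def prune_attention(layers: nn.ModuleList, keep_ratio: float = 0.2) -> None:
    """Keep attention in roughly `keep_ratio` of the layers, drop it elsewhere.

    For simplicity the first layers are kept; an actual method would rely on
    an importance criterion learned during task-specific tuning.
    """
    n_keep = max(1, int(len(layers) * keep_ratio))
    for i, layer in enumerate(layers):
        layer.keep_attention = i < n_keep


if __name__ == "__main__":
    encoder = nn.ModuleList(PrunableEncoderLayer() for _ in range(12))
    prune_attention(encoder, keep_ratio=0.2)  # ~80% of attention sublayers dropped
    x = torch.randn(2, 16, 768)               # (batch, sequence, hidden)
    for layer in encoder:
        x = layer(x)
    print(x.shape)  # torch.Size([2, 16, 768])
```

Since a pruned layer skips the attention computation entirely, the per-layer cost reduces to the feed-forward sublayer, which is where the reported inference speedup would come from under this reading.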


research
01/30/2020

Self-attention-based BiGRU and capsule network for named entity recognition

Named entity recognition (NER) is one of the tasks of natural language pr...

research
01/27/2022

Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Transformers-based models, such as BERT, have dramatically improved the ...

research
06/04/2019

Back Attention Knowledge Transfer for Low-resource Named Entity Recognition

In recent years, great success has been achieved in the field of natural...

research
04/08/2020

Self-Attention Gazetteer Embeddings for Named-Entity Recognition

Recent attempts to ingest external knowledge into neural models for name...

research
05/21/2023

Pruning Pre-trained Language Models with Principled Importance and Self-regularization

Iterative pruning is one of the most effective compression methods for p...

research
07/14/2022

QSAN: A Near-term Achievable Quantum Self-Attention Network

Self-Attention Mechanism (SAM), an important component of machine learni...

research
02/16/2021

Have Attention Heads in BERT Learned Constituency Grammar?

With the success of pre-trained language models in recent years, more an...
