Sequential Attention Module for Natural Language Processing

09/07/2021
by   Mengyuan Zhou, et al.

Recently, large pre-trained neural language models have attained remarkable performance on many downstream natural language processing (NLP) applications via fine-tuning. In this paper, we investigate how to further improve the token representations produced by such language models. We therefore propose a simple yet effective plug-and-play module, the Sequential Attention Module (SAM), which operates on the token embeddings learned from a pre-trained language model. SAM consists of two attention modules deployed sequentially: a Feature-wise Attention Module (FAM) and a Token-wise Attention Module (TAM). More specifically, FAM identifies the importance of the features at each dimension and promotes their effect via a dot-product with the original token embeddings for downstream NLP applications. TAM then further re-weights the features at the token level. Moreover, we propose an adaptive filter on FAM to reduce the impact of noise and increase information absorption. Finally, we conduct extensive experiments to demonstrate the advantages and properties of SAM. We first show how SAM plays a primary role in the champion solution for two subtasks of SemEval'21 Task 7. After that, we apply SAM to sentiment analysis and three popular NLP tasks and demonstrate that SAM consistently outperforms the state-of-the-art baselines.
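The abstract only sketches the architecture, but the sequential FAM-then-TAM design is concrete enough to illustrate. Below is a minimal PyTorch sketch of how such a plug-and-play module could sit on top of a pre-trained language model's token embeddings. The mean pooling, sigmoid gating, the `proj` and `score` linear layers, and the omission of the adaptive filter are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class FeatureWiseAttention(nn.Module):
    """Feature-wise Attention Module (FAM): scores each embedding dimension.

    Hypothetical sketch; the paper's exact formulation (including its
    adaptive filter) may differ.
    """

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        pooled = x.mean(dim=1)                   # (batch, hidden_dim)
        gate = torch.sigmoid(self.proj(pooled))  # per-dimension weights in (0, 1)
        return x * gate.unsqueeze(1)             # re-weight every token's features


class TokenWiseAttention(nn.Module):
    """Token-wise Attention Module (TAM): re-weights each token position."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        weights = torch.sigmoid(self.score(x))   # (batch, seq_len, 1)
        return x * weights


class SequentialAttentionModule(nn.Module):
    """SAM: FAM followed by TAM, applied to pre-trained LM token embeddings."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.fam = FeatureWiseAttention(hidden_dim)
        self.tam = TokenWiseAttention(hidden_dim)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        return self.tam(self.fam(token_embeddings))


# Usage: plug SAM on top of, e.g., BERT's last hidden states before a task head.
sam = SequentialAttentionModule(hidden_dim=768)
dummy = torch.randn(2, 16, 768)   # (batch, seq_len, hidden_dim)
out = sam(dummy)                  # same shape, with re-weighted features
```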
