Recurrent Attention Networks for Long-text Modeling

06/12/2023
by Xianming Li, et al.

Self-attention-based models have achieved remarkable progress in short-text mining. However, their quadratic computational complexity restricts their application to long-text processing. Prior works adopt a chunking strategy that divides long documents into chunks and stacks a self-attention backbone with a recurrent structure to extract semantic representations. Such an approach disables parallelization of the attention mechanism, significantly increasing training cost and raising hardware requirements. Revisiting the self-attention mechanism and the recurrent structure, this paper proposes a novel long-document encoding model, the Recurrent Attention Network (RAN), which enables recurrent operation of self-attention. Combining the advantages of both, the well-designed RAN extracts global semantics in both token-level and document-level representations, making it inherently compatible with both sequential and classification tasks. Furthermore, RAN is computationally scalable, as it supports parallelization over long-document processing. Extensive experiments demonstrate the long-text encoding ability of the proposed RAN on both classification and sequential tasks, showing its potential for a wide range of applications.
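The general idea described above, applying self-attention within chunks while a recurrent state carries global context from one chunk to the next, can be illustrated with a short PyTorch sketch. This is a minimal illustration under assumed design choices, not the authors' RAN implementation: the module name `ChunkRecurrentAttention`, the single learned global-state vector, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class ChunkRecurrentAttention(nn.Module):
    """Illustrative sketch (not the paper's implementation): run
    self-attention within fixed-size chunks and carry a small global
    state across chunks so later chunks can see earlier context."""

    def __init__(self, d_model=256, n_heads=4, chunk_size=128, n_global=1):
        super().__init__()
        self.chunk_size = chunk_size
        self.n_global = n_global
        # learned initial global state, carried recurrently across chunks
        self.init_state = nn.Parameter(torch.zeros(n_global, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        b = x.size(0)
        state = self.init_state.unsqueeze(0).expand(b, -1, -1)
        token_outputs = []
        for chunk in x.split(self.chunk_size, dim=1):
            # prepend the carried state so the chunk attends to global context
            inp = torch.cat([state, chunk], dim=1)
            out, _ = self.attn(inp, inp, inp)
            out = self.norm(inp + out)           # residual + layer norm
            state = out[:, :self.n_global]       # updated state feeds the next chunk
            token_outputs.append(out[:, self.n_global:])
        tokens = torch.cat(token_outputs, dim=1)  # token-level representations
        doc = state.mean(dim=1)                   # document-level representation
        return tokens, doc

# usage: encode two 1,000-token documents of 256-d embeddings
model = ChunkRecurrentAttention()
tokens, doc = model(torch.randn(2, 1000, 256))
print(tokens.shape, doc.shape)  # torch.Size([2, 1000, 256]) torch.Size([2, 256])
```

For clarity this sketch processes chunks sequentially in a Python loop, whereas the abstract emphasizes that the actual RAN is designed so that long-document processing can be parallelized; the precise mechanism is described in the full paper.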


