Revisiting Transformer-based Models for Long Document Classification

04/14/2022
by Xiang Dai, et al.

The recent literature on text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page, multi-paragraph documents are common, and they cannot be efficiently encoded by vanilla Transformer-based models. We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla Transformers when encoding much longer text, namely sparse-attention and hierarchical encoding methods. We examine several aspects of sparse-attention Transformers (e.g., the size of the local attention window, the use of global attention) and hierarchical Transformers (e.g., the document splitting strategy) on four document classification datasets covering different domains. We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice on applying Transformer-based models to long document classification tasks.
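The two families of approaches named above can be illustrated concretely. The first sketch, a minimal hypothetical example rather than the paper's exact setup, shows the sparse-attention route using Longformer through the Hugging Face transformers library: each token attends locally within a fixed window (model.config.attention_window), and global attention is switched on per token via global_attention_mask, here only for the [CLS] token. The label count is an illustrative assumption.

    import torch
    from transformers import LongformerTokenizer, LongformerForSequenceClassification

    MODEL = "allenai/longformer-base-4096"
    tokenizer = LongformerTokenizer.from_pretrained(MODEL)
    # num_labels=4 is a hypothetical label count, not taken from the paper
    model = LongformerForSequenceClassification.from_pretrained(MODEL, num_labels=4)
    model.eval()

    document = "A long multi-page, multi-paragraph document ..."  # placeholder text
    inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=4096)

    # 0 = local windowed attention; 1 = global attention (here: [CLS] token only)
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    with torch.no_grad():
        logits = model(**inputs, global_attention_mask=global_attention_mask).logits
    predicted_class = logits.argmax(dim=-1).item()

The second sketch shows the hierarchical route: split the document into chunks that fit a standard short-text encoder, encode each chunk, and aggregate the chunk representations. BERT, the 512-token chunk length, and mean pooling over chunk [CLS] vectors are assumptions chosen for illustration; the paper additionally studies the document splitting strategy itself.

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")
    encoder.eval()

    def encode_document(document: str, chunk_len: int = 512) -> torch.Tensor:
        # Tokenize once, then split into chunks that fit the encoder,
        # leaving room for the [CLS] and [SEP] special tokens.
        ids = tokenizer(document, add_special_tokens=False)["input_ids"]
        body = chunk_len - 2
        chunks = [ids[i:i + body] for i in range(0, len(ids), body)]
        cls_vectors = []
        with torch.no_grad():
            for chunk in chunks:
                piece = [tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]
                out = encoder(input_ids=torch.tensor([piece]))
                cls_vectors.append(out.last_hidden_state[:, 0])
        # Mean pooling stands in for a learned chunk aggregator.
        return torch.cat(cls_vectors).mean(dim=0)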


Related research

09/11/2023 · Long-Range Transformer Architectures for Document Understanding
Since their release, Transformers have revolutionized many fields from N...

10/11/2022 · An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Non-hierarchical sparse attention Transformer-based models, such as Long...

02/07/2023 · Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis
Recent advances in the area of long document matching have primarily foc...

05/11/2020 · Local Self-Attention over Long Text for Efficient Document Retrieval
Neural networks, particularly Transformer-based architectures, have achi...

12/31/2020 · ERNIE-DOC: The Retrospective Long-Document Modeling Transformer
Transformers are not suited for processing long document input due to it...

11/01/2021 · Comparative Study of Long Document Classification
The amount of information stored in the form of documents on the interne...

07/30/2023 · LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning
Transformer-based models have revolutionized the performance of a wide r...
