ERNIE-DOC: A Retrospective Long-Document Modeling Transformer

12/31/2020
by Siyu Ding, et al.

Transformers are not well suited to processing long documents because their memory and time consumption grow quadratically with input length. Simply truncating a long document or applying a sparse attention mechanism incurs either the context-fragmentation problem or inferior modeling capability at a comparable model size. In this paper, we propose ERNIE-DOC, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, the retrospective feed mechanism and the enhanced recurrence mechanism, give ERNIE-DOC a much longer effective context length for capturing the contextual information of a whole document. We also pretrain ERNIE-DOC with an additional document-aware segment-reordering objective so that it explicitly learns the relationships among segments. Experiments are conducted on both English and Chinese document-level tasks. ERNIE-DOC achieves a state-of-the-art language modeling result of 16.8 perplexity on WikiText-103 and outperforms competitive pretraining models by a large margin on most language understanding tasks, such as text classification and question answering.
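To make the recurrence idea concrete, below is a minimal, hypothetical sketch (PyTorch-style Python, not the authors' code) of how per-layer memory cached from the previous segment could be reused by the current segment. In Transformer-XL-style recurrence, each layer attends to the previous segment's states cached at the depth of that layer's input; the enhanced recurrence described in the abstract instead reuses the cache one depth higher, so each layer attends to its own output from the previous segment and information can propagate across arbitrarily many past segments. All names here (forward_segment, the layer(h, mem) signature) are illustrative assumptions, and tensors are assumed to be PyTorch tensors.

def forward_segment(layers, x, mems, enhanced=True):
    """Process one text segment with segment-level recurrence.

    layers   : list of callables; layer(h, mem) returns new hidden states,
               with attention computed over the concatenation [mem ; h]
    x        : (batch, seg_len, hidden) embeddings of the current segment
    mems     : list of cached hidden states from the previous segment,
               one tensor of shape (batch, mem_len, hidden) per depth
               (index 0 holds the embedding-level states), or None for
               the first segment of a document
    enhanced : True  -> layer i reuses the previous segment's states one
                        depth higher (the depth of its own output), an
                        enhanced-recurrence-style cache
               False -> layer i reuses the states at the depth of its
                        input, Transformer-XL-style recurrence
    """
    hiddens = [x]
    h = x
    for i, layer in enumerate(layers):
        if mems is None:
            mem = None
        else:
            j = i + 1 if enhanced else i
            # detach so gradients never flow back into previous segments
            mem = mems[j].detach()
        h = layer(h, mem)
        hiddens.append(h)
    # cache every depth's states; they become `mems` for the next segment
    return h, hiddens

Under this reading, the retrospective feed mechanism would then feed the same document a second time, so that during the second pass every segment's memory already reflects the whole document rather than only the text that precedes it.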


research  09/19/2023
KoBigBird-large: Transformation of Transformer for Korean Language Understanding
This work presents KoBigBird-large, a large size of Korean BigBird that ...

research  06/19/2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
With the capability of modeling bidirectional contexts, denoising autoen...

research  04/14/2022
Revisiting Transformer-based Models for Long Document Classification
The recent literature in text classification is biased towards short tex...

research  01/26/2019
Language Model Pre-training for Hierarchical Document Representations
Hierarchical neural architectures are often used to capture long-distanc...

research  11/07/2019
Blockwise Self-Attention for Long Document Understanding
We present BlockBERT, a lightweight and efficient BERT model that is des...

research  06/03/2021
Luna: Linear Unified Nested Attention
The quadratic computational and memory complexities of the Transformer's...

research  05/16/2020
Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension
In this paper, we study machine reading comprehension (MRC) on long text...
