SegaBERT: Pre-training of Segment-aware BERT for Language Understanding

by He Bai, et al.

Pre-trained language models have achieved state-of-the-art results in various natural language processing tasks. Most of them are based on the Transformer architecture, which distinguishes tokens by their position index in the input sequence. However, sentence index and paragraph index are also important for indicating a token's position in a document. We hypothesize that a text encoder with richer positional information can generate better contextual representations. To verify this, we propose SegaBERT, a segment-aware BERT that replaces the token position embedding of the Transformer with a combination of paragraph index, sentence index, and token index embeddings. We pre-train SegaBERT on BERT's masked language modeling task, without any affiliated tasks. Experimental results show that our pre-trained model outperforms the original BERT model on various NLP tasks.




1 Introduction

Large neural language models (LMs) trained on massive amounts of text data have shown great potential for transfer learning and achieved state-of-the-art results in various natural language processing tasks. Compared with traditional word embeddings such as Skip-Gram Mikolov et al. (2013) and GloVe Pennington et al. (2014), pre-trained LMs learn contextual representations and can be fine-tuned as text encoders for downstream tasks; examples include OpenAI GPT Radford (2018), BERT Devlin et al. (2018), XLNet Yang et al. (2019b), and BART Lewis et al. (2019). Pre-trained LMs have therefore become a standard technique in natural language processing.

Most of these pre-trained models use a multi-layer Transformer Vaswani et al. (2017) and are pre-trained with self-supervised tasks such as masked language modeling (MLM). The Transformer network was initially used in seq2seq architectures for machine translation, where the input is usually a single sentence. Hence, it is intuitive to distinguish each token by its position index in the input sequence.

However, in the LM pre-training setting, inputs usually range from 512 to 1024 tokens and span multiple sentences and paragraphs. Although the token position embedding helps the Transformer be aware of token order by distinguishing each token in the input sequence, the sentence order, paragraph order, token position within a sentence, sentence position within a paragraph, and paragraph position within a document all remain implicit. Such segmentation information is essential for language understanding and can help a text encoder generate better contextual representations.

Hence, we propose SegaBERT, a novel segment-aware BERT that jointly encodes the paragraph index within a document, the sentence index within a paragraph, and the token index within a sentence during both pre-training and fine-tuning. We pre-train SegaBERT with the MLM objective under the same settings as BERT, but without next sentence prediction (NSP) or other affiliated tasks. Experimental results show that SegaBERT outperforms BERT on both general language understanding and machine reading comprehension: a 1.17-point average score improvement on GLUE Wang et al. (2019a) and a 1.14/1.54 exact-match/F1 improvement on SQuAD v1.1 Rajpurkar et al. (2016).

2 Model

Figure 1: Input Representation of SegaBERT

Our SegaBERT is based on the BERT architecture, which is a multi-layer transformer-based bidirectional masked language model. For a long sequence of text, SegaBERT leverages the segment information, such as the sentence position and paragraph position information, to learn a better contextual representation for each token. The original BERT uses a learned position embedding to encode the tokens’ position information. Instead of using the global token indexing, we introduce three types of embeddings: Token Index Embedding, Sentence Index Embedding, and Paragraph Index Embedding, as shown in Figure 1. Thus, the global token position is uniquely determined by three parts in SegaBERT: the token index within a sentence, the sentence index within a paragraph, and the paragraph index within a long document.

Input Representation The input is a sequence of tokens, which can be one or more sentences or paragraphs. Similar to the input representation used in BERT, the representation of each token is computed by summing its token embedding, token index embedding, sentence index embedding, and paragraph index embedding. Two special tokens, [CLS] and [SEP], are added to the text sequence before the first token and after the last token. Following BERT, the text is tokenized into subwords with WordPiece, and the maximum sequence length is 512.
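The summation above can be sketched in plain Python. The hidden size and vocabulary size here are toy values, and the lookup tables are random stand-ins for learned embedding matrices; the index ranges (256 token, 100 sentence, 50 paragraph positions) follow the limits stated in the training setup below.

```python
import random

HIDDEN = 8  # toy hidden size; BERT-base uses 768

def make_table(rows, dim, seed):
    """A lookup table of `rows` random vectors, standing in for a
    learned embedding matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

token_emb     = make_table(1000, HIDDEN, 0)  # toy WordPiece vocabulary
token_idx_emb = make_table(256,  HIDDEN, 1)  # token index within a sentence
sent_idx_emb  = make_table(100,  HIDDEN, 2)  # sentence index within a paragraph
para_idx_emb  = make_table(50,   HIDDEN, 3)  # paragraph index within a document

def input_representation(token_id, token_pos, sent_pos, para_pos):
    """Sum of the four embeddings, as in Figure 1."""
    rows = (token_emb[token_id], token_idx_emb[token_pos],
            sent_idx_emb[sent_pos], para_idx_emb[para_pos])
    return [sum(vals) for vals in zip(*rows)]
```

In a real implementation these tables would be trainable parameters and the sum would be computed over whole batches, but the per-token arithmetic is exactly this element-wise addition.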

Encoder Architecture

A multi-layer bidirectional Transformer encoder is used to encode the contextual representation of the input. With an L-layer Transformer, the last-layer hidden vector of each token is used as its contextualized representation. With the rich segmentation information present in the input representation, the encoder has better contextualization ability.

Training Objective Following BERT, we use the masked LM as our training objective. The other training objective, next sentence prediction, is not used in our model.

Training Setup For SegaBERT-base, we pre-train the model on English Wikipedia. For SegaBERT-large, English Wikipedia and BookCorpus are used. The text is preprocessed following BERT and tokenized into sub-tokens with WordPiece. For each Wikipedia document, we first split it into paragraphs, and all sub-tokens in the p-th paragraph are assigned the same Paragraph Index Embedding. The paragraph index starts from 0 for each document. Similarly, each paragraph is further segmented into sentences, and all sub-tokens in the s-th sentence are assigned the same Sentence Index Embedding. The sentence index starts from 0 for each paragraph. Within each sentence, all sub-tokens are indexed from 0, and the t-th sub-token receives the corresponding Token Index Embedding. The maximum paragraph index, sentence index, and token index are 50, 100, and 256, respectively.
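The indexing scheme above can be sketched as follows. The input `document` is assumed to be already split into paragraphs, sentences, and sub-tokens; clamping out-of-range indices to the stated maxima (50/100/256) is our assumption, since the text does not specify how longer inputs are handled.

```python
def assign_indices(document, max_para=50, max_sent=100, max_tok=256):
    """Assign a (paragraph, sentence, token) index triple to every
    sub-token of a document. Paragraph indices restart at 0 per
    document, sentence indices per paragraph, and token indices per
    sentence. `document` is a list of paragraphs, each a list of
    sentences, each a list of sub-tokens."""
    triples = []
    for p, paragraph in enumerate(document):
        for s, sentence in enumerate(paragraph):
            for t, _tok in enumerate(sentence):
                triples.append((min(p, max_para - 1),
                                min(s, max_sent - 1),
                                min(t, max_tok - 1)))
    return triples
```

For example, a document with two paragraphs (the first containing two sentences) yields triples in which the sentence and token counters reset at each paragraph and sentence boundary, respectively.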

We conduct our experiments with two model sizes, SegaBERT-base and SegaBERT-large:

  • SegaBERT-base: L=12, H=768, A=12

  • SegaBERT-large: L=24, H=1024, A=24

Here, L denotes the number of Transformer layers, H the hidden size, and A the number of self-attention heads.

The pre-training runs on 16 Tesla V100 GPU cards. We train 500K steps for SegaBERT-base and 1M steps for SegaBERT-large. For optimization, we use Adam with a learning rate of 1e-4, β1=0.9, β2=0.999, learning-rate warm-up over the first 1% of total steps, and linear decay of the learning rate.
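The schedule just described (linear warm-up over the first 1% of steps, then linear decay) can be written as a simple step-to-rate function; this is a generic sketch of the stated schedule, not the authors' training code, and the decay is assumed to end at zero.

```python
def lr_at_step(step, total_steps, peak_lr=1e-4, warmup_frac=0.01):
    """Learning rate at a given step: linear warm-up from 0 to
    `peak_lr` over the first `warmup_frac` of steps, then linear
    decay back to 0 at `total_steps`."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

For the 500K-step base-model run this means the rate peaks at step 5,000 and decreases linearly thereafter.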

3 Experiments

Task (Metrics)          | BERT dev | SegaBERT dev | BERT test | SegaBERT test
CoLA (Matthew Corr.)    | 60.6     | 65.3         | 60.5      | 62.6
SST-2 (Acc.)            | 93.2     | 94.7         | 94.9      | 94.8
MRPC (F1)               | -        | 92.3         | 89.3      | 89.7
STS-B (Spearman Corr.)  | -        | 90.3         | 86.5      | 88.6
QQP (F1)                | -        | 89.1         | 72.1      | 72.5
MNLI-m (Acc.)           | 86.6     | 87.6         | 86.7      | 87.9
MNLI-mm (Acc.)          | -        | 87.5         | 85.9      | 87.7
QNLI (Acc.)             | 92.3     | 93.6         | 92.7      | 94.0
RTE (Acc.)              | 70.4     | 78.3         | 70.1      | 71.6
Average                 | -        | 86.5         | 82.1      | 83.3
Table 1: Results on the GLUE benchmark for the large models (SegaBERT-large trained on Wikipedia+BookCorpus for 1M steps). BERT-large dev scores are from Sun et al. (2019); BERT-large test scores are from Devlin et al. (2018).

In this section, we present the results of SegaBERT on several downstream tasks: the General Language Understanding Evaluation (GLUE) benchmark and extractive question answering (SQuAD v1.1).

3.1 GLUE

On the GLUE benchmark, we conduct the fine-tuning experiments as follows. For single-sentence classification, such as sentiment classification (SST-2), the sentence is assigned Paragraph Index 0 and Sentence Index 0. For sentence-pair classification, such as question-answer entailment (QNLI), the first sentence is assigned Paragraph Index 0 and Sentence Index 0, and the second sentence is assigned Paragraph Index 1 and Sentence Index 0.
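The two assignment rules above can be sketched directly; how the [CLS] and [SEP] tokens are indexed is not stated in this section, so special tokens are omitted from the sketch.

```python
def single_sentence_indices(tokens):
    """Single-sentence tasks (e.g. SST-2): the whole sentence gets
    Paragraph Index 0 and Sentence Index 0, tokens indexed from 0."""
    return [(0, 0, t) for t in range(len(tokens))]

def sentence_pair_indices(tokens_a, tokens_b):
    """Sentence-pair tasks (e.g. QNLI): the first sentence gets
    Paragraph Index 0, the second Paragraph Index 1; both get
    Sentence Index 0, and token indices restart per sentence."""
    return ([(0, 0, t) for t in range(len(tokens_a))]
            + [(1, 0, t) for t in range(len(tokens_b))])
```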

Our base models are trained mainly for ablation purposes and are hence fine-tuned without grid search. The default hyper-parameters are a learning rate of 3e-5, a batch size of 256, and 3 training epochs. The other hyper-parameters follow the HuggingFace Transformers library.


For the large model, we pre-trained a large SegaBERT and conducted a grid search on the GLUE dev set for the low-resource tasks (CoLA, MRPC, RTE, SST-2, and STS-B). For QQP, MNLI, and QNLI, the hyper-parameters are the same as for the base model. Our grid search space is as follows:

  • Batch size: 16, 24, 32

  • Learning rate: 2e-5, 3e-5, 5e-5

  • Number of epochs: 3-10

As Table 1 shows, the average GLUE score of SegaBERT is higher than that of BERT on both the dev set and the test set. Our large model outperforms BERT-large by over 1.2 points in average GLUE score and achieves better scores on nearly all tasks. These results demonstrate SegaBERT's effectiveness and generalization capability in natural language understanding.

3.2 Reading Comprehension

System                                        | EM   | F1
XLNet (Single+DA) Yang et al. (2019b)         | 89.7 | 95.1
StructBERT large (Single) Wang et al. (2019b) | 85.2 | 92.0
KT-NET Yang et al. (2019a)                    | 85.2 | 91.7
BERT base (Single) Devlin et al. (2018)       | 80.8 | 88.5
BERT large (Single) Devlin et al. (2018)      | 84.1 | 90.9
BERT large (Single+DA) Devlin et al. (2018)   | 84.2 | 91.1
SegaBERT base (Single)                        | 83.2 | 90.2
SegaBERT large (Single)                       | 85.3 | 92.4
Table 2: Evaluation results (EM/F1) on the SQuAD v1.1 dev set.
Task (Metrics)          | BERT base   | BERT with P.S. | SegaBERT base
CoLA (Matthew Corr.)    | 54.7        | 48.6           | 52.1
SST-2 (Acc.)            | 91.4        | 91.5           | 92.1
MRPC (F1)               | 89.7        | 91.5           | 91.2
STS-B (Spearman Corr.)  | 87.8        | 88.2           | 88.7
QQP (F1)                | 86.5        | 87.1           | 87.0
MNLI-m (Acc.)           | 83.2        | 83.2           | 83.8
MNLI-mm (Acc.)          | 83.4        | 83.7           | 84.1
QNLI (Acc.)             | 90.4        | 91.5           | 91.5
RTE (Acc.)              | 61.0        | 62.1           | 66.4
GLUE Average            | 80.9        | 80.8           | 81.9
SQuAD v1.1 (EM/F1)      | 81.89/89.40 | 83.0/90.3      | 83.23/90.15
Table 3: Dev-set results on GLUE and SQuAD v1.1 for base models, all trained on Wikipedia for 500K steps.

For the reading comprehension task, we fine-tune as follows: the question is assigned Paragraph Index 0 and Sentence Index 0. For a context with n paragraphs, Paragraph Indices 1 through n are assigned to them in order. Within each paragraph, the sentences are indexed from 0.
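This assignment can be sketched as below; as with the GLUE sketch, the treatment of [CLS]/[SEP] special tokens is not specified here, so they are omitted.

```python
def reading_comprehension_indices(question_tokens, context_paragraphs):
    """(paragraph, sentence, token) triples for a QA input: the
    question gets Paragraph 0 / Sentence 0; the i-th context
    paragraph gets Paragraph i+1, with sentence indices restarting
    per paragraph and token indices per sentence. `context_paragraphs`
    is a list of paragraphs, each a list of tokenized sentences."""
    triples = [(0, 0, t) for t in range(len(question_tokens))]
    for p, paragraph in enumerate(context_paragraphs, start=1):
        for s, sentence in enumerate(paragraph):
            for t in range(len(sentence)):
                triples.append((p, s, t))
    return triples
```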

This task can benefit from segment information more than the GLUE tasks, because the reading comprehension context is usually longer and spans several sentences. The segment position embeddings can guide the self-attention layers to encode the text better.

We fine-tune our SegaBERT model on SQuAD v1.1 for 4 epochs with a batch size of 128 and a learning rate of 3e-5. As Table 2 shows, without any data augmentation (DA) or model ensembling, SegaBERT-large outperforms BERT-large with DA, StructBERT Wang et al. (2019b), which pre-trains BERT with multiple unsupervised tasks, and KT-NET Yang et al. (2019a), which augments BERT with external knowledge bases. Although a gap to XLNet remains, our proposed method is not specific to BERT but applies to Transformers in general. We believe that segmentation information can also help other pre-trained models, including XLNet, which could be verified in future work.

3.3 Ablation Study

We further conduct an ablation study on token position embeddings. Our proposed method replaces the original 512 token position embeddings with 50 paragraph embeddings, 100 sentence embeddings, and 256 token embeddings (restarting from 0 for each sentence of the input sequence), so the total parameter size is reduced. For comparison, we pre-train a base model, BERT with P.S., by adding paragraph embeddings and sentence embeddings on top of the original BERT token position embeddings (indexed globally over the input sequence); this model's parameter size is larger than BERT's.

The results are shown in Table 3. From this table, we can see that BERT with P.S., pre-trained with paragraph and sentence indices, outperforms the BERT base model on most tasks, except on the CoLA dev set, possibly due to the smaller training data size. Moreover, the SQuAD v1.1 results of BERT with P.S. and SegaBERT are similar, and both significantly outperform BERT base, which indicates that segmentation information is crucial for machine reading comprehension. These results further demonstrate the effectiveness and generalization capability of our method.

4 Conclusion

In this paper, we proposed a novel segment-aware Transformer that incorporates richer positional information into pre-training. Applying this method to BERT pre-training, we obtained a new model named SegaBERT. Experimental results demonstrate that the proposed method improves BERT on a variety of downstream tasks, including the GLUE benchmark and SQuAD v1.1 question answering.


Acknowledgments

This work was partially supported by NSERC OGP0046506, the National Key R&D Program of China 2018YFB1003202, and the Canada Research Chair Program.

We would like to thank Wei Zeng and his team at Peng Cheng Laboratory (PCL) for their computing-resource support for this project.


References

  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805.
  • M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer (2019) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. CoRR abs/1910.13461.
  • T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pp. 3111–3119.
  • J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In Proceedings of EMNLP 2014, pp. 1532–1543.
  • A. Radford (2018) Improving language understanding by generative pre-training.
  • P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang (2016) SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of EMNLP 2016, pp. 2383–2392.
  • Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian, H. Wu, and H. Wang (2019) ERNIE 2.0: a continual pre-training framework for language understanding. CoRR abs/1907.12412.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems 30, pp. 5998–6008.
  • A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman (2019a) GLUE: a multi-task benchmark and analysis platform for natural language understanding. In ICLR 2019.
  • W. Wang, B. Bi, M. Yan, C. Wu, Z. Bao, L. Peng, and L. Si (2019b) StructBERT: incorporating language structures into pre-training for deep language understanding. CoRR abs/1908.04577.
  • A. Yang, Q. Wang, J. Liu, K. Liu, Y. Lyu, H. Wu, Q. She, and S. Li (2019a) Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In Proceedings of ACL 2019, pp. 2346–2357.
  • Z. Yang, Z. Dai, Y. Yang, J. G. Carbonell, R. Salakhutdinov, and Q. V. Le (2019b) XLNet: generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems 32, pp. 5754–5764.