DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks

04/19/2022
by   Ziyang Luo, et al.

Since 2017, Transformer-based models have played critical roles in various downstream Natural Language Processing tasks. However, a common limitation of the attention mechanism used in the Transformer Encoder is that it cannot automatically capture word-order information, so explicit position embeddings generally have to be fed into the model. In contrast, the Transformer Decoder, with its causal attention masks, is naturally sensitive to word order. In this work, we focus on improving the position encoding ability of BERT with causal attention masks, and we propose a new pre-trained language model, DecBERT, which we evaluate on the GLUE benchmark. Experimental results show that (1) the causal attention mask is effective for BERT on language understanding tasks; (2) our DecBERT model without position embeddings achieves comparable performance on the GLUE benchmark; and (3) our modification accelerates pre-training, and DecBERT w/ PE achieves better overall performance than the baseline systems when pre-trained with the same amount of computational resources.
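To make the contrast concrete, here is a minimal PyTorch sketch (not the authors' code; the single-head layer, identity Q/K/V projections, and tensor shapes are illustrative assumptions) showing why bidirectional self-attention without position embeddings is blind to word order, while adding a causal attention mask makes the layer order-sensitive.

```python
# Minimal sketch (not from the paper): why encoder-style attention without
# position embeddings ignores word order, while a causal mask does not.
# All names and shapes here are illustrative assumptions.
import torch
import torch.nn.functional as F

def self_attention(x: torch.Tensor, causal: bool) -> torch.Tensor:
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    _, seq_len, d_model = x.shape
    scores = x @ x.transpose(-2, -1) / d_model ** 0.5        # (batch, seq, seq)
    if causal:
        # Lower-triangular mask: position i may only attend to positions <= i.
        mask = torch.tril(torch.ones(seq_len, seq_len, device=x.device)).bool()
        scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ x

x = torch.randn(1, 4, 8)            # (batch, seq_len, hidden), no position embeddings added
perm = torch.tensor([1, 0, 2, 3])   # swap the first two tokens

# Bidirectional (encoder-style) attention is permutation-equivariant:
# reordering the inputs merely reorders the outputs, so word order is invisible.
bi, bi_perm = self_attention(x, causal=False), self_attention(x[:, perm], causal=False)
print(torch.allclose(bi[:, perm], bi_perm))     # True

# The causal mask breaks this symmetry: a token's representation depends on
# where it sits in the sequence, even with no position embeddings.
ca, ca_perm = self_attention(x, causal=True), self_attention(x[:, perm], causal=True)
print(torch.allclose(ca[:, perm], ca_perm))     # False (in general)
```

This order sensitivity is the property the paper exploits: DecBERT adds causal attention masks to BERT so that word order can be captured with, or even without, explicit position embeddings.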

Related research

- 04/30/2020 · SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
  Pre-trained language models have achieved state-of-the-art results in va...
- 03/14/2022 · PERT: Pre-training BERT with Permuted Language Model
  Pre-trained Language Models (PLMs) have been widely used in various natu...
- 07/14/2023 · Improving BERT with Hybrid Pooling Network and Drop Mask
  Transformer-based pre-trained language models, such as BERT, achieve gre...
- 06/28/2020 · Rethinking the Positional Encoding in Language Pre-training
  How to explicitly encode positional information into neural networks is ...
- 04/22/2019 · Understanding Roles and Entities: Datasets and Models for Natural Language Inference
  We present two new datasets and a novel attention mechanism for Natural ...
- 05/11/2023 · Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*
  To advance the neural encoding of Portuguese (PT), and a fortiori the te...
- 06/06/2023 · Causal interventions expose implicit situation models for commonsense language understanding
  Accounts of human language processing have long appealed to implicit “si...
