StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

08/13/2019
by Wei Wang, et al.

Recently, the pre-trained language model BERT (Devlin et al., 2018) has attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by the linearization exploration work of Elman (1990), we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state of the art on the GLUE benchmark to 84.5 (ranked first on the leaderboard at the time of paper submission), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
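
To make the word-level auxiliary objective concrete, the following is a minimal Python sketch, assuming the word structural task corrupts a short span of input tokens by shuffling it and trains the model to recover the original order from context. The function and parameter names (make_word_structural_example, span_len) are hypothetical and not from the paper; this only illustrates the data-side construction of the objective, not the authors' implementation or the sentence-level ordering task.

```python
# Illustrative sketch (not the authors' code) of a word-order auxiliary task:
# shuffle one short span of tokens and keep the permutation as the target
# the model must learn to undo. Names here are hypothetical.
import random

def make_word_structural_example(tokens, span_len=3, seed=None):
    """Shuffle one random span of `span_len` tokens.

    Returns (corrupted_tokens, span_start, order), where order[j] is the
    original offset (within the span) of the token now at position
    span_start + j, i.e. the permutation the model should reconstruct.
    """
    rng = random.Random(seed)
    if len(tokens) < span_len:
        # Sequence too short to corrupt; return it unchanged.
        return list(tokens), None, None
    start = rng.randrange(len(tokens) - span_len + 1)
    order = list(range(span_len))
    rng.shuffle(order)
    corrupted = list(tokens)
    corrupted[start:start + span_len] = [tokens[start + i] for i in order]
    return corrupted, start, order

# Usage example: corrupt a toy sentence and inspect the target permutation.
tokens = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
corrupted, start, order = make_word_structural_example(tokens, seed=0)
print(corrupted, start, order)
```

During pre-training, the encoder would be asked to predict the original tokens (or their order) at the shuffled positions, which forces it to model local word order rather than treating the input as a bag of contextualized tokens.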
