Improving BERT with Syntax-aware Local Attention

12/30/2020
by Zhongli Li, et al.

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a variety of NLP tasks. Recent work has shown that attention-based models can benefit from more focused attention over local regions. Most existing approaches restrict the attention scope to a linear span, or are confined to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, in which the attention scope is restricted based on distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pre-trained language models, such as BERT, to encourage the model to focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.
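
The abstract only sketches the mechanism, so the following is a minimal illustrative sketch (not the authors' code) of what "restricting the attention scope based on syntactic distance" could look like: pairwise distances are computed over a dependency tree, and attention scores between tokens farther apart than a threshold are masked out before the softmax. The head list, the threshold max_dist, and the tensor shapes are all assumptions made for illustration.

# Illustrative sketch only: syntax-aware local attention via a
# dependency-tree distance mask. The tree, threshold, and shapes
# are hypothetical, not taken from the paper.
import numpy as np

def tree_distances(heads):
    """Pairwise hop distances between tokens in a dependency tree.
    heads[i] is the index of token i's head; the root points to itself."""
    n = len(heads)
    adj = [[] for _ in range(n)]          # undirected adjacency of the tree
    for i, h in enumerate(heads):
        if h != i:
            adj[i].append(h)
            adj[h].append(i)
    dist = np.full((n, n), np.inf)
    for src in range(n):                  # BFS from each token
        dist[src, src] = 0
        frontier = [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if dist[src, v] == np.inf:
                        dist[src, v] = dist[src, u] + 1
                        nxt.append(v)
            frontier = nxt
    return dist

def syntax_local_attention(Q, K, V, heads, max_dist=2):
    """Scaled dot-product attention restricted to tokens within
    max_dist hops in the dependency tree."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    mask = tree_distances(heads) <= max_dist
    scores = np.where(mask, scores, -1e9)         # block syntactically distant tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example: "the cat sat" with heads [1, 2, 2] (root = "sat").
heads = [1, 2, 2]
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(3, 8))
V = rng.normal(size=(3, 8))
out = syntax_local_attention(Q, K, V, heads, max_dist=1)
print(out.shape)  # (3, 8)

In the paper's setting, such a mask would presumably be applied inside BERT's multi-head self-attention alongside the standard global attention; the sketch above only shows the masking idea in isolation.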

Related research:

- Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees (03/07/2021)
- What Does BERT Look At? An Analysis of BERT's Attention (06/11/2019)
- Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding (11/10/2019)
- Understanding How BERT Learns to Identify Edits (11/28/2020)
- Quantity doesn't buy quality syntax with neural language models (08/31/2019)
- User Generated Data: Achilles' heel of BERT (03/29/2020)
- DABERT: Dual Attention Enhanced BERT for Semantic Matching (10/07/2022)
