Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

11/10/2019
by Dhanasekar Sundararaman, et al.

Attention-based models have shown significant improvements over traditional algorithms on several NLP tasks. The Transformer, for instance, generates abstract representations of the tokens fed to its encoder based on their relationships to all other tokens in the sequence. Recent studies have shown that, although such models can learn syntactic features purely from examples, explicitly feeding this information to deep learning models can significantly enhance performance. Leveraging syntactic information such as part-of-speech (POS) tags may be particularly beneficial for complex models like the Transformer when training data is limited. We show that a syntax-infused Transformer with multiple syntactic features achieves an improvement of 0.7 BLEU when trained on the full WMT'14 English-to-German translation dataset, and up to 1.99 BLEU when trained on a fraction of it. In addition, we find that incorporating syntax into BERT fine-tuning outperforms the baseline on a number of downstream tasks from the GLUE benchmark.
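The abstract does not spell out the fusion mechanism, but a common way to infuse POS information into a Transformer is to embed the POS tags and combine them with the token embeddings before the encoder. The sketch below is a minimal illustration under that assumption (PyTorch, with a concatenate-then-project fusion); the class name SyntaxInfusedEmbedding and the POS embedding size are hypothetical choices for illustration, not taken from the paper.

```python
# Illustrative sketch (not the paper's exact method): embed POS tags and
# fuse them with token embeddings before a standard Transformer encoder.
import torch
import torch.nn as nn

class SyntaxInfusedEmbedding(nn.Module):
    """Combines token and POS-tag embeddings into a single encoder input.

    Concatenate-then-project fusion is an assumption for illustration.
    """
    def __init__(self, vocab_size, pos_vocab_size, d_model, d_pos=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(pos_vocab_size, d_pos)
        # Project the concatenated [token; POS] vector back to d_model.
        self.proj = nn.Linear(d_model + d_pos, d_model)

    def forward(self, token_ids, pos_ids):
        fused = torch.cat([self.tok_emb(token_ids), self.pos_emb(pos_ids)], dim=-1)
        return self.proj(fused)

# Usage: feed the fused embeddings into a standard Transformer encoder.
embed = SyntaxInfusedEmbedding(vocab_size=32000, pos_vocab_size=20, d_model=512)
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

tokens = torch.randint(0, 32000, (2, 10))   # (batch, seq_len) token ids
pos_tags = torch.randint(0, 20, (2, 10))    # matching POS-tag ids
out = encoder(embed(tokens, pos_tags))      # (2, 10, 512)
```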

