Syntax-guided Localized Self-attention by Constituency Syntactic Distance

10/21/2022
by Shengyuan Hou, et al.

Recent work has revealed that Transformers implicitly learn syntactic information from data in their lower layers, although this learning is highly dependent on the quality and scale of the training data. However, learning syntactic information from data is unnecessary if we can leverage an external syntactic parser, which provides better parsing quality together with well-defined syntactic structures; doing so could potentially improve the Transformer's performance and sample efficiency. In this work, we propose a syntax-guided localized self-attention for the Transformer that directly incorporates grammar structures from an external constituency parser. It prevents the attention mechanism from overweighting grammatically distant tokens relative to close ones. Experimental results show that our model consistently improves translation performance on a variety of machine translation datasets, ranging from small to large, and across different source languages.
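
The abstract does not spell out the exact formulation, but the core idea of localizing self-attention by constituency syntactic distance can be sketched as follows. In this illustrative Python sketch, the pairwise distance between two tokens is taken to be the number of edges on the path between their leaves in the constituency tree, and attention scores beyond a `max_dist` cutoff are masked out. The distance definition, the hard cutoff, and names such as `syntactic_distance_matrix` and `localized_self_attention` are assumptions made for illustration, not the paper's actual method.

```python
import numpy as np

def leaf_paths(tree, prefix=()):
    """Yield, for each leaf token (left to right), the tuple of child
    indices taken from the root to reach that leaf."""
    if isinstance(tree, str):                 # a leaf token
        yield prefix
        return
    for i, child in enumerate(tree[1:]):      # tree[0] is the constituent label
        yield from leaf_paths(child, prefix + (i,))

def syntactic_distance_matrix(tree):
    """Pairwise tree distance (edges on the leaf-to-leaf path) between tokens.
    Illustrative choice of distance; the paper may define it differently."""
    paths = list(leaf_paths(tree))
    n = len(paths)
    dist = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            lcp = 0                            # length of the shared path prefix
            for a, b in zip(paths[i], paths[j]):
                if a != b:
                    break
                lcp += 1
            dist[i, j] = (len(paths[i]) - lcp) + (len(paths[j]) - lcp)
    return dist

def localized_self_attention(q, k, v, dist, max_dist=3):
    """Scaled dot-product attention where grammatically distant tokens are masked."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(dist <= max_dist, scores, -1e9)   # hard locality mask (assumption)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy example: "the cat sat" with the parse (S (NP the cat) (VP sat)).
tree = ("S", ("NP", "the", "cat"), ("VP", "sat"))
dist = syntactic_distance_matrix(tree)

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((dist.shape[0], 8)).astype(np.float32)
print(dist)                                             # pairwise syntactic distances
print(localized_self_attention(q, k, v, dist).shape)    # (3, 8)
```

With the toy parse above, "the" and "cat" are two edges apart while either of them is four edges from "sat", so with `max_dist=3` attention between "the" (or "cat") and "sat" is blocked while attention within the noun phrase is kept.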

Related research

10/13/2021 · Semantics-aware Attention Improves Neural Machine Translation
The integration of syntactic structures into Transformer machine transla...

12/27/2020 · SG-Net: Syntax Guided Transformer for Language Representation
Understanding human language is one of the key themes of artificial inte...

06/05/2019 · From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions
We inspect the multi-head self-attention in Transformer NMT encoders for...

05/22/2023 · Syntactic Knowledge via Graph Attention with BERT in Machine Translation
Although the Transformer model can effectively acquire context features ...

10/23/2020 · GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Attention-based end-to-end text-to-speech synthesis (TTS) is superior to...

03/30/2020 · Code Prediction by Feeding Trees to Transformers
In this paper, we describe how to leverage Transformer, a recent neural ...

10/19/2020 · Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
Transformer-based pre-trained language models (PLMs) have dramatically i...