GATology for Linguistics: What Syntactic Dependencies It Knows

05/22/2023
by Yuqian Dai, et al.

The Graph Attention Network (GAT) is a graph neural network that can model and represent explicit syntactic knowledge and can be combined with pre-trained models such as BERT in downstream tasks. However, how GAT learns syntactic knowledge as a function of its own structure has received little investigation, and GAT and BERT as carriers of explicit syntactic knowledge have not yet been applied or analysed in Machine Translation (MT) scenarios. We design a dependency relation prediction task to study how GAT learns the syntactic knowledge of three languages as a function of the number of attention heads and layers. We also use a paired t-test and F1 scores to quantify the differences in syntactic dependency prediction between GAT and BERT fine-tuned on the MT task (MT-B). The experiments show that performance improves when the number of attention heads is increased appropriately with two GAT layers, whereas stacking more than two layers hurts learning. Moreover, GAT is more competitive than MT-B in both training speed and syntactic dependency prediction, which suggests a better incorporation of explicit syntactic knowledge and points to the potential of combining GAT and BERT in MT tasks.
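
For context on the two architectural knobs varied in the study, the number of attention heads and the number of stacked GAT layers, the sketch below is a minimal PyTorch multi-head graph attention layer operating on a dependency graph. It is an illustrative sketch, not the authors' code: the class and parameter names, the BERT-sized 768-dimensional inputs, and the adjacency-matrix interface are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GATLayer(nn.Module):
    """Minimal multi-head graph attention layer (Velickovic et al., 2018)."""

    def __init__(self, in_dim: int, out_dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.out_dim = n_heads, out_dim
        # shared linear projection, one slice of width out_dim per head
        self.W = nn.Linear(in_dim, n_heads * out_dim, bias=False)
        # attention vector split into source and destination halves, per head
        self.a_src = nn.Parameter(torch.empty(n_heads, out_dim))
        self.a_dst = nn.Parameter(torch.empty(n_heads, out_dim))
        nn.init.xavier_uniform_(self.a_src)
        nn.init.xavier_uniform_(self.a_dst)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) token features; adj: (N, N) dependency edges incl. self-loops
        N = x.size(0)
        h = self.W(x).view(N, self.n_heads, self.out_dim)               # (N, H, D)
        e_src = (h * self.a_src).sum(dim=-1)                            # (N, H)
        e_dst = (h * self.a_dst).sum(dim=-1)                            # (N, H)
        e = F.leaky_relu(e_src.unsqueeze(1) + e_dst.unsqueeze(0), 0.2)  # (N, N, H)
        e = e.masked_fill(adj.unsqueeze(-1) == 0, float("-inf"))        # keep graph edges only
        alpha = torch.softmax(e, dim=1)                                 # normalise over neighbours
        out = torch.einsum("ijh,jhd->ihd", alpha, h)                    # aggregate neighbour features
        return F.elu(out.reshape(N, self.n_heads * self.out_dim))


# Two stacked layers, the depth the paper finds to work best; head count is the knob to sweep.
if __name__ == "__main__":
    tokens, heads, dim = 6, 4, 16
    x = torch.randn(tokens, 768)            # e.g. BERT token embeddings
    adj = torch.eye(tokens)                 # self-loops
    adj[0, 1] = adj[1, 0] = 1.0             # one illustrative dependency edge
    layer1 = GATLayer(768, dim, heads)
    layer2 = GATLayer(heads * dim, dim, heads)
    out = layer2(layer1(x, adj), adj)
    print(out.shape)                        # torch.Size([6, 64])
```

Stacking two such layers and sweeping the head count mirrors the configurations compared in the paper; the self-loops in `adj` keep the softmax defined for tokens with no dependency neighbours.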
