Improving BERT Pretraining with Syntactic Supervision

04/21/2021
by Giorgos Tziafas, et al.

Bidirectional masked Transformers have become the backbone of the current NLP landscape. Despite their impressive benchmarks, a recurring theme in recent research has been to question such models' capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network's training dynamics. Our approach is straightforward to implement, incurs only a marginal computational overhead, and is general enough to adapt to a variety of settings. We apply our methodology to Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being an order of magnitude smaller than commonly used corpora.
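For concreteness, below is a minimal sketch of what such a joint masked-language-modelling plus supertagging objective can look like in PyTorch. It is not the authors' implementation: the module name, hidden sizes, and the loss weight `lambda_supertag` are illustrative assumptions, and positional encodings, the masking procedure, and data handling are omitted.

```python
# Sketch only: joint MLM + token-level supertagging loss, not the paper's code.
import torch.nn as nn
import torch.nn.functional as F

class SyntaxAwareEncoder(nn.Module):
    def __init__(self, vocab_size, num_supertags, d_model=768, n_layers=12, n_heads=12):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)          # masked-token prediction
        self.supertag_head = nn.Linear(d_model, num_supertags)  # per-token supertag prediction

    def forward(self, input_ids, mlm_labels, supertag_labels, lambda_supertag=1.0):
        # (positional encodings and input masking omitted for brevity)
        hidden = self.encoder(self.embed(input_ids))             # (batch, seq, d_model)
        mlm_loss = F.cross_entropy(
            self.mlm_head(hidden).transpose(1, 2), mlm_labels, ignore_index=-100
        )
        tag_loss = F.cross_entropy(
            self.supertag_head(hidden).transpose(1, 2), supertag_labels, ignore_index=-100
        )
        # The supervised supertagging term injects syntactic bias alongside the
        # standard unsupervised MLM objective.
        return mlm_loss + lambda_supertag * tag_loss
```

Positions labelled with `-100` (e.g. unmasked tokens for the MLM term, or tokens without a gold supertag) are ignored by the cross-entropy losses, so the two objectives can be computed over the same batch at negligible extra cost.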
