Structural Guidance for Transformer Language Models

07/30/2021
by Peng Qian et al.

Transformer-based language models pre-trained on large amounts of text data have proven remarkably successful in learning generic transferable linguistic representations. Here we study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models without resorting to pre-training on very large amounts of data. We explore two general ideas. The "Generative Parsing" idea jointly models the incremental parse and the word sequence as part of the same sequence modeling task. The "Structural Scaffold" idea guides the language model's representation via an additional structure loss that separately predicts the incremental constituency parse. We train the proposed models, along with a vanilla Transformer language model baseline, on a 14 million-token and a 46 million-token subset of the BLLIP dataset, and evaluate the models' syntactic generalization performance on the SG Test Suites and sized BLiMP. Experimental results across the two benchmarks provide converging evidence that generative structural supervision can induce more robust and human-like linguistic generalization in Transformer language models without the need for data-intensive pre-training.
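The contrast between the two ideas can be made concrete with a short sketch. The following is a minimal PyTorch illustration, not the paper's implementation: linearize_tree, ScaffoldLM, and alpha are hypothetical names introduced here. It shows how Generative Parsing folds parse symbols and words into one sequence for an ordinary language model, while a scaffold-style model adds an auxiliary parse-prediction head and loss on top of a vanilla Transformer LM.

```python
# Minimal sketch of the two forms of structural guidance, assuming a
# PyTorch setup; all names (linearize_tree, ScaffoldLM, alpha) are
# illustrative, not the paper's released code.
import torch.nn as nn
import torch.nn.functional as F

def linearize_tree(tree: str) -> list[str]:
    """Generative Parsing: flatten a bracketed parse so a single standard
    LM models p(structure, words) over one interleaved sequence, e.g.
    '(S (NP the cat) (VP sat))' ->
    ['(S', '(NP', 'the', 'cat', ')', '(VP', 'sat', ')', ')']"""
    return tree.replace(')', ' ) ').split()

class ScaffoldLM(nn.Module):
    """Structural Scaffold: a vanilla Transformer LM whose hidden states
    also feed an auxiliary head predicting incremental parse labels."""

    def __init__(self, vocab_size, n_parse_labels, d_model=512, alpha=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.lm_head = nn.Linear(d_model, vocab_size)         # next-word prediction
        self.parse_head = nn.Linear(d_model, n_parse_labels)  # structure prediction
        self.alpha = alpha  # weight of the auxiliary structure loss

    def forward(self, tokens, next_tokens, parse_labels):
        # Causal mask keeps the model incremental (left-to-right).
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(self.embed(tokens), mask=mask)
        lm_loss = F.cross_entropy(self.lm_head(h).transpose(1, 2), next_tokens)
        parse_loss = F.cross_entropy(self.parse_head(h).transpose(1, 2), parse_labels)
        return lm_loss + self.alpha * parse_loss
```

In the generative-parsing setting no extra head is needed: the ordinary next-token cross-entropy over the interleaved sequence scores structure and words jointly. The scaffold variant instead keeps word-only prediction at inference time, dropping the auxiliary head after training.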

Related research

11/18/2021 · RoBERTuito: a pre-trained language model for social media text in Spanish
Since BERT appeared, Transformer language models and transfer learning h...

08/16/2019 · Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
We propose Unicoder-VL, a universal encoder that aims to learn joint rep...

08/21/2023 · Can Language Models Learn to Listen?
We present a framework for generating appropriate facial responses from ...

02/03/2022 · Pre-Trained Language Models for Interactive Decision-Making
Language model (LM) pre-training has proven useful for a wide variety of...

11/04/2021 · How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction
After a neural sequence model encounters an unexpected token, can its be...

10/28/2022 · Modeling structure-building in the brain with CCG parsing and large language models
To model behavioral and neural correlates of language comprehension in n...

05/31/2023 · Examining the Emergence of Deductive Reasoning in Generative Language Models
We conduct a preliminary inquiry into the ability of generative transfor...
