Emergent Linguistic Structures in Neural Networks are Fragile

10/31/2022
by Emanuele La Malfa, et al.

Large language models (LLMs) have been reported to perform strongly on natural language processing tasks. However, performance metrics such as accuracy do not measure a model's ability to robustly represent complex linguistic structure. In this work, we propose a framework to evaluate the robustness of linguistic representations using probing tasks. We leverage recent advances in extracting emergent linguistic constructs from LLMs and apply syntax-preserving perturbations to test the stability of these constructs, in order to better understand the representations learned by LLMs. Empirically, we evaluate four LLMs across six corpora on the proposed robustness measures. We provide evidence that context-free representations (e.g., GloVe) are in some cases competitive with context-dependent representations from modern LLMs (e.g., BERT), yet equally brittle to syntax-preserving manipulations. Because the syntactic representations that emerge in neural networks are brittle, our work draws attention to the risk of comparing such structures to those that have long been the object of debate in linguistics.
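The core idea of the evaluation, as the abstract describes it, is to apply a perturbation that leaves the syntax of a sentence intact (e.g., swapping a word for a same-part-of-speech near-synonym) and then measure how much the model's representation moves. The paper's own code and models are not shown here; the following is a minimal, self-contained sketch of that measurement using toy static word vectors (standing in for GloVe- or BERT-style embeddings, with hypothetical values) and cosine similarity as the stability score.

```python
import math

# Toy 3-d word vectors with hypothetical values; in the paper's setting
# these would come from GloVe or a contextual model such as BERT.
EMBEDDINGS = {
    "the":   [0.10, 0.00, 0.20],
    "dog":   [0.90, 0.30, 0.10],
    "hound": [0.85, 0.35, 0.15],  # near-synonym of "dog", same POS
    "barks": [0.20, 0.80, 0.40],
}

def sentence_vector(tokens):
    """Mean-pool word vectors into a single sentence representation."""
    dims = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dims
    for t in tokens:
        for i, x in enumerate(EMBEDDINGS[t]):
            vec[i] += x
    return [x / len(tokens) for x in vec]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

original  = ["the", "dog", "barks"]
perturbed = ["the", "hound", "barks"]  # syntax-preserving substitution

# A robust representation should barely move under the perturbation,
# i.e., the stability score should stay close to 1.
stability = cosine(sentence_vector(original), sentence_vector(perturbed))
print(f"representation stability: {stability:.3f}")
```

In the paper's framework the analogous score would be computed over probing-task outputs and emergent syntactic structures rather than mean-pooled vectors; this sketch only illustrates the before/after comparison that underlies the robustness measure.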

Related research

10/10/2020
Discourse structure interacts with reference but not syntax in neural language models
Language models (LMs) trained on large quantities of text have been clai...

10/16/2018
INFODENS: An Open-source Framework for Learning Text Representations
The advent of representation learning methods enabled large performance ...

05/23/2023
Assessing Linguistic Generalisation in Language Models: A Dataset for Brazilian Portuguese
Much recent effort has been devoted to creating large-scale language mod...

02/23/2021
Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations
During a psychotherapy session, the counselor typically adopts technique...

02/17/2023
False perspectives on human language: why statistics needs linguistics
A sharp tension exists about the nature of human language between two op...

12/13/2021
The King is Naked: on the Notion of Robustness for Natural Language Processing
There is growing evidence that the classical notion of adversarial robus...

10/17/2021
Schrödinger's Tree – On Syntax and Neural Language Models
In the last half-decade, the field of natural language processing (NLP) ...
