Decomposing lexical and compositional syntax and semantics with deep language models

03/02/2021
by Charlotte Caucheteux, et al.

The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension. However, the nature of these activations remains largely unknown, and they presumably conflate distinct linguistic classes. Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four combinatorial classes: lexical, compositional, syntactic, and semantic representations. We then introduce a statistical method to decompose, through the lens of GPT2's activations, the brain activity of 345 subjects recorded with functional magnetic resonance imaging (fMRI) while they listened to 4.6 hours of narrated text. The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, encompassing the bilateral temporal, parietal and prefrontal cortices. Second, contrary to previous claims, syntax and semantics are not associated with separate modules but instead appear to share a common and distributed neural substrate. Overall, this study introduces a general framework to isolate the distributed representations of linguistic constructs generated in naturalistic settings.
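The abstract does not spell out how the linear mapping or the decomposition is computed. As a rough illustration only, the sketch below assumes a closed-form ridge regression from model features to voxel signals, scored by held-out Pearson correlation, and compares a lexical-only feature set with a full (lexical plus contextual) one. All data are synthetic, and the names `ridge_fit`, `brain_score` and `compositional_gain` are hypothetical; this is not the authors' method.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def brain_score(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Per-voxel Pearson correlation between predicted and held-out signals."""
    W = ridge_fit(X_train, Y_train, alpha)
    pred = X_test @ W
    pred_c = pred - pred.mean(axis=0)
    true_c = Y_test - Y_test.mean(axis=0)
    denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0)
    return (pred_c * true_c).sum(axis=0) / denom


# Synthetic stand-ins (assumption): real inputs would be language-model
# activations and fMRI recordings, neither of which is reproduced here.
rng = np.random.default_rng(0)
n_samples, n_lex, n_ctx, n_voxels = 400, 8, 8, 5
lexical = rng.standard_normal((n_samples, n_lex))           # word-level features
contextual_extra = rng.standard_normal((n_samples, n_ctx))  # context-dependent features
full = np.hstack([lexical, contextual_extra])

# Simulated "brain" signals driven by both feature groups plus noise.
true_weights = rng.standard_normal((n_lex + n_ctx, n_voxels))
brain = full @ true_weights + 0.5 * rng.standard_normal((n_samples, n_voxels))

# Fit on the first 300 samples, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, 400)
score_lexical = brain_score(lexical[train], brain[train], lexical[test], brain[test])
score_full = brain_score(full[train], brain[train], full[test], brain[test])

# Illustrative contrast: what the full features explain beyond lexical ones.
compositional_gain = score_full - score_lexical
print(np.round(score_lexical, 2), np.round(score_full, 2))
```

In this toy setup the difference `score_full - score_lexical` plays the role of a "compositional" contribution per voxel; an analogous contrast between feature subsets is one plausible way to read the factorization the abstract describes, under the assumptions stated above.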

research
10/12/2021

Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects

A popular approach to decompose the neural bases of language consists in...
research
03/27/2023

Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain

Linking computational natural language processing (NLP) models and neura...
research
07/07/2022

Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps

Neural Language Models (NLMs) have made tremendous advances during the l...
research
02/28/2023

Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context

A fundamental question in neurolinguistics concerns the brain regions in...
research
06/03/2022

Toward a realistic model of speech processing in the brain with self-supervised learning

Several deep neural networks have recently been shown to generate activa...
research
02/11/2018

Syntax and Semantics of Italian Poetry in the First Half of the 20th Century

In this paper we study, analyse and comment rhetorical figures present i...
research
09/17/2020

Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA

Many NLP tasks have benefited from transferring knowledge from contextua...
