A Psycholinguistic Analysis of BERT's Representations of Compounds

02/14/2023
by   Lars Buijtelaar, et al.

This work studies the semantic representations learned by BERT for compounds, that is, expressions such as sunlight or bodyguard. We build on recent studies that explore semantic information in Transformers at the word level and test whether BERT aligns with human semantic intuitions when dealing with expressions (e.g., sunlight) whose overall meaning depends, to varying extents, on the semantics of the constituent words (sun, light). We leverage a dataset that includes human judgments on two psycholinguistic measures of compound semantic analysis: lexeme meaning dominance (LMD; quantifying the weight of each constituent toward the compound meaning) and semantic transparency (ST; evaluating the extent to which the compound meaning is recoverable from the constituents' semantics). We show that BERT-based measures moderately align with human intuitions, especially when using contextualized representations, and that LMD is overall more predictable than ST. Contrary to the results reported for 'standard' words, the higher, more contextualized layers are the best at representing compound meaning. These findings shed new light on BERT's ability to deal with fine-grained semantic phenomena. Moreover, they can provide insights into how speakers represent compounds.
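To make the two measures concrete, here is a minimal sketch of how similarity-based proxies for ST and LMD might be computed from embedding vectors. The abstract does not specify how the BERT-based measures are defined, so everything here is an assumption made for illustration: the function names (`cosine`, `compound_scores`), the choice of cosine similarity, the mean and difference formulas, and the toy three-dimensional vectors, which stand in for real BERT representations.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def compound_scores(compound, modifier, head):
    """Return hypothetical (st_proxy, lmd_proxy) scores for one compound.

    st_proxy:  mean similarity of the compound to its two constituents
               (higher = meaning more recoverable from the parts).
    lmd_proxy: head similarity minus modifier similarity
               (positive = head dominates, negative = modifier dominates).
    These formulas are illustrative assumptions, not the paper's definitions.
    """
    sim_modifier = cosine(compound, modifier)
    sim_head = cosine(compound, head)
    st_proxy = (sim_modifier + sim_head) / 2.0
    lmd_proxy = sim_head - sim_modifier
    return st_proxy, lmd_proxy


# Toy vectors standing in for embeddings of "sunlight", "sun", and "light".
sunlight = np.array([0.9, 0.8, 0.1])
sun = np.array([1.0, 0.2, 0.0])
light = np.array([0.3, 1.0, 0.1])

st, lmd = compound_scores(sunlight, sun, light)
print(f"ST proxy: {st:.3f}, LMD proxy: {lmd:.3f}")
```

In a setup like this, the proxy scores for a set of compounds would then be correlated with the human ratings, and the vectors could be drawn from different layers to compare how well each layer aligns with the judgments.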

