Probing Pretrained Language Models for Lexical Semantics

10/12/2020
by Ivan Vulić et al.

The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture. While prior research focused on morphosyntactic, semantic, and world knowledge, it remains unclear to what extent LMs also derive lexical type-level knowledge from words in context. In this work, we present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks, addressing the following questions: 1) How do different lexical knowledge extraction strategies (monolingual versus multilingual source LM, out-of-context versus in-context encoding, inclusion of special tokens, and layer-wise averaging) impact performance? How consistent are the observed effects across tasks and languages? 2) Is lexical knowledge stored in a few parameters, or is it scattered throughout the network? 3) How do these representations fare against traditional static word vectors in lexical tasks? 4) Does the lexical information emerging from independently trained monolingual LMs display latent similarities? Our main results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks. Moreover, we validate the claim that lower Transformer layers carry more type-level lexical knowledge, but also show that this knowledge is distributed across multiple layers.
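To make the extraction strategies mentioned in the abstract concrete, the sketch below shows one common way to derive a type-level word vector from a pretrained encoder: encode the word out of context, optionally discard the special tokens, and average its subword states over a subset of (lower) layers. This is a minimal illustration assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the function name, layer choice, and example words are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch (not the paper's exact code): type-level word vectors from a
# pretrained LM via out-of-context encoding with layer-wise averaging.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # monolingual source LM; a multilingual checkpoint could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def type_level_vector(word: str, layers=range(0, 7), keep_special_tokens=False) -> torch.Tensor:
    """Encode `word` in isolation and average its subword states over `layers`."""
    enc = tokenizer(word, return_tensors="pt", add_special_tokens=True)
    with torch.no_grad():
        # hidden_states: tuple of (1, seq_len, dim) tensors, embeddings + one per layer
        hidden_states = model(**enc).hidden_states
    # Mask out [CLS]/[SEP] unless special tokens are deliberately included.
    mask = torch.ones(enc["input_ids"].shape[1], dtype=torch.bool)
    if not keep_special_tokens:
        special = tokenizer.get_special_tokens_mask(
            enc["input_ids"][0].tolist(), already_has_special_tokens=True
        )
        mask = ~torch.tensor(special, dtype=torch.bool)
    # Average over the selected layers, then over the word's subword tokens.
    stacked = torch.stack([hidden_states[layer][0, mask] for layer in layers])
    return stacked.mean(dim=(0, 1))

# Example probe in the style of word-similarity lexical tasks:
# cosine similarity between two type-level vectors.
v1, v2 = type_level_vector("car"), type_level_vector("automobile")
print(torch.cosine_similarity(v1, v2, dim=0).item())
```

Restricting `layers` to the lower part of the network, as in the default above, mirrors the paper's finding that lower Transformer layers carry more type-level lexical knowledge, while averaging over several layers reflects the observation that this knowledge is distributed rather than concentrated in a single layer.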

Related Research

04/29/2021
Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses
Pre-trained language models (LMs) encode rich information about linguist...

08/01/2022
BabelBERT: Massively Multilingual Transformers Meet a Massively Multilingual Lexical Resource
While pretrained language models (PLMs) primarily serve as general purpo...

06/27/2023
Can Pretrained Language Models Derive Correct Semantics from Corrupt Subwords under Noise?
For Pretrained Language Models (PLMs), their susceptibility to noise has...

04/25/2020
Semi-Lexical Languages – A Formal Basis for Unifying Machine Learning and Symbolic Reasoning in Computer Vision
Human vision is able to compensate imperfections in sensory inputs from ...

01/13/2020
Dialectal Layers in West Iranian: a Hierarchical Dirichlet Process Approach to Linguistic Relationships
This paper addresses a series of complex and unresolved issues in the hi...

09/17/2020
Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA
Many NLP tasks have benefited from transferring knowledge from contextua...

02/10/2021
Language Models for Lexical Inference in Context
Lexical inference in context (LIiC) is the task of recognizing textual e...