Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space

04/27/2023
by Filip Klubička, et al.

The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both types of embedding encode some idiomatic information, to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the dataset used and highlight important directions for future work on improving its suitability for a probing analysis.
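To make concrete what it would mean for a property to be "encoded in the vector norm", here is a minimal toy sketch. It is not the authors' probing setup: the data are synthetic, the probe is a deliberately weak one-feature threshold classifier, and all names (`l2_norm`, `unit_normalize`, `threshold_probe_accuracy`) are hypothetical helpers introduced for illustration. The idea is to compare a probe that sees the norm with one applied after the norm has been ablated by rescaling every vector to unit length.

```python
import math
import random


def l2_norm(vec):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def unit_normalize(vec):
    """Rescale a vector to norm 1, discarding information carried by its length."""
    n = l2_norm(vec)
    return [x / n for x in vec] if n > 0 else list(vec)


def threshold_probe_accuracy(values, labels):
    """Best accuracy of a one-feature threshold classifier (a toy probe)."""
    best = 0.0
    for t in sorted(set(values)):
        for sign in (1, -1):
            preds = [1 if sign * (v - t) >= 0 else 0 for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            best = max(best, acc)
    return best


# Synthetic data (an assumption, not the paper's dataset): vectors with
# label 1 are scaled up, so the label is carried by the norm alone.
rng = random.Random(0)
vectors, labels = [], []
for i in range(200):
    label = i % 2
    direction = unit_normalize([rng.gauss(0, 1) for _ in range(16)])
    scale = 2.0 if label else 1.0
    vectors.append([scale * x for x in direction])
    labels.append(label)

# Probe on the norm, before and after ablating it. Norms are rounded so
# that floating-point jitter cannot be mistaken for signal.
norms = [round(l2_norm(v), 6) for v in vectors]
ablated_norms = [round(l2_norm(unit_normalize(v)), 6) for v in vectors]

acc_norm = threshold_probe_accuracy(norms, labels)
acc_ablated = threshold_probe_accuracy(ablated_norms, labels)
print(f"norm probe: {acc_norm:.2f}, after norm ablation: {acc_ablated:.2f}")
```

On this toy data the norm feature separates the two classes perfectly, and the probe falls to chance once the vectors are unit-normalized. With real embeddings the comparison would be far less clean, which is consistent with the conflicting evidence the abstract reports.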


