Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings

03/23/2020
by Christos Xypolopoulos, et al.

The number of senses of a given word, or polysemy, is a highly subjective notion that varies widely across annotators and resources. We propose a novel method to estimate polysemy, based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven. We show through rigorous experiments that our rankings are well correlated (with strong statistical significance) with 6 different rankings derived from well-known human-constructed resources such as WordNet, OntoNotes, Oxford, and Wikipedia, for 6 different standard metrics. We also visualize and analyze the correlations among the human rankings. A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word. Finally, the fully unsupervised nature of our method makes it applicable to any language. Code and data are publicly available at https://github.com/ksipos/polysemy-assessment.
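
To make the "simple geometry" idea more concrete, here is a minimal sketch of how a multiresolution grid could turn a word's contextual embeddings into a polysemy score. It is not the authors' implementation (see the repository linked above for that). It assumes the embeddings of each occurrence of a word have already been reduced to a few dimensions and rescaled to the unit hypercube. The function names `grid_coverage` and `polysemy_score`, the number of levels, and the per-level weighting are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def grid_coverage(points: Sequence[Sequence[float]], level: int) -> float:
    """Fraction of grid cells occupied by at least one point.

    The unit hypercube is split into 2**level bins per dimension.
    Assumes every coordinate lies in [0, 1].
    """
    bins = 2 ** level
    dim = len(points[0])
    occupied = {
        # Clamp so that a coordinate of exactly 1.0 falls in the last bin.
        tuple(min(int(x * bins), bins - 1) for x in p)
        for p in points
    }
    return len(occupied) / bins ** dim


def polysemy_score(points: Sequence[Sequence[float]], max_level: int = 3) -> float:
    """Combine coverage over several grid resolutions into one score.

    Assumed weighting (illustrative only): level l is divided by
    2**(max_level - l), so finer levels count more.
    """
    return sum(
        grid_coverage(points, level) / 2 ** (max_level - level)
        for level in range(1, max_level + 1)
    )


# Toy 2-D "embeddings": one word whose contexts cluster tightly,
# and one whose contexts are spread over the space.
tight: List[List[float]] = [[0.10, 0.10], [0.12, 0.11], [0.11, 0.13]]
spread: List[List[float]] = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9], [0.9, 0.1]]

if __name__ == "__main__":
    print("tight :", polysemy_score(tight))
    print("spread:", polysemy_score(spread))
```

Under these assumptions, a word whose occurrences spread over many cells receives a higher score than one whose occurrences stay in a small region, which matches the intuition that more senses occupy more of the embedding space. Ranking words by such a score is what would then be compared against the human-derived rankings.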


