Russian word sense induction by clustering averaged word embeddings

05/06/2018
by Andrey Kutuzov, et al.

The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018). Among all 19 participants, our team was ranked 2nd on the wiki-wiki dataset (containing mostly homonyms) and 5th on the bts-rnc and active-dict datasets (containing mostly polysemous words). The method we employed was extremely naive: the contexts of an ambiguous word were represented as averaged word embedding vectors, using off-the-shelf pre-trained distributional models, and these vector representations were then clustered with mainstream clustering techniques, producing groups that correspond to the word's senses. As a side result, we show that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data, not only in intrinsic evaluation but also in downstream tasks like word sense induction.
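
To make the pipeline concrete, here is a minimal sketch of the approach the abstract describes: average the embeddings of the words surrounding an ambiguous target, then cluster the resulting context vectors, treating each cluster as one induced sense. The model path, the toy example contexts, and the fixed number of clusters are illustrative assumptions, and k-means stands in here for whichever mainstream clusterer is used; this is not the authors' exact configuration.

```python
# Sketch: word sense induction by clustering averaged context embeddings.
# Assumptions: a pre-trained word2vec-format model at a hypothetical path,
# k-means as the clusterer, and a known number of senses.

import numpy as np
from gensim.models import KeyedVectors
from sklearn.cluster import KMeans

# Load an off-the-shelf pre-trained embedding model (path is hypothetical).
wv = KeyedVectors.load_word2vec_format("ru_embeddings.bin", binary=True)

def context_vector(tokens, target):
    """Average the embeddings of all in-vocabulary context words except the target."""
    vectors = [wv[t] for t in tokens if t != target and t in wv]
    if not vectors:
        return np.zeros(wv.vector_size)
    return np.mean(vectors, axis=0)

def induce_senses(contexts, target, n_senses=2):
    """Cluster averaged context vectors; cluster labels play the role of senses."""
    X = np.vstack([context_vector(tokens, target) for tokens in contexts])
    return KMeans(n_clusters=n_senses, random_state=0).fit_predict(X)

# Usage: each context is a tokenized sentence containing the ambiguous word.
contexts = [
    ["the", "bank", "raised", "interest", "rates"],
    ["we", "walked", "along", "the", "river", "bank"],
]
print(induce_senses(contexts, target="bank", n_senses=2))
```

In this setup the only tunable parts are the choice of embedding model and the clustering algorithm, which is what makes the method attractive as a strong, simple baseline.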

Related research

05/23/2018
How much does a word weigh? Weighting word embeddings for word sense induction
The paper describes our participation in the first shared task on word s...

04/20/2023
Word Sense Induction with Knowledge Distillation from BERT
Pre-trained contextual language models are ubiquitously employed for lan...

06/17/2016
Sense Embedding Learning for Word Sense Induction
Conventional word sense induction (WSI) methods usually represent each i...

08/26/2018
Word Sense Induction with Neural biLM and Symmetric Patterns
An established method for Word Sense Induction (WSI) uses a language mod...

03/22/2018
Word sense induction using word embeddings and community detection in complex networks
Word Sense Induction (WSI) is the ability to automatically induce word s...

10/28/2019
Cross-Domain Ambiguity Detection using Linear Transformation of Word Embedding Spaces
The requirements engineering process is a crucial stage of the software ...

04/17/2021
Are Word Embedding Methods Stable and Should We Care About It?
A representation learning method is considered stable if it consistently...
