RuDSI: graph-based word sense induction dataset for Russian

09/28/2022
by Anna Aksenova, et al.

We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-driven: it is based on texts from the Russian National Corpus, and no external word senses are imposed on annotators. Depending on the parameters of graph clustering, different derivative datasets can be produced from the raw annotation. We report the performance of several baseline WSI methods on RuDSI and discuss possibilities for improving these scores.
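To illustrate how clustering parameters can yield different derivative datasets from the same raw annotation, here is a minimal Python sketch. It is not the clustering procedure used for RuDSI: it assumes a much simpler scheme in which usages are nodes, annotated pairs are weighted edges, and senses are the connected components that remain after dropping edges below a threshold. The function name `cluster_usages`, the usage identifiers, and the weights are all hypothetical.

```python
from collections import defaultdict


def cluster_usages(edges, threshold):
    """Group usages into sense clusters (illustrative assumption only).

    Keeps edges whose weight is >= threshold and returns the connected
    components of the resulting graph as sorted lists of usage ids.
    """
    nodes = set()
    adjacency = defaultdict(set)
    for (u, v), weight in edges.items():
        nodes.update((u, v))
        if weight >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Made-up toy annotation: each pair of usages has a relatedness weight.
edges = {
    ("u1", "u2"): 4.0,
    ("u2", "u3"): 3.0,
    ("u3", "u4"): 2.0,
    ("u4", "u5"): 4.0,
    ("u1", "u5"): 1.0,
}

strict = cluster_usages(edges, threshold=3.5)  # fewer edges kept, more clusters
loose = cluster_usages(edges, threshold=2.5)   # more edges kept, fewer clusters
```

With this toy data, the stricter threshold splits the five usages into three clusters and the looser one into two, which is the sense in which a single raw annotation can support several derived sense inventories.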

research
10/14/2021

Large Scale Substitution-based Word Sense Induction

We present a word-sense induction method based on pre-trained masked lan...
research
02/28/2013

KSU KDD: Word Sense Induction by Clustering in Topic Space

We describe our language-independent unsupervised word sense induction s...
research
04/24/2017

Watset: Automatic Induction of Synsets from a Graph of Synonyms

This paper presents a new graph-based approach that induces synsets usin...
research
03/15/2018

RUSSE'2018: A Shared Task on Word Sense Induction for the Russian Language

The paper describes the results of the first shared task on word sense i...
research
04/17/2021

DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

Word meaning is notoriously difficult to capture, both synchronically an...
research
04/09/2018

Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings

Word sense induction (WSI), which addresses polysemy by unsupervised dis...
research
08/26/2018

Word Sense Induction with Neural biLM and Symmetric Patterns

An established method for Word Sense Induction (WSI) uses a language mod...
