Tracing cultural diachronic semantic shifts in Russian using word embeddings: test sets and baselines

05/16/2019
by Vadim Fomin, et al.

The paper introduces manually annotated test sets for the task of tracing diachronic (temporal) semantic shifts in Russian. The two test sets are complementary: the first covers comparatively strong semantic changes in nouns and adjectives from pre-Soviet to Soviet times, while the second covers comparatively subtle, socially and culturally determined shifts that occurred between 2000 and 2014. The second test set also offers a more granular classification of the degree of shift, but is limited to adjectives. The test sets allowed us to evaluate several well-established algorithms for semantic shift detection (posed as a classification problem), most of which have never been tested on Russian material. All of these algorithms use distributional word embedding models trained on the corresponding in-domain corpora. The resulting scores provide solid comparison baselines for future studies tackling similar tasks. We publish the datasets, code, and trained models to facilitate further research on automatically detecting temporal semantic shifts in Russian words across time periods of different granularities.
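The abstract does not name the algorithms that were evaluated. As a rough illustration of how an embedding-based shift score can be computed, the sketch below uses one common approach from the literature: align two embedding spaces with orthogonal Procrustes, then take the cosine distance between a word's two vectors. The function names, the toy vocabulary, and the random data are all assumptions made for this example, not the authors' code or data.

```python
import numpy as np

def procrustes_align(source, target):
    """Rotate `source` onto `target` with the orthogonal matrix R
    that minimises ||source @ R - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

def cosine_distance(a, b):
    """1 - cosine similarity of two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def shift_scores(emb_old, emb_new, vocab):
    """Per-word shift score; rows of both matrices follow `vocab` order."""
    aligned = procrustes_align(emb_old, emb_new)
    return {w: cosine_distance(aligned[i], emb_new[i])
            for i, w in enumerate(vocab)}

# Toy data (hypothetical): the "new" space is a rotated copy of the "old"
# one, except that the vector of word "w0" is flipped to simulate a shift.
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(200)]
emb_old = rng.normal(size=(200, 5))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
emb_new = emb_old @ rotation
emb_new[0] = -emb_new[0]

scores = shift_scores(emb_old, emb_new, vocab)
most_shifted = max(scores, key=scores.get)
```

With real models, the two matrices would hold vectors for the shared vocabulary of two period-specific embedding models. To treat detection as classification, one option is to threshold such a score or feed it to a classifier as a feature; which setup the paper uses is described in its full text, not in the abstract.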

06/09/2018

Diachronic word embeddings and semantic shifts: a survey

Recent years have witnessed a surge of publications aimed at tracing tem...
11/22/2017

Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes

Word embeddings use vectors to represent words such that the geometry be...
02/15/2021

How COVID-19 Is Changing Our Language : Detecting Semantic Shift in Twitter Word Embeddings

Words are malleable objects, influenced by events that are reflected in ...
01/19/2018

Size vs. Structure in Training Corpora for Word Embedding Models: Araneum Russicum Maximum and Russian National Corpus

In this paper, we present a distributional word embedding model trained ...
06/15/2021

Three-part diachronic semantic change dataset for Russian

We present a manually annotated lexical semantic change dataset for Russ...
11/15/2017

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

Recently, researchers started to pay attention to the detection of tempo...
11/01/2020

Semantic coordinates analysis reveals language changes in the AI field

Semantic shifts can reflect changes in beliefs across hundreds of years,...