Texts in, meaning out: neural language models in semantic similarity task for Russian

04/30/2015
by Andrey Kutuzov, et al.

Distributed vector representations of natural language vocabulary receive a lot of attention in contemporary computational linguistics. This paper summarizes our experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of the Russian Semantic Similarity Evaluation track, where our models ranked from 2nd to 5th, depending on the task. We introduce the tools and corpora used, comment on the nature of the shared task and describe the results achieved. We found that the Continuous Skip-gram and Continuous Bag-of-words models, previously applied successfully to English material, can be used for semantic modeling of Russian as well. Moreover, we show that texts in the Russian National Corpus (RNC) provide excellent training material for such models, outperforming other, much larger corpora. This is especially true for semantic relatedness tasks (although stacking models trained on larger corpora on top of RNC models improves performance even more). High-quality semantic vectors learned in this way can be used in a variety of linguistic tasks and promise an exciting field for further study.
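
To make the idea of calculating semantic similarity from distributed vectors concrete, here is a minimal sketch. It assumes cosine similarity as the comparison measure, which is a common choice for word embeddings but is not stated in the abstract. The three-dimensional vectors, the `toy_vectors` table and the `similarity` helper are invented for illustration; they are not taken from the paper's models, which would be trained on corpora such as the RNC and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat it as unrelated to everything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption: made up for this example only).
toy_vectors = {
    "кошка": [0.9, 0.1, 0.0],    # "cat"
    "собака": [0.8, 0.2, 0.1],   # "dog"
    "трактор": [0.0, 0.1, 0.9],  # "tractor"
}


def similarity(word_a, word_b, vectors=toy_vectors):
    """Look up two words and compare their vectors."""
    return cosine_similarity(vectors[word_a], vectors[word_b])


if __name__ == "__main__":
    print(similarity("кошка", "собака"))
    print(similarity("кошка", "трактор"))
```

With these toy values, the pair "кошка"/"собака" scores higher than "кошка"/"трактор", which is the kind of ranking a semantic similarity task evaluates against human judgments.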

