Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness

04/15/2018
by   Kim Anh Nguyen, et al.

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.
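As a rough illustration of how a rated-similarity dataset such as ViSim-400 might be used, the sketch below scores word pairs with cosine similarity over word vectors and compares the model's ranking with the human ratings using Spearman's rank correlation. This is an assumed, commonly used evaluation setup, not necessarily the procedure followed in the paper; the word identifiers, vectors, and ratings are made-up placeholders, and `evaluate` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(pairs, vectors):
    """Correlate model similarities with human ratings (hypothetical helper).

    pairs:   list of (word_a, word_b, human_rating) tuples
    vectors: dict mapping each word to its vector
    """
    model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
    human_scores = [rating for _, _, rating in pairs]
    return spearman(model_scores, human_scores)


# Toy placeholders only: not taken from ViSim-400 or any trained model.
toy_vectors = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
    "w4": [-1.0, 0.0],
}
toy_pairs = [("w1", "w2", 5.5), ("w1", "w3", 2.0), ("w1", "w4", 0.5)]

rho = evaluate(toy_pairs, toy_vectors)
print(f"Spearman's rho on toy data: {rho:.3f}")
```

On this toy input the model ranks the three pairs in the same order as the made-up ratings, so the correlation comes out close to 1. For a synonym/antonym resource such as ViCon, a classification-style measure over the two relation labels would be the more natural fit than a rank correlation.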

research
05/12/2022

SimRelUz: Similarity and Relatedness scores as a Semantic Evaluation dataset for Uzbek language

Semantic relatedness between words is one of the core concepts in natura...
research
03/23/2016

Evaluating semantic models with word-sentence relatedness

Semantic textual similarity (STS) systems are designed to encode and eva...
research
06/30/2023

A Massive Scale Semantic Similarity Dataset of Historical English

A diversity of tasks use language models trained on semantic similarity ...
research
03/31/2019

SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Embeddings Evaluation

There is a huge imbalance between languages currently spoken and corresp...
research
08/01/2015

Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling

A common evaluation practice in the vector space models (VSMs) literatur...
research
08/24/2018

Features of word similarity

In this theoretical note we compare different types of computational mod...
research
03/15/2018

RUSSE: The First Workshop on Russian Semantic Similarity

The paper gives an overview of the Russian Semantic Similarity Evaluatio...
