Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling

08/01/2015
by Ira Leviant, et al.

A common evaluation practice in the vector space model (VSM) literature is to measure a model's ability to predict human judgments about lexical semantic relations between word pairs. Most existing evaluation sets, however, consist of scores collected for English word pairs only. They ignore the potential impact on human scores of the judgment language, that is, the language in which the word pairs are presented. In this paper, we translate two prominent evaluation sets, wordsim353 (association) and SimLex999 (similarity), from English into Italian, German and Russian, and collect scores for each dataset from crowdworkers fluent in its language. Our analysis reveals that human judgments are strongly affected by the judgment language. Moreover, we show that the predictions of monolingual VSMs do not necessarily correlate best with human judgments made in the language used for model training. This suggests that models and humans are affected differently by the language they use when making semantic judgments. Finally, we show that in a large number of setups, combining multilingual VSMs improves correlations with human judgments, suggesting that multilingualism may partially compensate for the effect of the judgment language on human judgments.
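The evaluation practice described above is commonly implemented as a Spearman rank correlation between model scores (typically the cosine similarity of the two word vectors) and the human scores for the same pairs. The sketch below assumes that protocol, since the abstract does not name the correlation measure. It also does not specify how multilingual VSMs are combined, so `combine_scores` (plain averaging of per-language cosine scores for translated word pairs) is a hypothetical stand-in, not the paper's method. All function names and toy vectors are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vsm: Dict[str, Sequence[float]],
             pairs: List[Tuple[str, str]],
             human: Sequence[float]) -> float:
    """Correlate a model's cosine scores with human scores for the same pairs."""
    model = [cosine(vsm[a], vsm[b]) for a, b in pairs]
    return spearman(model, human)


def combine_scores(score_lists: List[Sequence[float]]) -> List[float]:
    """Hypothetical combination: average each pair's score across languages."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


if __name__ == "__main__":
    # Toy 2-d "embeddings" and human ratings, invented for illustration only.
    vsm = {"cup": [1.0, 0.0], "mug": [0.9, 0.1], "car": [0.0, 1.0], "tree": [0.5, 0.5]}
    pairs = [("cup", "mug"), ("cup", "car"), ("cup", "tree")]
    print(evaluate(vsm, pairs, [9.0, 1.0, 5.0]))
```

Because Spearman's rho only compares orderings, the toy model above reaches a correlation of about 1.0 whenever it ranks the pairs in the same order as the raters, regardless of the absolute score scales.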
