Can Eye Movement Data Be Used As Ground Truth For Word Embeddings Evaluation?

04/23/2018
by Amir Bakarov, et al.

In recent years, distributional semantic models have achieved some success in modeling lexical semantics. Nevertheless, the scientific community has not yet established which evaluation method is most reliable for these models. Some researchers argue that the only possible gold standard could be obtained from neuro-cognitive resources that store information about human cognition. One such resource is eye movement data recorded during silent reading. The goal of this work is to test the hypothesis that such data can be used to evaluate distributional semantic models across different languages. We propose experiments with English and Russian eye movement datasets (Provo Corpus, GECO and Russian Sentence Corpus), word vectors (Skip-Gram models trained on national corpora and Web corpora), and human-assessed word similarity datasets for Russian and English, in order to determine whether embeddings correlate with eye movement data and to test the hypothesis that this correlation is language independent. Our results suggest that the validity of the tested hypothesis is questionable.
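For readers unfamiliar with the human-judgment side of this comparison, the sketch below shows the conventional intrinsic evaluation with a word similarity dataset: compute the cosine similarity of each word pair's vectors and report the Spearman rank correlation with the human scores. This is a minimal illustration under stated assumptions, not the authors' code. The function names, the toy three-dimensional vectors and the toy human scores are all hypothetical, and the abstract does not say how embeddings were related to the eye movement measures, so that part is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(vectors, pairs):
    """Hypothetical helper: correlate model cosine scores with human scores.

    Pairs containing an out-of-vocabulary word are skipped.
    Returns (rho, number_of_pairs_used).
    """
    model_scores, human_scores = [], []
    for w1, w2, human in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(human)
    return spearman(model_scores, human_scores), len(model_scores)


# Toy data (hypothetical): 3-d "embeddings" and 0-10 human similarity ratings.
toy_vectors = {
    "cat": [1.0, 0.9, 0.0],
    "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.2, 1.0],
    "truck": [0.1, 0.1, 0.9],
}
toy_pairs = [
    ("cat", "dog", 8.5),
    ("car", "truck", 8.0),
    ("cat", "car", 1.5),
    ("dog", "truck", 2.0),
    ("cat", "zebra", 5.0),  # "zebra" is out of vocabulary and is skipped
]

if __name__ == "__main__":
    rho, used = evaluate(toy_vectors, toy_pairs)
    print(f"Spearman rho = {rho:.3f} over {used} pairs")
```

Spearman's rho is used here rather than Pearson's r because word similarity benchmarks are usually compared by rank order, which makes the score insensitive to the differing scales of cosine similarity and human ratings. On the toy data the model's ranking of the four in-vocabulary pairs matches the human ranking exactly.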

Related research
11/06/2017

Evaluation of Croatian Word Embeddings

Croatian is poorly resourced and highly inflected language from Slavic l...
06/06/2018

The Limitations of Cross-language Word Embeddings Evaluation

The aim of this work is to explore the possible limitations of existing ...
01/30/2022

Recognition of Implicit Geographic Movement in Text

Analyzing the geographic movement of humans, animals, and other phenomen...
04/18/2016

Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: Word Embeddings and Semantic Fingerprints

We present our experience in applying distributional semantics (neural w...
03/15/2018

RUSSE: The First Workshop on Russian Semantic Similarity

The paper gives an overview of the Russian Semantic Similarity Evaluatio...
07/12/2017

A Critique of a Critique of Word Similarity Datasets: Sanity Check or Unnecessary Confusion?

Critical evaluation of word similarity datasets is very important for co...
10/04/2017

Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl

We present DepCC, the largest to date linguistically analyzed corpus in ...
