A Proposal for Linguistic Similarity Datasets Based on Commonality Lists

05/15/2016
by Dmitrijs Milajevs, et al.

Similarity is a core notion in psychology and in two branches of linguistics: theoretical and computational. The similarity datasets that come from these fields differ in design: psychological datasets focus on a single topic, such as fruit names, while linguistic datasets contain words from various categories. The latter design leads humans to assign low similarity scores both to words that have nothing in common and to words that contrast in meaning, which makes the scores ambiguous. In this work we discuss a similarity collection procedure for a multi-category dataset that avoids this ambiguity, and we suggest changes to the evaluation procedure for word, phrase and sentence similarity that reflect insights from the psychological literature. We suggest asking humans to provide lists of commonalities and differences instead of numerical similarity scores, and using the structure of human judgements beyond pairwise similarity for model evaluation. We believe that the proposed approach will give rise to datasets that test meaning representation models more thoroughly with respect to how humans treat similarity.
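As a hedged illustration of why lists of commonalities and differences can carry more information than a single score, the sketch below assumes a hypothetical record format (`PairJudgement` and its `profile` method are invented for this example, not the authors' collection format) and uses invented judgements rather than data from the paper. An unrelated pair and a contrasting pair might both receive a low numerical score, yet they stay distinguishable by how many commonalities annotators list.

```python
from dataclasses import dataclass, field


@dataclass
class PairJudgement:
    """One annotator's list-based judgement of a word pair (hypothetical format)."""
    word1: str
    word2: str
    commonalities: list[str] = field(default_factory=list)
    differences: list[str] = field(default_factory=list)

    def profile(self) -> tuple[int, int]:
        """Return (number of listed commonalities, number of listed differences)."""
        return len(self.commonalities), len(self.differences)


# Invented, illustrative judgements; both pairs could get a low numerical score.
unrelated = PairJudgement(
    "cat", "carburetor",
    differences=["animal vs. machine part"],
)
contrast = PairJudgement(
    "hot", "cold",
    commonalities=["describe temperature", "are adjectives"],
    differences=["opposite ends of the temperature scale"],
)

for j in (unrelated, contrast):
    # Prints the pair and its (commonalities, differences) counts.
    print(j.word1, j.word2, j.profile())
```

Here the contrasting pair shares a dimension (temperature) that annotators can name, while the unrelated pair has no listed commonalities; a single low score would hide that distinction.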
