Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure

11/11/2016
by Oded Avraham, et al.

We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation. We do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure that takes into account the reliability of each annotation decision in the dataset.
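To make the idea of a reliability-weighted performance measure concrete, here is a minimal sketch, assuming (this is an illustration, not the authors' implementation) that each annotation decision is a binary judgement of the form "candidate A is more similar to the target than candidate B", stored together with the fraction of annotators who agreed. A model is then scored on how many decisions its cosine similarities reproduce, with each decision weighted by that agreement. The `Decision` structure and the function names are hypothetical.

    from dataclasses import dataclass
    from typing import Dict, List

    import numpy as np


    @dataclass
    class Decision:
        target: str       # target word
        closer: str       # candidate the annotators judged more similar to the target
        farther: str      # candidate judged less similar to the target
        agreement: float  # fraction of annotators who agreed; used as the weight


    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        # Plain cosine similarity between two embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


    def weighted_score(embeddings: Dict[str, np.ndarray],
                       decisions: List[Decision]) -> float:
        """Fraction of annotation decisions the model reproduces, where each
        decision counts proportionally to its inter-annotator agreement."""
        gained, total = 0.0, 0.0
        for d in decisions:
            total += d.agreement
            sim_closer = cosine(embeddings[d.target], embeddings[d.closer])
            sim_farther = cosine(embeddings[d.target], embeddings[d.farther])
            if sim_closer > sim_farther:
                gained += d.agreement
        return gained / total if total else 0.0


    # Toy usage with made-up vectors: one decision, 90% annotator agreement.
    emb = {"cat": np.array([1.0, 0.2]),
           "dog": np.array([0.9, 0.3]),
           "car": np.array([0.1, 1.0])}
    print(weighted_score(emb, [Decision("cat", "dog", "car", agreement=0.9)]))  # -> 1.0

The point of the weighting is that decisions with low inter-annotator agreement contribute less to the score, so unreliable annotations cannot dominate the evaluation.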


Related research

03/28/2022 - Comparing in context: Improving cosine similarity measures with a metric tensor
  Cosine similarity is a widely used measure of the relatedness of pre-tra...

05/08/2016 - Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
  Lacking standardized extrinsic evaluation methods for vector representat...

05/02/2023 - Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement
  Reliable application of machine learning is of primary importance to the...

12/31/2022 - Approaching Peak Ground Truth
  Machine learning models are typically evaluated by computing similarity ...

09/16/2014 - DISA at ImageCLEF 2014 Revised: Search-based Image Annotation with DeCAF Features
  This paper constitutes an extension to the report on DISA-MU team partic...

08/19/2021 - Czech News Dataset for Semantic Textual Similarity
  This paper describes a novel dataset consisting of sentences with semant...

06/25/2019 - Model-based annotation of coreference
  Humans do not make inferences over texts, but over models of what texts ...