'Tis but Thy Name: Semantic Question Answering Evaluation with 11M Names for 1M Entities

02/28/2022
by Albert Huang, et al.

Classic lexical-matching-based QA metrics are slowly being phased out because they punish succinct or informative outputs simply because those answers were not provided as ground truth. Recently proposed neural metrics can evaluate semantic similarity, but they were trained on small textual similarity datasets grafted from foreign domains. We introduce the Wiki Entity Similarity (WES) dataset, an 11M-example, domain-targeted semantic entity similarity dataset generated from link texts in Wikipedia. WES is tailored to QA evaluation: the examples are entities and phrases, grouped into semantic clusters to simulate multiple ground-truth labels. Human annotators consistently agree with WES labels, and a basic cross-encoder metric is better than four classic metrics at predicting human judgments of correctness.
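To make the abstract's first claim concrete, here is a minimal sketch of how two common lexical QA metrics, exact match and token-level F1, score a short but correct answer. The choice of these two metrics, the SQuAD-style normalization, the helper `best_over_references`, and the example name cluster are illustrative assumptions; they are not taken from the paper or from the WES data.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation,
    drop English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only if the normalized token sequences are identical."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    pred, ref = normalize(prediction), normalize(reference)
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction: str, references: list[str]) -> float:
    """Hypothetical helper: score against every accepted name and keep the best,
    mimicking multiple ground-truth labels such as a cluster of entity names."""
    return max(metric(prediction, ref) for ref in references)


if __name__ == "__main__":
    gold = "Barack Obama"
    # A succinct, correct answer is penalized by both lexical metrics.
    print(exact_match("Obama", gold))              # 0.0
    print(f"{token_f1('Obama', gold):.3f}")        # 0.667
    # A correct paraphrase with no shared tokens scores zero.
    print(token_f1("the 44th U.S. president", gold))  # 0.0
    # With a (hypothetical) cluster of accepted names, the alias is credited.
    cluster = ["Barack Obama", "Obama", "President Obama"]
    print(best_over_references(exact_match, "Obama", cluster))  # 1.0
```

Taking the maximum over several references credits aliases that happen to be listed, but a paraphrase outside the cluster still scores zero; that residual gap is the kind of case a learned semantic metric, such as the cross-encoder described above, is meant to handle.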
