Rethinking and Refining the Distinct Metric

02/28/2022
by Siyang Liu, et al.

Distinct is a widely used automatic metric for evaluating diversity in language generation tasks. However, we observe that the original approach to calculating distinct scores has evident biases that tend to assign higher penalties to longer sequences. In this paper, we refine the calculation of distinct scores by re-scaling the number of distinct tokens based on its expectation. We provide both empirical and theoretical evidence to show that our method effectively removes the biases exhibited in the original distinct score. Further analyses also demonstrate that the refined score correlates better with human evaluations.
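The idea can be sketched in code. Below, `distinct_n` is the standard distinct-n score (unique n-grams over total n-grams), and `expectation_adjusted_distinct` illustrates the rescaling described above: dividing the number of distinct n-grams by its expectation under uniform random sampling from a vocabulary of size V, which is V(1 - ((V-1)/V)^C) for C sampled n-grams. This is a simplified sketch, not the paper's exact formulation; the function names and the choice of V (treated here as the size of the n-gram space) are assumptions for illustration.

```python
def distinct_n(tokens, n):
    """Original distinct-n: ratio of unique n-grams to total n-grams.

    This ratio shrinks as sequences grow longer, which is the length
    bias discussed in the abstract.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def expectation_adjusted_distinct(tokens, n, vocab_size):
    """Sketch of an expectation-adjusted distinct score (assumed form).

    Rescales the distinct n-gram count by its expectation under uniform
    sampling: E[distinct] = V * (1 - ((V - 1) / V) ** C), where C is the
    number of n-grams and V the vocabulary size. Because the expectation
    grows sublinearly in C, longer sequences are no longer penalized by
    the denominator growing linearly.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    C = len(ngrams)
    if C == 0:
        return 0.0
    expected = vocab_size * (1 - ((vocab_size - 1) / vocab_size) ** C)
    return len(set(ngrams)) / expected
```

For a short sequence the two scores are close, but as the sequence lengthens the original score decays toward zero even for reasonably diverse text, while the expectation-based denominator keeps the score on a comparable scale.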

