Evaluation of Summarization Systems across Gender, Age, and Race

10/08/2021
by Anna Jørgensen, et al.

Summarization systems are ultimately evaluated by human annotators and raters. Usually, annotators and raters do not reflect the demographics of end users, but are recruited through student populations or crowdsourcing platforms with skewed demographics. For two different evaluation scenarios – evaluation against gold summaries and system output ratings – we show that summary evaluation is sensitive to protected attributes. This can severely bias system development and evaluation, leading us to build models that cater for some groups rather than others.
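
As a rough illustration of the first scenario (evaluation against gold summaries), the sketch below scores one hypothetical system summary against gold summaries attributed to two made-up annotator groups and compares the per-group ROUGE-L means. The data, the group labels, and the use of the rouge-score package are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch: score the same system output against gold summaries
# written by annotators from different demographic groups.
# All data and group labels below are made up for illustration.
from collections import defaultdict
from rouge_score import rouge_scorer  # pip install rouge-score

# Hypothetical gold summaries: (annotator_group, reference_summary)
gold_summaries = [
    ("group_a", "The council approved the new housing plan on Tuesday."),
    ("group_a", "A new housing plan was approved by the city council."),
    ("group_b", "City council votes yes on housing proposal."),
    ("group_b", "Housing proposal passes council vote."),
]

system_summary = "The city council approved the housing proposal."

scorer = scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

# Collect ROUGE-L F1 per annotator group.
scores_by_group = defaultdict(list)
for group, reference in gold_summaries:
    score = scorer.score(reference, system_summary)["rougeL"].fmeasure
    scores_by_group[group].append(score)

# If the per-group means diverge, the choice of annotator pool changes
# how good the system appears to be.
for group, scores in sorted(scores_by_group.items()):
    print(f"{group}: mean ROUGE-L F1 = {sum(scores) / len(scores):.3f}")
```

The second scenario (system output ratings) can be probed the same way: aggregate ratings per protected group and compare, rather than pooling all raters into a single score.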

Related research

Revisiting Summarization Evaluation for Scientific Articles (04/01/2016)
Evaluation of text summarization approaches have been mostly based on me...

Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries (09/19/2021)
Current pre-trained models applied to summarization are prone to factual...

Evaluating Debiasing Techniques for Intersectional Biases (09/21/2021)
Bias is pervasive in NLP models, motivating the development of automatic...

Dialect Diversity in Text Summarization on Twitter (07/15/2020)
Extractive summarization algorithms can be used on Twitter data to retur...

Subjective Bias in Abstractive Summarization (06/18/2021)
Due to the subjectivity of the summarization, it is a good practice to h...

Fairness-Preserving Text Summarization (10/22/2018)
As the amount of textual information grows rapidly, text summarization a...

SumQE: a BERT-based Summary Quality Estimation Model (09/02/2019)
We propose SumQE, a novel Quality Estimation model for summarization bas...
