Dialect Diversity in Text Summarization on Twitter

07/15/2020
by L. Elisa Celis, et al.

Extractive summarization algorithms can be used on Twitter data to return a set of posts that succinctly capture a topic. However, a significant fraction of the posts in Twitter datasets are written in different English dialects. We study the dialect bias in summaries of such datasets generated by common summarization algorithms and observe that, for datasets containing sentences from more than one dialect, most summarization algorithms return summaries that under-represent the minority dialect. To correct for this bias, we propose a framework that takes an existing summarization algorithm as a black box and, using a small set of dialect-diverse sentences, returns a summary that is relatively more dialect-diverse. Crucially, our approach does not require the sentences in the dataset to have dialect labels, so the diversification process is independent of dialect classification and language identification models. We show the efficacy of our approach on Twitter datasets containing posts written in dialects used by different social groups defined by race, region, or gender; in all cases, our approach leads to improved dialect diversity compared to the standard summarization approaches.
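
The abstract does not spell out how the small dialect-diverse set is combined with the black-box summarizer, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the black box exposes one relevance score per post, uses a simple token-overlap (Jaccard) similarity as a stand-in for whatever similarity the paper uses, and picks posts round-robin across the example sentences. The names `jaccard`, `diversify`, and `control_set`, and the toy posts, are all hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def diversify(posts, scores, control_set, k):
    """Pick up to k posts, cycling over the control sentences.

    posts:       list of str (no dialect labels needed)
    scores:      list of float, one per post, from any black-box summarizer
    control_set: small, non-empty list of dialect-diverse example sentences
                 (hypothetical name; an assumption of this sketch)
    """
    # Group each post under the control sentence it is most similar to.
    groups = defaultdict(list)
    for i, post in enumerate(posts):
        nearest = max(range(len(control_set)),
                      key=lambda c: jaccard(post, control_set[c]))
        groups[nearest].append(i)

    # Within each group, order by black-box score, best first.
    for members in groups.values():
        members.sort(key=lambda i: scores[i], reverse=True)

    # Round-robin over the groups until k posts are chosen or posts run out.
    chosen = []
    target = min(k, len(posts))
    while len(chosen) < target:
        for c in range(len(control_set)):
            if groups[c] and len(chosen) < target:
                chosen.append(groups[c].pop(0))
    return [posts[i] for i in chosen]


# Toy data, made up for illustration only.
posts = [
    "the match was great tonight",
    "the match was really great",
    "that match was proper brilliant",
    "the team played well tonight",
]
scores = [0.9, 0.8, 0.2, 0.7]  # pretend black-box relevance scores
control_set = ["the match was great", "that was proper brilliant"]

print(diversify(posts, scores, control_set, k=2))
```

In this toy run, picking the two highest-scoring posts would return the first two posts, which are phrased almost identically; the round-robin step instead returns the first and third posts, so the lower-scored phrasing is still represented. Note that only the example sentences carry any dialect information here, and the posts themselves are never labeled.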


research
09/28/2018

The Rule of Three: Abstractive Text Summarization in Three Bullet Points

Neural network-based approaches have become widespread for abstractive t...
research
01/29/2019

Implicit Diversity in Image Summarization

Case studies, such as Kay et al., 2015 have shown that in image summariz...
research
10/22/2018

Fairness-Preserving Text Summarzation

As the amount of textual information grows rapidly, text summarization a...
research
10/22/2018

Beyond ROUGE Scores in Algorithmic Summarization: Creating Fairness-Preserving Textual Summaries

As the amount of textual information grows rapidly, text summarization a...
research
11/02/2018

Abstractive Summarization of Reddit Posts with Multi-level Memory Networks

We address the problem of abstractive summarization in two directions: p...
research
10/08/2021

Evaluation of Summarization Systems across Gender, Age, and Race

Summarization systems are ultimately evaluated by human annotators and r...
research
02/12/2018

Fair and Diverse DPP-based Data Summarization

Sampling methods that choose a subset of the data proportional to its di...
