Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies

04/12/2022
by Angela Fan, et al.
Generating factual, long-form text such as Wikipedia articles raises three key challenges: how to gather relevant evidence, how to structure information into well-formed text, and how to ensure that the generated text is factually correct. We address these by developing a model for English text that uses a retrieval mechanism to identify relevant supporting information on the web and a cache-based pre-trained encoder-decoder to generate long-form biographies section by section, including citation information. To assess the impact of available web evidence on the output text, we compare the performance of our approach when generating biographies about women (for which less information is available on the web) vs. biographies generally. To this end, we curate a dataset of 1,500 biographies about women. We analyze our generated text to understand how differences in available web evidence affect generation. We evaluate the factuality, fluency, and quality of the generated texts using automatic metrics and human evaluation. We hope that these techniques can serve as a starting point for human writers, helping to reduce the complexity inherent in creating long-form, factual text.

