Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History

02/25/2022
by Aurélien Max, et al.

Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic processes on text. When available in large quantities, they also provide valuable material for linguistic studies. In this article, we present a new resource built from Wikipedia's revision history, called WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), which contains numerous edits made by human contributors, including various corrections and rewritings. We discuss the main motivations for building such a resource, describe how it was built, and present initial applications to French.
