The ApposCorpus: A new multilingual, multi-domain dataset for factual appositive generation

11/06/2020
by Yova Kementchedjhieva, et al.

News articles, image captions, product reviews and many other texts mention people and organizations whose name recognition may vary across audiences. In such cases, background information about the named entities can be provided in the form of an appositive noun phrase, either written by a human or generated automatically. We expand on previous work on appositive generation with a new, more realistic, end-to-end definition of the task, instantiated by a dataset that spans four languages (English, Spanish, German and Polish), two entity types (person and organization) and two domains (Wikipedia and News). We carry out an extensive analysis of the data and the task, pointing to the various modeling challenges it poses. The results we obtain with standard language generation methods show that the task is indeed non-trivial and leaves plenty of room for improvement.


research
12/11/2021

Show and Write: Entity-aware News Generation with Image Information

Automatically writing long articles is a complex and challenging languag...
research
04/05/2019

PoMo: Generating Entity-Specific Post-Modifiers in Context

We introduce entity post-modifier generation as an instance of a collabo...
research
04/17/2020

Batch Clustering for Multilingual News Streaming

Nowadays, digital news articles are widely available, published by vario...
research
03/13/2022

ProtagonistTagger – a Tool for Entity Linkage of Persons in Texts from Various Languages and Domains

Named entities recognition (NER) and disambiguation (NED) can add semant...
research
10/26/2018

Named Person Coreference in English News

People are often entities of interest in tasks such as search and inform...
research
09/17/2018

Similarity measure for Public Persons

For the webportal "Who is in the News!" with statistics about the appear...
research
04/14/2021

I Wish I Would Have Loved This One, But I Didn't – A Multilingual Dataset for Counterfactual Detection in Product Reviews

Counterfactual statements describe events that did not or cannot take pl...
