Guidelines and a Corpus for Extracting Biographical Events

06/07/2022
by Marco Antonio Stranisci, et al.

Although biographies are widespread on the Semantic Web, resources and approaches for automatically extracting biographical events are limited. This limitation reduces the amount of structured, machine-readable biographical information, especially about people belonging to underrepresented groups. Our work challenges this limitation by providing a set of guidelines for the semantic annotation of life events. The guidelines are designed to be interoperable with existing ISO standards for semantic annotation: ISO-TimeML (ISO-24617-1) and SemAF (ISO-24617-4). The guidelines were tested through an annotation task on Wikipedia biographies of underrepresented writers, namely authors born in non-Western countries, migrants, or members of ethnic minorities. In total, 1,000 sentences were annotated by 4 annotators, with an average Inter-Annotator Agreement of 0.825. The resulting corpus was mapped onto OntoNotes. This mapping allowed us to expand our corpus, showing that existing resources can be exploited for the biographical event extraction task.
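
The abstract reports an average Inter-Annotator Agreement across 4 annotators but does not say which coefficient was used. As a rough illustration only, the sketch below assumes the average is taken over pairwise Cohen's kappa scores; the function names and the toy labels are hypothetical and are not drawn from the paper or its corpus.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


def average_pairwise_kappa(annotations):
    """Mean Cohen's kappa over every pair of annotators (assumed averaging scheme)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: 4 annotators labelling 6 tokens as EVENT or O.
toy = [
    ["EVENT", "O", "O", "EVENT", "O", "O"],
    ["EVENT", "O", "O", "EVENT", "O", "O"],
    ["EVENT", "O", "O", "O", "O", "O"],
    ["EVENT", "O", "EVENT", "EVENT", "O", "O"],
]
print(round(average_pairwise_kappa(toy), 3))
```

With 4 annotators this averages over 6 pairs. A multi-rater coefficient such as Fleiss' kappa or Krippendorff's alpha would be an equally plausible reading of the reported figure.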

research
08/17/2021

Annotation Guidelines for the Turku Paraphrase Corpus

This document describes the annotation guidelines used to construct the ...
research
02/28/2020

Automatic Section Recognition in Obituaries

Obituaries contain information about people's values across times and cu...
research
06/15/2023

Wikibio: a Semantic Resource for the Intersectional Analysis of Biographical Events

Biographical event detection is a relevant task for the exploration and ...
research
06/03/2021

Men Are Elected, Women Are Married: Events Gender Bias on Wikipedia

Human activities can be seen as sequences of events, which are crucial t...
research
11/27/2019

NorNE: Annotating Named Entities for Norwegian

This paper presents NorNE, a manually annotated corpus of named entities...
research
04/07/2020

A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products

Recognizing non-standard entity types and relations, such as B2B product...
research
11/28/2016

Developing a cardiovascular disease risk factor annotated corpus of Chinese electronic medical records

Cardiovascular disease (CVD) has become the leading cause of death in Ch...
