Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

10/28/2016
by Pavlina Fragkou, et al.

In this paper we examine the benefit of performing named entity recognition (NER) and co-reference resolution on an English and a Greek corpus used for text segmentation. The aim is to examine whether combining text segmentation with information extraction helps identify the various topics that appear in a document. NER was performed manually on the English corpus and compared with the output of publicly available annotation tools, while an existing tool was used for the Greek corpus. The annotations produced for both corpora were manually corrected and enriched to cover four types of named entities. Co-reference resolution, i.e., the substitution of every reference to the same instance with the same named entity identifier, was subsequently performed. The evaluation, using five text segmentation algorithms for the English corpus and four for the Greek corpus, leads to the conclusion that the benefit depends heavily on the segment's topic, the number of named entity instances appearing in it, and the segment's length.
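The substitution step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the paper relied on manually corrected annotations, whereas the sketch assumes a hand-written lookup table. The names `MENTIONS` and `substitute_mentions`, the example sentences, and the identifier labels (`PERSON_1`, `LOCATION_1`) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mention-to-identifier table. In the paper, this mapping came
# from manually corrected annotations, not from a fixed dictionary.
MENTIONS = {
    "Barack Obama": "PERSON_1",
    "Obama": "PERSON_1",
    "Athens": "LOCATION_1",
}


def substitute_mentions(text, mentions):
    """Replace every listed mention with its named entity identifier."""
    # Try longer mentions first so "Barack Obama" is matched as a whole
    # rather than only its "Obama" suffix.
    ordered = sorted(mentions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in ordered) + r")\b"
    )
    return pattern.sub(lambda match: mentions[match.group(1)], text)


if __name__ == "__main__":
    sample = "Barack Obama visited Athens. Obama praised the city."
    print(substitute_mentions(sample, MENTIONS))
```

After substitution, every reference to the same instance shares one token, which is the property the abstract relies on: a segmentation algorithm that compares word overlap between sentences would then see the repeated identifier rather than differing surface forms.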


research
10/23/2020

UNER: Universal Named-Entity RecognitionFramework

We introduce the Universal Named-Entity Recognition (UNER) framework, a 4...
research
12/19/2022

E-NER – An Annotated Named Entity Recognition Corpus of Legal Text

Identifying named entities such as a person, location or organization, i...
research
08/08/2020

Assessing Demographic Bias in Named Entity Recognition

Named Entity Recognition (NER) is often the first step towards automated...
research
10/29/2020

RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain

We show-case an application of information extraction methods, such as n...
research
11/19/2022

AiCEF: An AI-assisted Cyber Exercise Content Generation Framework Using Named Entity Recognition

Content generation that is both relevant and up to date with the current...
research
04/08/2022

CyNER: A Python Library for Cybersecurity Named Entity Recognition

Open Cyber threat intelligence (OpenCTI) information is available in an ...
research
08/09/2019

Generating Information Extraction Patterns from Overlapping and Variable Length Annotations using Sequence Alignment

Sequence alignments are used to capture patterns composed of elements re...
