RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain

10/29/2020
by Vitaly Ivanin, et al.

We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show modest performance on both tasks, whether they are approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on a large unlabeled corpus does not automatically yield significant improvement, and we therefore conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed: text annotation, baseline development, and the design of a shared task intended to improve on the baseline. Ultimately, we find that current NER and RE technologies are far from mature and do not yet overcome challenges like ours.


research
01/29/2019

Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task

The advancement of biomedical named entity recognition (BNER) and biomed...
research
12/05/2022

Transformer-Based Named Entity Recognition for French Using Adversarial Adaptation to Similar Domain Corpora

Named Entity Recognition (NER) involves the identification and classific...
research
10/28/2016

Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts

In this paper we examine the benefit of performing named entity recognit...
research
09/01/2022

KoCHET: a Korean Cultural Heritage corpus for Entity-related Tasks

As digitized traditional cultural heritage documents have rapidly increa...
research
01/31/2016

Numerical Atrribute Extraction from Clinical Texts

This paper describes about information extraction system, which is an ex...
research
04/07/2020

A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products

Recognizing non-standard entity types and relations, such as B2B product...
