RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain

by Vitaly Ivanin et al.

We showcase an application of information extraction methods, namely named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency. The main challenges of this corpus are that 1) its annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English. Contrary to expectations, state-of-the-art transformer-based models show only modest performance on both tasks, whether these are approached sequentially or in an end-to-end fashion. Our experiments demonstrate that fine-tuning on large unlabeled corpora does not automatically yield significant improvement, from which we conclude that more sophisticated strategies for leveraging unlabeled texts are needed. In this paper, we describe the whole pipeline we developed, from text annotation through baseline development to the design of a shared task intended to improve on the baseline. Ultimately, we find that current NER and RE technologies are far from mature and do not yet overcome challenges like ours.
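To make the "sequential" setting concrete, here is a minimal sketch, not the authors' code, of how a pipeline hands NER output to RE. It assumes the NER step emits BIO tags per token. It decodes those tags into entity spans, then enumerates ordered entity pairs that a separate relation classifier would label. The function names `bio_to_spans` and `relation_candidates` and the labels `ORG` and `ACT` are hypothetical and are not taken from the RuREBus annotation scheme. An end-to-end model would instead predict spans and relations jointly, without this hand-off.

```python
from itertools import permutations
from typing import List, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence (hypothetical NER output) into entity spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A B- tag, an O tag, or an I- tag of a different type ends the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != label):
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Treat a stray I- tag as the beginning of a new span.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def relation_candidates(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Enumerate ordered entity pairs; an RE classifier would label each pair."""
    return list(permutations(spans, 2))


if __name__ == "__main__":
    # Hypothetical example: "Ministry plans to build new schools"
    tags = ["B-ORG", "O", "O", "B-ACT", "I-ACT", "I-ACT"]
    spans = bio_to_spans(tags)
    print(spans)
    print(relation_candidates(spans))
```

Because errors in the decoded spans propagate directly into the candidate pairs, a pipeline like this cannot recover from NER mistakes in the RE step. This is one common motivation for trying end-to-end models as well.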




