An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets

04/30/2021
by   Alexander Sboev, et al.
0

We present the full-size Russian complexly NER-labeled corpus of Internet user reviews, along with an evaluation of accuracy levels reached on this corpus by a set of advanced deep learning neural networks to extract the pharmacologically meaningful entities from Russian texts. The corpus annotation includes mentions of the following entities: Medication (33005 mentions), Adverse Drug Reaction (1778), Disease (17403), and Note (4490). Two of them - Medication and Disease - comprise a set of attributes. A part of the corpus has the coreference annotation with 1560 coreference chains in 300 documents. Special multi-label model based on a language model and the set of features is developed, appropriate for presented corpus labeling. The influence of the choice of different modifications of the models: word vector representations, types of language models pre-trained for Russian, text normalization styles, and other preliminary processing are analyzed. The sufficient size of our corpus allows to study the effects of particularities of corpus labeling and balancing entities in the corpus. As a result, the state of the art for the pharmacological entity extraction problem for Russian is established on a full-size labeled corpus. In case of the adverse drug reaction (ADR) recognition, it is 61.1 by the F1-exact metric that, as our analysis shows, is on par with the accuracy level for other language corpora with similar characteristics and the ADR representativnes. The evaluated baseline precision of coreference relation extraction on the corpus is 71, that is higher the results reached on other Russian corpora.

READ FULL TEXT

page 11

page 12

research
04/07/2020

The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews

The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated c...
research
04/21/2021

Extracting Adverse Drug Events from Clinical Notes

Adverse drug events (ADEs) are unexpected incidents caused by the admini...
research
01/29/2019

Revised JNLPBA Corpus: A Revised Version of Biomedical NER Corpus for Relation Extraction Task

The advancement of biomedical named entity recognition (BNER) and biomed...
research
04/25/2019

Terminologies augmented recurrent neural network model for clinical named entity recognition

We aimed to enhance the performance of a supervised model for clinical n...
research
05/26/2023

People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts

Although pre-trained named entity recognition (NER) models are highly ac...
research
05/13/2019

Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model

Computational chemistry develops fast in recent years due to the rapid g...
research
07/25/2023

Embedding Models for Supervised Automatic Extraction and Classification of Named Entities in Scientific Acknowledgements

Acknowledgments in scientific papers may give an insight into aspects of...

Please sign up or login with your details

Forgot password? Click here to reset