A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text

03/08/2022
by   Enwei Zhu, et al.

Medical information extraction consists of a group of natural language processing (NLP) tasks that collaboratively convert clinical text into pre-defined structured formats. Current state-of-the-art (SOTA) NLP models rely heavily on deep learning techniques and thus require massive amounts of annotated linguistic data. This study presents an engineering framework for medical entity recognition, relation extraction and attribute extraction, which are unified in annotation, modeling and evaluation. Specifically, the annotation scheme is comprehensive and compatible across tasks, especially for medical relations. The resulting annotated corpus includes 1,200 full medical records (or 18,039 broken-down documents), and achieves inter-annotator agreements (IAAs) of 94.53 [...] Three task-specific neural network models are developed within a shared structure and enhanced by SOTA NLP techniques, i.e., pre-trained language models. Experimental results show that the system can retrieve medical entities, relations and attributes with F1 scores of 93.47 [...] 90.89 [...] annotation scheme and code, provides solid and practical engineering experience in developing an integrated medical information extraction system.
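The abstract reports both inter-annotator agreement and system performance as F1 scores, but does not spell out how they are computed. As a minimal sketch, assuming exact-match comparison of annotation tuples (the `Span` representation, the label names and the `f1_score` helper below are all hypothetical, not taken from the paper's released code), the same function can serve both purposes: compare a model against gold annotations, or compare one annotator against another.

```python
from typing import Set, Tuple

# Hypothetical annotation representation: (start, end, label).
Span = Tuple[int, int, str]


def f1_score(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1 between two sets of annotations."""
    if not gold and not pred:
        # Two empty annotation sets agree perfectly.
        return 1.0
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: two annotators agree on two spans and
# disagree on the boundary of the third.
annotator_a = {(0, 2, "Disease"), (5, 8, "Drug"), (10, 12, "Symptom")}
annotator_b = {(0, 2, "Disease"), (5, 8, "Drug"), (10, 13, "Symptom")}
iaa = f1_score(annotator_a, annotator_b)
```

Because F1 is symmetric in its two arguments, it does not matter which annotator is treated as the reference when the function is used as an agreement measure. Relations and attributes could be scored the same way by swapping in a tuple type that carries the linked entities or the attribute value.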


research
11/07/2016

Building a comprehensive syntactic and semantic corpus of Chinese clinical texts

Objective: To build a comprehensive corpus covering syntactic and semant...
research
01/27/2020

SemClinBr – a multi institutional and multi specialty semantically annotated corpus for Portuguese clinical NLP tasks

The high volume of research focusing on extracting patient's information...
research
11/13/2018

Few-shot Learning for Named Entity Recognition in Medical Text

Deep neural network models have recently achieved state-of-the-art perfo...
research
04/27/2022

CREER: A Large-Scale Corpus for Relation Extraction and Entity Recognition

We describe the design and use of the CREER dataset, a large corpus anno...
research
11/08/2021

JaMIE: A Pipeline Japanese Medical Information Extraction System

We present an open-access natural language processing toolkit for Japane...
research
11/15/2018

Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

This paper presents a Lisp architecture for a portable NLP system, terme...
research
02/09/2023

Lightweight Transformers for Clinical Natural Language Processing

Specialised pre-trained language models are becoming more frequent in NL...
