Active Learning for Chinese Word Segmentation in Medical Text

08/22/2019
by Tingting Cai, et al.

Electronic health records (EHRs) stored in hospital information systems comprehensively record patients' diagnosis and treatment processes, which makes them essential to clinical data mining. Chinese word segmentation (CWS) is a fundamental task in Chinese natural language processing. Currently, most state-of-the-art CWS methods depend heavily on large-scale manually annotated data, and producing such annotations is time-consuming and expensive, especially in the medical field. In this paper, we present an active learning method for CWS in medical text. To make effective use of the complete segmentation history, we propose a new scoring model for the sampling strategy that combines information entropy with a neural network. In addition, to capture interactions between adjacent characters, K-means clustering features are added to the word segmenter. We evaluate the proposed CWS method on medical text. Experimental results on EHRs collected from the Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine show that our method outperforms the reference methods and can effectively reduce the cost of manual annotation.
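
To make the entropy side of the sampling strategy concrete, here is a minimal sketch of plain entropy-based uncertainty sampling. It is not the scoring model from the paper, which also uses a neural network over the segmentation history. The function names, the B/M/E/S tag set, and the toy probabilities are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one character's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_score(label_probs: Sequence[Sequence[float]]) -> float:
    """Mean per-character entropy; a higher score means a less certain segmenter."""
    if not label_probs:
        return 0.0
    return sum(token_entropy(p) for p in label_probs) / len(label_probs)


def select_for_annotation(pool: Sequence[Sequence[Sequence[float]]], k: int) -> List[int]:
    """Return the indices of the k most uncertain sentences in the unlabeled pool."""
    ranked = sorted(range(len(pool)), key=lambda i: sentence_score(pool[i]), reverse=True)
    return ranked[:k]


# Hypothetical pool: each sentence is a list of per-character distributions
# over four assumed boundary tags (B, M, E, S) produced by some segmenter.
pool = [
    [[0.97, 0.01, 0.01, 0.01], [0.01, 0.01, 0.97, 0.01]],  # confident
    [[0.25, 0.25, 0.25, 0.25], [0.40, 0.10, 0.40, 0.10]],  # very uncertain
    [[0.70, 0.10, 0.10, 0.10], [0.10, 0.10, 0.70, 0.10]],  # in between
]

print(select_for_annotation(pool, 2))  # prints [1, 2]
```

In an active learning loop, the selected sentences would be sent to annotators, added to the training set, and the segmenter retrained before the next round of selection.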



04/05/2020

Confident Coreset for Active Learning in Medical Image Analysis

Recent advances in deep learning have resulted in great successes in var...
07/16/2021

The Application of Active Query K-Means in Text Classification

Active learning is a state-of-the-art machine learning approach to deal with...
08/28/2020

Cost-Quality Adaptive Active Learning for Chinese Clinical Named Entity Recognition

Clinical Named Entity Recognition (CNER) aims to automatically identify ...
08/28/2019

Onto Word Segmentation of the Complete Tang Poems

We aim at segmenting words in the Complete Tang Poems (CTP). Although it...
07/18/2018

Active Learning for Segmentation by Optimizing Content Information for Maximal Entropy

Segmentation is essential for medical image analysis tasks such as inter...
02/05/2020

Multi-Fusion Chinese WordNet (MCW) : Compound of Machine Learning and Manual Correction

Princeton WordNet (PWN) is a lexicon-semantic network based on cognitive...
05/21/2019

A realistic and robust model for Chinese word segmentation

A realistic Chinese word segmentation tool must adapt to textual variati...