Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources

by Taolin Zhang, et al.

Machine Reading Comprehension (MRC) aims to extract answers to questions from a given passage. It has been widely studied in recent years, especially in open domains. However, closed-domain MRC has received little attention, mainly because large-scale training data is lacking. In this paper, we introduce a multi-target MRC task for the medical domain, whose goal is to simultaneously predict answers to medical questions and the corresponding support sentences from medical information sources, so that the medical knowledge served is highly reliable. For this purpose, we manually construct a high-quality dataset, the Multi-task Chinese Medical MRC dataset (CMedMRC), and analyze it in detail. We further propose a Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models through a dynamic fusion mechanism over heterogeneous features and a multi-task learning strategy. Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
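The abstract does not spell out how the fusion or the multi-task objective is computed, so the following is only a minimal sketch under stated assumptions, not CMedBERT's actual design. It assumes a single scalar sigmoid gate that blends a context-aware and a knowledge-aware token vector, and a joint loss that adds an answer-span term to a support-sentence term. The names `gated_fusion`, `joint_loss`, `gate_weights` and `support_weight` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(context_vec, knowledge_vec, gate_weights, gate_bias=0.0):
    """Blend two equal-length token vectors with one scalar gate.

    Assumption: the gate is computed from the concatenation of both
    vectors. The paper's dynamic fusion mechanism may differ.
    """
    concat = list(context_vec) + list(knowledge_vec)
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    return [gate * c + (1.0 - gate) * k
            for c, k in zip(context_vec, knowledge_vec)]


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class under `probs`."""
    return -math.log(probs[gold_index])


def joint_loss(start_probs, end_probs, gold_start, gold_end,
               support_probs, gold_support, support_weight=1.0):
    """Answer-span loss plus a weighted support-sentence loss.

    Assumption: the two targets are combined by a weighted sum;
    `support_weight` is an illustrative hyperparameter.
    """
    span_loss = (cross_entropy(start_probs, gold_start)
                 + cross_entropy(end_probs, gold_end))
    support_loss = cross_entropy(support_probs, gold_support)
    return span_loss + support_weight * support_loss


# Toy usage: a zero-weight gate gives an even blend of the two vectors.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_weights=[0.0] * 4)
loss = joint_loss([0.5, 0.5], [0.25, 0.75], 0, 1, [0.5, 0.5], 1)
```

In this toy setup the gate lets each token lean more on passage context or on external knowledge, while the summed loss trains answer extraction and support-sentence prediction together, which is the general idea the abstract describes.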




