De-identification of medical records using conditional random fields and long short-term memory networks

09/20/2017
by Zhipeng Jiang, et al.

The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes our team's two participating systems, one based on conditional random fields (CRFs) and one based on long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For the CRFs, a rich set of manually extracted features was used to train the model. For the LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify a tag for each token, with a decoding layer stacked on top to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F1 measure of 89.86%.
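
The abstract does not say how the decoding layer works. One common choice, assumed here and not necessarily the authors' design, is Viterbi decoding over per-token tag scores under BIO constraints (an I-X tag may only continue a B-X or I-X of the same type), followed by merging contiguous B-/I- tags into PHI spans. The tag set, the scores, and the helper names `allowed`, `viterbi`, and `to_spans` are hypothetical.

```python
import numpy as np

def allowed(prev: str, curr: str) -> bool:
    """An I-X tag may only follow B-X or I-X of the same PHI type."""
    if curr.startswith("I-"):
        return prev in (f"B-{curr[2:]}", curr)
    return True

def viterbi(emissions: np.ndarray, tags: list[str]) -> list[str]:
    """Return the highest-scoring tag path that respects BIO constraints.

    emissions: (n_tokens, n_tags) log-scores, assumed to come from a
    token-level tagger such as a BiLSTM.
    """
    n, k = emissions.shape
    neg_inf = -1e9
    # Transition scores: 0 for allowed moves, a large penalty for forbidden ones.
    trans = np.array([[0.0 if allowed(p, c) else neg_inf for c in tags] for p in tags])
    # The first token of a sequence cannot carry an I- tag.
    start = np.array([neg_inf if t.startswith("I-") else 0.0 for t in tags])
    score = start + emissions[0]
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        cand = score[:, None] + trans + emissions[i][None, :]  # rows: prev, cols: curr
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Follow back-pointers from the best final tag.
    best = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        best.append(int(back[i, best[-1]]))
    return [tags[j] for j in reversed(best)]

def to_spans(tokens: list[str], labels: list[str]) -> list[tuple[str, str]]:
    """Merge B-X/I-X runs into (phi_type, text) spans."""
    spans, cur_type, cur_toks = [], None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") or lab == "O":
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = (lab[2:], [tok]) if lab.startswith("B-") else (None, [])
        else:  # I- tag; viterbi() guarantees it continues the current span
            cur_toks.append(tok)
    if cur_type:
        spans.append((cur_type, " ".join(cur_toks)))
    return spans

# Hypothetical tag set, tokens, and scores for illustration.
tags = ["O", "B-DOCTOR", "I-DOCTOR", "B-DATE"]
tokens = ["Seen", "by", "Dr.", "Smith", "on", "5/12"]
emissions = np.array([
    [0.0, -5.0, -5.0, -5.0],
    [0.0, -5.0, -5.0, -5.0],
    [-3.0, -1.2, -1.0, -5.0],  # greedy argmax would pick I-DOCTOR right after O
    [-3.0, -5.0, 0.0, -5.0],
    [0.0, -5.0, -5.0, -5.0],
    [-5.0, -5.0, -5.0, 0.0],
])
labels = viterbi(emissions, tags)
print(labels)                    # ['O', 'O', 'B-DOCTOR', 'I-DOCTOR', 'O', 'B-DATE']
print(to_spans(tokens, labels))  # [('DOCTOR', 'Dr. Smith'), ('DATE', '5/12')]
```

Taking the per-token argmax would tag "Dr." as I-DOCTOR after an O, which is an invalid sequence. The constrained decoder instead selects B-DOCTOR there, so "Dr. Smith" comes out as a single PHI term.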

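The headline result is a strict micro-F1. The sketch below assumes the usual strict criterion: a predicted PHI mention counts as a true positive only if its document, start and end character offsets, and PHI type all exactly match a gold annotation. It also assumes that counts are pooled over all documents and types before precision and recall are computed. The tuple layout and the name `strict_micro_f1` are hypothetical, and the official i2b2 evaluation script may differ in details.

```python
from collections import Counter

# Hypothetical span layout: (doc_id, start_offset, end_offset, phi_type)
Span = tuple[str, int, int, str]

def strict_micro_f1(gold: list[Span], pred: list[Span]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 under exact (strict) matching."""
    gold_c, pred_c = Counter(gold), Counter(pred)
    tp = sum((gold_c & pred_c).values())  # multiset intersection
    fp = sum(pred_c.values()) - tp
    fn = sum(gold_c.values()) - tp
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical annotations: the DATE prediction is off by one character,
# so under strict matching it counts as both a false positive and a false negative.
gold = [("d1", 10, 19, "DOCTOR"), ("d1", 23, 27, "DATE"), ("d2", 0, 5, "HOSPITAL")]
pred = [("d1", 10, 19, "DOCTOR"), ("d1", 23, 28, "DATE"),
        ("d2", 0, 5, "HOSPITAL"), ("d2", 40, 44, "AGE")]
p, r, f1 = strict_micro_f1(gold, pred)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # 0.500 0.667 0.571
```

The `Counter` intersection treats annotations as a multiset, so a duplicated prediction is credited at most as many times as the gold annotation occurs.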

research
04/06/2023

Deep Long-Short Term Memory networks: Stability properties and Experimental validation

The aim of this work is to investigate the use of Incrementally Input-to...
research
06/05/2020

SEAL: Scientific Keyphrase Extraction and Classification

Automatic scientific keyphrase extraction is a challenging problem facil...
research
01/11/2017

De-identification In practice

We report our effort to identify the sensitive information, subset of da...
research
09/15/2021

Scope resolution of predicted negation cues: A two-step neural network-based approach

Neural network-based methods are the state of the art in negation scope ...
research
11/27/2018

Document classification using a Bi-LSTM to unclog Brazil's supreme court

The Brazilian court system is currently the most clogged up judiciary sy...
research
12/07/2017

Effective Neural Solution for Multi-Criteria Word Segmentation

We present a simple yet elegant solution to train a single joint model o...
research
01/30/2020

An Efficient Architecture for Predicting the Case of Characters using Sequence Models

The dearth of clean textual data often acts as a bottleneck in several n...
