A Context-Enhanced De-identification System

02/17/2021
by Kahyun Lee, et al.

Many modern entity recognition systems, including the current state-of-the-art de-identification systems, are based on bidirectional long short-term memory (biLSTM) units augmented by a conditional random field (CRF) sequence optimizer. These systems process the input sentence by sentence. This approach prevents the systems from capturing dependencies across sentence boundaries and makes accurate sentence boundary detection a prerequisite. Because sentence boundary detection can be problematic, especially in clinical reports, where dependencies and co-references across sentence boundaries are abundant, these systems have clear limitations. In this study, we built a new system on the framework of one of the current state-of-the-art de-identification systems, NeuroNER, to overcome these limitations. This new system incorporates context embeddings through forward and backward n-grams without using sentence boundaries. Our context-enhanced de-identification (CEDI) system captures dependencies across sentence boundaries and bypasses the sentence boundary detection problem altogether. We enhanced this system with deep affix features and an attention mechanism to capture the pertinent parts of the input. The CEDI system outperforms NeuroNER on the 2006 i2b2 de-identification challenge dataset, the 2014 i2b2 shared task de-identification dataset, and the 2016 CEGS N-GRID de-identification dataset (p<0.01). All three datasets comprise narrative clinical reports in English but contain different note types, ranging from discharge summaries to psychiatric notes. Enhancing CEDI with deep affix features and the attention mechanism further increased performance.
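As a rough illustration of the forward and backward n-gram idea, the sketch below builds a context window for each token from a whole document treated as a single token stream, with no sentence splitting. It is not the CEDI implementation: the function name `context_windows`, the `<pad>` symbol, the whitespace tokenization, and the window size `n=3` are all assumptions made for this example, and the abstract does not say how the n-grams are turned into embeddings.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding symbol for positions outside the document


def context_windows(
    tokens: List[str], n: int = 3
) -> List[Tuple[List[str], str, List[str]]]:
    """Return (backward n-gram, token, forward n-gram) for every token.

    The windows are cut from the full token stream, so they can span
    what a sentence splitter would have treated as a boundary.
    """
    windows = []
    for i, tok in enumerate(tokens):
        # Up to n tokens before position i, left-padded at the document start.
        backward = tokens[max(0, i - n):i]
        backward = [PAD] * (n - len(backward)) + backward
        # Up to n tokens after position i, right-padded at the document end.
        forward = tokens[i + 1:i + 1 + n]
        forward = forward + [PAD] * (n - len(forward))
        windows.append((backward, tok, forward))
    return windows


# Invented example text, tokenized naively on whitespace (an assumption).
doc = "Pt seen by Dr . Smith . He was discharged on 3/14 .".split()

for backward, tok, forward in context_windows(doc, n=3):
    print(backward, tok, forward)
```

In this toy input, the backward window for "He" reaches back to "Smith" even though a period sits between them, which is the kind of cross-boundary context a sentence-by-sentence system would not see.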


