De-identification of Unstructured Clinical Texts from Sequence to Sequence Perspective

by Md. Monowar Anjum, et al.
Texas Medical Center
University of Manitoba

In this work, we propose a novel problem formulation for de-identification of unstructured clinical text. We formulate de-identification as a sequence-to-sequence learning problem instead of a token classification problem. Our approach is inspired by the recent state-of-the-art performance of sequence-to-sequence learning models for named entity recognition. Early experiments with our proposed approach achieved a score of 98.91 on the evaluation dataset. This performance is comparable to current state-of-the-art models for unstructured clinical text de-identification.
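To make the difference between the two formulations concrete, the sketch below converts a token-classification example (tokens with BIO labels) into a source/target text pair of the kind a sequence-to-sequence model would be trained on. The abstract does not specify the target format, so the placeholder scheme (`<NAME>`, `<DATE>`), the label names, and the helper `to_seq2seq_pair` are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple


def to_seq2seq_pair(tokens: List[str], labels: List[str]) -> Tuple[str, str]:
    """Turn a BIO-labelled sentence into a (source, target) text pair.

    Assumption (not from the paper): every protected span is collapsed
    into one placeholder such as <NAME> in the target sequence.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    target: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "O":
            # Non-sensitive tokens are copied to the target unchanged.
            target.append(token)
        elif label.startswith("B-"):
            # The first token of a span emits the placeholder for its type.
            target.append(f"<{label[2:]}>")
        # "I-" tokens are skipped: their span already has a placeholder.
    return " ".join(tokens), " ".join(target)


# Hypothetical example sentence; the label set is illustrative only.
tokens = ["Mr.", "John", "Smith", "was", "admitted", "on", "March", "3", "."]
labels = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "I-DATE", "O"]

source, target = to_seq2seq_pair(tokens, labels)
print(source)
print(target)
```

Under the token classification view, the model predicts one label per input token. Under the sequence-to-sequence view sketched here, it generates the whole de-identified sentence, so the output length can differ from the input length.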




