Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit

10/02/2020
by Zeljko Kraljevic, et al.

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customizing and training IE models; and c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1 0.467-0.791 vs 0.384-0.691). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over 8.8B words from 17M clinical records and further fine-tuning with 6K clinician-annotated examples. We show strong transferability (F1 >0.94) between hospitals, datasets and concept types, indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.


