An innovative solution for breast cancer textual big data analysis

12/06/2017
by Nicolas Thiebaut, et al.

The digitalization of information stored in hospitals now allows medical data in text format, such as electronic health records (EHRs), to be exploited even though it was initially gathered for purposes other than epidemiology. Manual search and analysis of such data are tedious. In recent years, natural language processing (NLP) tools have been highlighted as a way to automate the extraction of information contained in EHRs, structure it, and perform statistical analysis on the structured result. The main difficulty with existing approaches is their reliance on synonym or ontology dictionaries, which are mostly available in English only and do not include local or custom notations. In this work, a team of oncologists, acting as domain experts, and data scientists develops a custom NLP-based system to process and structure textual clinical reports of patients suffering from breast cancer. The tool combines standard text mining techniques with an advanced synonym detection method. It allows for a global analysis by retrieving indicators such as medical history, tumor characteristics, therapeutic responses, recurrences and prognosis. The versatility of the method makes it easy to obtain new indicators, opening the way for retrospective studies with a substantial reduction in manual work. With no need for biomedical annotators or pre-defined ontologies, and without requiring any existing corpus with local or new notations, this language-agnostic method reached a good extraction accuracy for several concepts of interest, according to a comparison with a manually structured file.

research
04/02/2019

A frame semantic overview of NLP-based information extraction for cancer-related EHR notes

Objective: There is a lot of information about cancer in Electronic Heal...
research
12/06/2018

Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Healthcare Records

Objective: Electronic health records (EHR) represent a rich resource for...
research
08/28/2018

MedSTS: A Resource for Clinical Semantic Textual Similarity

The wide adoption of electronic health records (EHRs) has enabled a wide...
research
06/13/2018

Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer

Accurately identifying distant recurrences in breast cancer from the Ele...
research
07/27/2016

Mining Arguments from Cancer Documents Using Natural Language Processing and Ontologies

In the medical domain, the continuous stream of scientific research cont...
research
10/14/2021

BI-RADS BERT Using Section Tokenization to Understand Radiology Reports

Radiology reports are the main form of communication between radiologist...
research
07/06/2020

Labeling of Multilingual Breast MRI Reports

Medical reports are an essential medium in recording a patient's conditi...
