Extracting information from free text through unsupervised graph-based clustering: an application to patient incident records

08/31/2019
by M. Tarik Altuncu, et al.

The large volume of text in electronic healthcare records often remains underused due to a lack of methodologies to extract interpretable content. Here we present an unsupervised framework for the analysis of free text that combines text embedding with paragraph vectors and graph-theoretical multiscale community detection. We analyse text from a corpus of patient incident reports from the National Health Service in England to find content-based clusters of reports in an unsupervised manner and at different levels of resolution. Our unsupervised method extracts groups with high intrinsic textual consistency and compares well against categories hand-coded by healthcare personnel. We also show how to use our content-driven clusters to improve the supervised prediction of the degree of harm of an incident based on the text of its report. Finally, we discuss future directions to monitor reports over time and to detect emerging trends outside pre-existing categories.
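The pipeline summarised above (embed each report, link similar reports in a graph, then partition that graph at several resolutions) can be illustrated with a minimal, standard-library-only Python sketch. It rests on several assumptions that are not taken from the paper: the paragraph vectors are taken as already computed (the toy 3-dimensional vectors are hypothetical stand-ins for real report embeddings), the graph is a symmetric k-nearest-neighbour graph weighted by cosine similarity, and, in place of the authors' multiscale community detection method, it uses a swapped-in technique: greedy modularity local moving (the first phase of the Louvain heuristic) with a resolution parameter gamma. The helper names knn_graph, local_moving and cluster_at_scale are hypothetical.

```python
"""Toy sketch: kNN similarity graph + resolution-controlled community detection.

Assumptions (not from the paper): document vectors are given, the graph is a
symmetric cosine kNN graph, and modularity local moving with a resolution
parameter gamma stands in for the authors' multiscale method.
"""
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(vectors, k):
    """Weighted undirected graph: i--j if either is among the other's k most similar."""
    n = len(vectors)
    w = {i: {} for i in range(n)}
    for i in range(n):
        sims = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            if s > 0:  # keep only positively similar neighbours
                w[i][j] = w[j][i] = s
    return w


def local_moving(w, gamma, max_sweeps=100):
    """Greedily move single nodes between communities to increase
    Q = (1/2m) * sum_ij [A_ij - gamma * k_i * k_j / 2m] * delta(c_i, c_j)."""
    nodes = sorted(w)
    degree = {i: sum(w[i].values()) for i in nodes}
    two_m = sum(degree.values())
    comm = {i: i for i in nodes}  # start from singleton communities
    if two_m == 0:
        return comm
    tot = {i: degree[i] for i in nodes}  # summed degree of each community
    for _ in range(max_sweeps):
        moved = False
        for i in nodes:
            old = comm[i]
            tot[old] -= degree[i]  # take i out of its current community
            links = defaultdict(float)  # edge weight from i into each community
            for j, wij in w[i].items():
                links[comm[j]] += wij
            # Gain (up to a constant factor) of placing i in a community;
            # i only leaves its old community for a strictly better one.
            best = old
            best_gain = links.get(old, 0.0) - gamma * tot[old] * degree[i] / two_m
            for c, k_in in links.items():
                g = k_in - gamma * tot[c] * degree[i] / two_m
                if g > best_gain:
                    best, best_gain = c, g
            comm[i] = best
            tot[best] += degree[i]
            moved = moved or best != old
        if not moved:
            break
    return comm


def cluster_at_scale(w, gamma):
    """Community label per node, renumbered 0, 1, 2, ... in order of first appearance."""
    comm = local_moving(w, gamma)
    mapping = {}
    return [mapping.setdefault(comm[i], len(mapping)) for i in sorted(comm)]


# Hypothetical toy "paragraph vectors": two groups of three similar reports.
TOY = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.05, 0.95, 0.0],
]

if __name__ == "__main__":
    graph = knn_graph(TOY, k=2)
    for gamma in (0.5, 1.0, 100.0):
        print(gamma, cluster_at_scale(graph, gamma))
```

Sweeping gamma plays the role of the resolution scale in this sketch: smaller values favour fewer, larger clusters, while very large values leave every report in its own cluster. Scanning a range of values gives a coarse-to-fine view of the corpus, which is the kind of multi-resolution output the abstract describes, although the paper's own method and graph construction may differ.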
