From Text to Topics in Healthcare Records: An Unsupervised Graph Partitioning Methodology

07/07/2018
by M. Tarik Altuncu, et al.

Electronic Healthcare Records contain large volumes of unstructured data, including extensive free text. Yet this source of detailed information often remains under-used because of a lack of methodologies to extract interpretable content in a timely manner. Here we apply network-theoretical tools to analyse free text in Hospital Patient Incident reports from the National Health Service, to find clusters of documents with similar content in an unsupervised manner at different levels of resolution. We combine deep neural network paragraph vector text-embedding with multiscale Markov Stability community detection applied to a sparsified similarity graph of document vectors, and showcase the approach on incident reports from Imperial College Healthcare NHS Trust, London. The multiscale community structure reveals different levels of meaning in the topics of the dataset, as shown by descriptive terms extracted from the clusters of records. We also compare the clusters a posteriori against hand-coded categories assigned by healthcare personnel, and show that our approach outperforms LDA-based models. Our content clusters exhibit good correspondence with two levels of hand-coded categories, yet they also provide further medical detail in certain areas and reveal complementary descriptors of incidents beyond the external classification taxonomy.
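
The pipeline named in the abstract (document vectors, a sparsified similarity graph, Markov Stability scoring of partitions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the six toy 2-D vectors stand in for paragraph-vector embeddings, a plain k-nearest-neighbour rule stands in for whatever sparsification the authors use, the random walk is discrete-time, and the function names `cosine`, `knn_graph` and `markov_stability` are hypothetical. The sketch only scores given partitions; it does not search for the best one, as a community detection method would.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_graph(vectors, k):
    """Symmetric weighted adjacency matrix: an edge is kept if either
    endpoint ranks the other among its k most similar documents."""
    n = len(vectors)
    sim = [[0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in ranked[:k]:
            weight = max(sim[i][j], 0.0)  # clip negative similarities to zero
            adj[i][j] = adj[j][i] = weight
    return adj


def markov_stability(adj, labels, steps):
    """Discrete-time Markov Stability of a partition: probability that a
    stationary random walker is in the same community at times 0 and
    `steps`, minus the same probability for two independent walkers."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    if min(degree) <= 0:
        raise ValueError("every node needs at least one positive-weight edge")
    total = sum(degree)
    pi = [d / total for d in degree]  # stationary distribution
    step = [[adj[i][j] / degree[i] for j in range(n)] for i in range(n)]
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):  # walk = step ** steps
        walk = [[sum(walk[i][m] * step[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)]
    return sum(pi[i] * walk[i][j] - pi[i] * pi[j]
               for i in range(n) for j in range(n) if labels[i] == labels[j])


# Toy "document vectors": two loose groups (assumed data, not from the paper).
vectors = [(1.0, 0.1), (0.9, 0.2), (1.0, 0.0), (0.1, 1.0), (0.2, 0.9), (0.0, 1.0)]
adj = knn_graph(vectors, k=3)
good = markov_stability(adj, [0, 0, 0, 1, 1, 1], steps=1)  # matches the groups
bad = markov_stability(adj, [0, 1, 0, 1, 0, 1], steps=1)   # cuts across them
```

In this sketch the number of steps plays the role of the resolution parameter: scoring the same partitions at larger `steps` favours coarser groupings, which is the intuition behind reading the data at different levels of resolution.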

