Embedding Complexity In the Data Representation Instead of In the Model: A Case Study Using Heterogeneous Medical Data

02/12/2018
by Jacek M. Bajor, et al.
Electronic Health Records have become popular sources of data for secondary research, but their use is hampered by the amount of effort it takes to overcome the sparsity, irregularity, and noise that they contain. Modern learning architectures can remove the need for expert-driven feature engineering, but not the need for expert-driven preprocessing to abstract away the inherent messiness of clinical data. This preprocessing effort is often the dominant component of a typical clinical prediction project. In this work we propose using semantic embedding methods to directly couple the raw, messy clinical data to downstream learning architectures with truly minimal preprocessing. We examine this step from the perspective of capturing and encoding complex data dependencies in the data representation instead of in the model, which has the nice benefit of allowing downstream processing to be done with fast, lightweight, and simple models accessible to researchers without machine learning expertise. We demonstrate with three typical clinical prediction tasks that the highly compressed, embedded data representations capture a large amount of useful complexity, although in some cases the compression is not completely lossless.

research
03/15/2023

Rediscovery of CNN's Versatility for Text-based Encoding of Raw Electronic Health Records

Making the most use of abundant information in electronic health records...
research
09/19/2019

Representation Learning for Electronic Health Records

Information in electronic health records (EHR), such as clinical narrati...
research
10/30/2019

Phenotyping of Clinical Notes with Improved Document Classification Models Using Contextualized Neural Language Models

Clinical notes contain an extensive record of a patient's health status,...
research
12/06/2019

Med2Meta: Learning Representations of Medical Concepts with Meta-Embeddings

Distributed representations of medical concepts have been used to suppor...
research
07/27/2020

EffiCare: Better Prognostic Models via Resource-Efficient Health Embeddings

Recent medical prognostic models adapted from high data-resource fields ...
research
11/04/2019

A Spark ML driven preprocessing approach for deep learning based scholarly data applications

Big data has found applications in multiple domains. One of the largest ...
research
11/14/2022

Learning predictive checklists from continuous medical data

Checklists, while being only recently introduced in the medical domain, ...
