Constrained tensor factorization for computational phenotyping and mortality prediction in patients with cancer

12/24/2021
by Francisco Y Cai, et al.

Background: The increasing adoption of electronic health records (EHR) across the US has created troves of computable data, to which machine learning methods have been applied to extract useful insights. EHR data, represented as a three-dimensional analogue of a matrix (tensor), is decomposed into two-dimensional factors that can be interpreted as computational phenotypes.

Methods: We apply constrained tensor factorization to derive computational phenotypes and predict mortality in cohorts of patients with breast, prostate, colorectal, or lung cancer in the Northwestern Medicine Enterprise Data Warehouse from 2000 to 2015. In our experiments, we examined using a supervised term in the factorization algorithm, filtering tensor co-occurrences by medical indication, and incorporating additional social determinants of health (SDOH) covariates in the factorization process. We evaluated the resulting computational phenotypes qualitatively and by assessing their ability to predict five-year mortality using the area under the curve (AUC) statistic.

Results: Filtering by medical indication led to more concise and interpretable phenotypes. Mortality prediction performance (AUC) varied under the different experimental conditions and by cancer type (breast: 0.623–0.694, prostate: 0.603–0.750, colorectal: 0.523–0.641, and lung: 0.517–0.623). Generally, prediction performance improved with the use of a supervised term and the incorporation of SDOH covariates.

Conclusion: Constrained tensor factorization, applied to sparse EHR data of patients with cancer, can discover computational phenotypes predictive of five-year mortality. The incorporation of SDOH variables into the factorization algorithm is an easy-to-implement and effective way to improve prediction performance.
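To make the two technical ingredients of the abstract concrete, the sketch below shows (1) a plain nonnegative CP factorization of a three-way count tensor using multiplicative updates and (2) a rank-based AUC computation. This is not the authors' constrained algorithm: it has no supervised term, no indication filtering, and no SDOH covariates. The tensor layout (patients × diagnoses × medications), the function names `khatri_rao`, `nonneg_cp`, `reconstruct`, and `auc`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; row j*K + k equals B[j] * C[k]."""
    rank = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, rank)


def nonneg_cp(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Nonnegative CP factorization of a 3-way tensor (multiplicative updates).

    Returns three nonnegative factor matrices, one per mode. Each column
    triple can be read as one candidate "phenotype" (illustrative only).
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            b, c = [factors[m] for m in range(3) if m != mode]
            # Unfold X along `mode`; column order matches khatri_rao(b, c).
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            numer = unfolded @ khatri_rao(b, c)
            denom = factors[mode] @ ((b.T @ b) * (c.T @ c))
            factors[mode] *= numer / (denom + eps)
    return factors


def reconstruct(factors):
    """Rebuild the tensor from its CP factors."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


# Synthetic stand-in for an EHR tensor: 30 patients x 8 diagnoses x 6 medications.
rng = np.random.default_rng(42)
true_factors = [rng.random((n, 2)) for n in (30, 8, 6)]
X = reconstruct(true_factors)

patient_f, diagnosis_f, medication_f = nonneg_cp(X, rank=2)
rel_error = np.linalg.norm(X - reconstruct([patient_f, diagnosis_f, medication_f]))
rel_error /= np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In a setup like the one the abstract describes, the rows of the patient factor matrix would serve as phenotype-membership features for a downstream mortality classifier, and `auc` would score that classifier's predicted risks against observed five-year outcomes.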
