DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data

08/31/2021
by Nazanin Alipourfard, et al.

Quantitative analysis of large-scale data is often complicated by the presence of diverse subgroups, which reduce the accuracy of inferences made on held-out data. To address the challenge of heterogeneous data analysis, we introduce DoGR, a method that discovers latent confounders by simultaneously partitioning the data into overlapping clusters (disaggregation) and modeling the behavior within them (regression). When applied to real-world data, our method discovers meaningful clusters and their characteristic behaviors, thus giving insight into group differences and their impact on the outcome of interest. By accounting for latent confounders, our framework facilitates exploratory analysis of noisy, heterogeneous data and can be used to learn predictive models that generalize better to new data. We provide the code to enable others to use DoGR within their data-analytic workflows.
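
To make the idea of simultaneous disaggregation and regression concrete, the following is a minimal sketch, not the authors' released code. It assumes each cluster has a Gaussian distribution over the features and a linear-Gaussian model for the outcome, and it fits soft (overlapping) memberships with a plain EM loop. The function names, the farthest-point initialization, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def init_memberships(X, k, rng):
    """Hard initial memberships from farthest-point centers (illustrative choice)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    d2 = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
    return np.eye(k)[d2.argmin(axis=1)]


def fit_soft_cluster_regression(X, y, k=2, n_iter=50, seed=0):
    """EM for a mixture whose components each hold a Gaussian over x and a
    linear-Gaussian model of y given x. Returns (memberships, coefficients)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # last column is the intercept
    resp = init_memberships(X, k, rng)    # shape (n, k), rows sum to 1
    coefs = np.zeros((k, d + 1))
    for _ in range(n_iter):
        weights = resp.sum(axis=0) / n
        log_p = np.empty((n, k))
        for j in range(k):
            r = resp[:, j]
            rs = r.sum() + 1e-12
            # M-step: weighted mean and covariance of the features
            mu = (r[:, None] * X).sum(axis=0) / rs
            diff = X - mu
            cov = (r[:, None] * diff).T @ diff / rs + 1e-6 * np.eye(d)
            # M-step: weighted least squares for the outcome
            A = Xb.T @ (r[:, None] * Xb) + 1e-6 * np.eye(d + 1)
            beta = np.linalg.solve(A, Xb.T @ (r * y))
            res = y - Xb @ beta
            var = (r * res ** 2).sum() / rs + 1e-6
            coefs[j] = beta
            # Log joint density of (x, y) under component j
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
            log_px = -0.5 * (d * np.log(2 * np.pi) + logdet + maha)
            log_py = -0.5 * (np.log(2 * np.pi * var) + res ** 2 / var)
            log_p[:, j] = np.log(weights[j] + 1e-12) + log_px + log_py
        # E-step: normalize to soft memberships
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, coefs


def make_toy_data(n=300, seed=1):
    """Two synthetic subgroups whose outcome depends on x with opposite signs."""
    rng = np.random.default_rng(seed)
    xa = rng.normal(-3.0, 1.0, size=(n, 1))
    xb = rng.normal(3.0, 1.0, size=(n, 1))
    ya = 2.0 * xa[:, 0] + 1.0 + rng.normal(0.0, 0.3, n)
    yb = -1.0 * xb[:, 0] + rng.normal(0.0, 0.3, n)
    return np.vstack([xa, xb]), np.concatenate([ya, yb])
```

On the toy data, a single regression over all points would blend two opposite trends, whereas calling `fit_soft_cluster_regression(*make_toy_data(), k=2)` fits one slope per discovered subgroup. Each row of the returned membership matrix sums to one, so a point can belong partly to more than one cluster.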

