Predictive Heterogeneity: Measures and Applications

04/01/2023
by   Jiashuo Liu, et al.

As an intrinsic and fundamental property of big data, data heterogeneity exists in a variety of real-world applications, such as precision medicine, autonomous driving, and financial applications. For machine learning algorithms, ignoring data heterogeneity can greatly hurt generalization performance and algorithmic fairness, since the prediction mechanisms of different sub-populations are likely to differ from one another. In this work, we focus on the data heterogeneity that affects the prediction of machine learning models, and first propose the usable predictive heterogeneity, which takes into account model capacity and computational constraints. We prove that it can be reliably estimated from finite data with probably approximately correct (PAC) bounds. Additionally, we design a bi-level optimization algorithm to explore the usable predictive heterogeneity from data. Empirically, the explored heterogeneity provides insights for sub-population divisions in income prediction, crop yield prediction, and image classification tasks, and leveraging such heterogeneity benefits out-of-distribution generalization performance.
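To make the idea of sub-populations with different prediction mechanisms concrete, the following is a minimal sketch, not the measure or the algorithm defined in the paper. It assumes a simple proxy: the drop in mean squared error when a separate least-squares line is fitted per group instead of one pooled line. The names `fit_line`, `heterogeneity_gap`, and `make_data`, and the synthetic data, are all hypothetical. A bi-level procedure would search over partitions in an outer loop; here only two fixed partitions are scored.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    return a, b


def sse(xs, ys):
    """Sum of squared errors of the best-fit line on (xs, ys)."""
    a, b = fit_line(xs, ys)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def heterogeneity_gap(xs, ys, groups):
    """Pooled MSE minus the MSE of per-group fits (an assumed proxy)."""
    n = len(xs)
    pooled_mse = sse(xs, ys) / n
    split_sse = 0.0
    for g in set(groups):
        gx = [x for x, k in zip(xs, groups) if k == g]
        gy = [y for y, k in zip(ys, groups) if k == g]
        split_sse += sse(gx, gy)
    return pooled_mse - split_sse / n


def make_data(n=200, seed=0):
    """Synthetic data: two hidden groups with opposite slopes."""
    rng = random.Random(seed)
    xs, ys, groups = [], [], []
    for i in range(n):
        g = i % 2
        x = rng.uniform(-1.0, 1.0)
        slope = 2.0 if g == 0 else -2.0
        xs.append(x)
        ys.append(slope * x + rng.gauss(0.0, 0.1))
        groups.append(g)
    return xs, ys, groups


xs, ys, true_groups = make_data()

# A partition unrelated to the prediction mechanism, for comparison.
label_rng = random.Random(1)
random_groups = [label_rng.randrange(2) for _ in xs]

gap_true = heterogeneity_gap(xs, ys, true_groups)
gap_random = heterogeneity_gap(xs, ys, random_groups)
print("gap with true groups:  ", gap_true)
print("gap with random groups:", gap_random)
```

Under these assumptions, the partition that matches the hidden groups should yield a much larger gap than the random one, which is the intuition behind searching for the division of the data that a given model family can exploit most.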

research
05/21/2023

Exploring and Exploiting Data Heterogeneity in Recommendation

Massive amounts of data are the foundation of data-driven recommendation...
research
05/03/2023

Modelling heterogeneity in the classification process in multi-species distribution models can improve predictive performance

1. Species distribution models and maps from large-scale biodiversity da...
research
05/23/2023

Federated Variational Inference: Towards Improved Personalization and Generalization

Conventional federated learning algorithms train a single global model b...
research
07/29/2016

gLOP: the global and Local Penalty for Capturing Predictive Heterogeneity

When faced with a supervised learning problem, we hope to have rich enou...
research
03/28/2019

Using Latent Class Analysis to Identify ARDS Sub-phenotypes for Enhanced Machine Learning Predictive Performance

In this work, we utilize Machine Learning for early recognition of patie...
research
06/01/2023

Improve State-Level Wheat Yield Forecasts in Kazakhstan on GEOGLAM's EO Data by Leveraging A Simple Spatial-Aware Technique

Accurate yield forecasting is essential for making informed policies and...
research
06/07/2023

ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection

As computer vision systems are being increasingly deployed at scale in h...
