Wasserstein Regression with Empirical Measures and Density Estimation for Sparse Data

08/24/2023
by Yidong Zhou, et al.

The problem of modeling the relationship between univariate distributions and one or more explanatory variables has attracted increasing interest. Traditional functional data methods cannot be applied directly to distributional data because of the inherent constraints on such data. Modeling distributions as elements of the Wasserstein space, a geodesic metric space equipped with the Wasserstein metric from optimal transport, is attractive for statistical applications. Existing approaches substitute estimated proxy distributions for the typically unknown response distributions. These estimates are obtained from the available data but are problematic when only a few observations are available for some of the distributions. Such situations are common in practice and cannot be addressed with existing approaches, especially when the target is a density estimate. We show how this and other problems associated with density estimation, such as tuning parameter selection and bias, can be side-stepped when covariates are available. We also introduce a novel version of distribution-response regression that is based on empirical measures. By avoiding the preprocessing step of recovering complete individual response distributions, the proposed approach remains applicable when the sample size available for some of the distributions is small. In this case, one can still obtain consistent distribution estimates, even for distributions with only a few observations, by borrowing strength across the entire sample of distributions. Traditional approaches that estimate distributions or densities individually fail here, since sparsely sampled densities cannot be consistently estimated. Simulations demonstrate that the proposed model outperforms existing approaches. Its efficacy is corroborated in two case studies, one on Environmental Influences on Child Health Outcomes (ECHO) data and one on eBay auction data.
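
As a rough illustration of why empirical measures are convenient in this setting: for univariate distributions, the 2-Wasserstein distance equals the L2 distance between quantile functions, so it can be evaluated directly from raw samples without any density estimation step. The sketch below assumes NumPy and approximates the integral over quantile levels on a midpoint grid. The function names and the `grid_size` parameter are illustrative choices, not taken from the paper, and this is not the paper's regression estimator.

```python
import numpy as np


def empirical_quantile(sample, u):
    """Empirical quantile function Q(u) = x_(ceil(n*u)) for levels u in (0, 1)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Map each level u to the index of the corresponding order statistic.
    idx = np.clip(np.ceil(n * np.asarray(u, dtype=float)).astype(int), 1, n) - 1
    return x[idx]


def wasserstein2_empirical(sample_a, sample_b, grid_size=1000):
    """Approximate 2-Wasserstein distance between two empirical measures.

    Uses the univariate identity W_2^2 = integral over (0, 1) of
    (Q_a(u) - Q_b(u))^2 du, approximated on a midpoint grid.
    The two samples may have different sizes.
    """
    u = (np.arange(grid_size) + 0.5) / grid_size  # midpoints of (0, 1)
    qa = empirical_quantile(sample_a, u)
    qb = empirical_quantile(sample_b, u)
    return float(np.sqrt(np.mean((qa - qb) ** 2)))
```

Because the computation only needs order statistics, it is well defined even when one of the samples contains a single observation, which is the sparse regime the abstract is concerned with.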

research
12/18/2018

Wasserstein Covariance for Multiple Random Densities

A common feature of methods for analyzing samples of probability density...
research
10/29/2019

Wasserstein F-tests and Confidence Bands for the Fréchet Regression of Density Response Curves

Data consisting of samples of probability density functions are increasi...
research
09/12/2022

Wasserstein Distributional Learning

Learning conditional densities and identifying factors that influence th...
research
07/20/2021

Conditional Wasserstein Barycenters and Interpolation/Extrapolation of Distributions

Increasingly complex data analysis tasks motivate the study of the depen...
research
12/18/2020

On the density estimation problem for uncertainty propagation with unknown input distributions

In this article we study the problem of quantifying the uncertainty in a...
research
07/12/2023

Distribution-on-Distribution Regression with Wasserstein Metric: Multivariate Gaussian Case

Distribution data refers to a data set where each sample is represented ...
research
04/13/2023

A Natural Copula

Copulas are widely used in financial economics as well as in other areas...
