Interaction Models and Generalized Score Matching for Compositional Data

by Shiqing Yu, et al.

Applications such as the analysis of microbiome data have led to renewed interest in statistical methods for compositional data, i.e., multivariate data in the form of probability vectors that contain relative proportions. In particular, there is considerable interest in modeling interactions among such relative proportions. To this end, we propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex. Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions. Generally, the distributions we consider have a density that features a difficult-to-compute normalizing constant. To circumvent this issue, we design effective estimation methods based on generalized versions of score matching. A high-dimensional analysis of our estimation methods shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.
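
To make the Dirichlet special case concrete, the following is a minimal sketch of an unnormalized log-density on the simplex with pairwise interaction terms. The parameterization used here (a linear term `eta` and an interaction matrix `K`, both acting on `log(x)`) and the function names are assumptions chosen for illustration; the abstract does not spell out the exact form of the model. Under this assumed form, setting `K` to zero leaves the Dirichlet kernel with `alpha = eta + 1`, and no normalizing constant is ever evaluated.

```python
import math


def log_kernel(x, eta, K):
    """Unnormalized log-density: eta^T log(x) - 0.5 * log(x)^T K log(x).

    Hypothetical parameterization for illustration only. x must lie in the
    open probability simplex (strictly positive entries summing to 1).
    """
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be a strictly positive probability vector")
    lx = [math.log(xi) for xi in x]
    m = len(x)
    linear = sum(e * l for e, l in zip(eta, lx))
    quad = sum(lx[i] * K[i][j] * lx[j] for i in range(m) for j in range(m))
    return linear - 0.5 * quad


def dirichlet_log_kernel(x, alpha):
    """Log of the Dirichlet density up to its normalizing constant."""
    return sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


# Example point on the simplex and illustrative (made-up) parameters.
x = [0.2, 0.3, 0.5]
alpha = [2.0, 3.5, 1.5]
eta = [a - 1.0 for a in alpha]

K_zero = [[0.0] * 3 for _ in range(3)]      # no interactions
K_inter = [[1.0, -0.5, -0.5],               # symmetric pairwise interactions
           [-0.5, 1.0, -0.5],
           [-0.5, -0.5, 1.0]]

no_interaction = log_kernel(x, eta, K_zero)
with_interaction = log_kernel(x, eta, K_inter)
```

With `K_zero`, `no_interaction` equals `dirichlet_log_kernel(x, alpha)`; a nonzero `K` changes the value through the pairwise terms. Only the unnormalized kernel is needed here, which is the situation score-matching estimators are designed for.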


Generalized Score Matching for Non-Negative Data

A common challenge in estimating parameters of probability density funct...

Multivariate tail covariance for generalized skew-elliptical distributions

In this paper, the multivariate tail covariance (MTCov) for generalized ...

Graphical Models for Non-Negative Data Using Generalized Score Matching

A common challenge in estimating parameters of probability density funct...

Score matching for compositional distributions

Compositional data and multivariate count data with known totals are cha...

Generalized Score Matching for General Domains

Estimation of density functions supported on general domains arises when...

Compositional Data Regression in Insurance with Exponential Family PCA

Compositional data are multivariate observations that carry only relativ...

Gaussian asymptotic limits for the α-transformation in the analysis of compositional data

Compositional data consists of vectors of proportions whose components s...