Score matching for compositional distributions

12/23/2020
by Janice L. Scealy, et al.

Compositional data and multivariate count data with known totals are challenging to analyse because of the non-negativity and sum-to-one constraints on the sample space. Many of the compositional components are often highly right-skewed, with large numbers of zeros. A major limitation of currently available estimators for compositional models is that they either cannot handle many zeros in the data or are not computationally feasible in moderate to high dimensions. We derive a new set of score matching estimators applicable to distributions on a Riemannian manifold with boundary, of which the standard simplex is a special case. The score matching method is applied to estimate the parameters in a new flexible truncation model for compositional data, and we show that the estimators are scalable and available in closed form. Through extensive simulation studies, the score matching methodology is demonstrated to work well for estimating the parameters in the new truncation model and also for the Dirichlet distribution. We apply the new model and estimators to real microbiome compositional data and show that the model provides a good fit to the data.
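To make the idea of a closed-form score matching estimator concrete, here is a minimal sketch of classical (Euclidean) score matching for a one-dimensional Gaussian. It is not the estimator derived in the paper, which works on a manifold with boundary such as the simplex; the function names `score_matching_gaussian` and `objective` are hypothetical and chosen only for illustration. The point is that the score matching objective can avoid the normalising constant and, for suitable models, be minimised without iterative optimisation.

```python
import numpy as np


def objective(x, mu, lam):
    """Empirical score matching objective for a 1-D Gaussian.

    Model (illustrative assumption): log p(x; mu, lam) = -lam * (x - mu)**2 / 2 + const,
    with precision lam > 0. The score is d/dx log p = -lam * (x - mu) and its
    derivative is -lam, so the objective is mean(0.5 * score**2 + d(score)/dx).
    """
    x = np.asarray(x, dtype=float)
    return np.mean(0.5 * lam**2 * (x - mu) ** 2 - lam)


def score_matching_gaussian(x):
    """Closed-form minimiser of the objective above.

    Setting the gradient to zero gives mu = sample mean and
    lam = 1 / (biased sample variance).
    """
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    lam_hat = 1.0 / np.mean((x - mu_hat) ** 2)
    return mu_hat, lam_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=2.0, scale=0.5, size=5000)
    print(score_matching_gaussian(sample))
```

In this toy case the estimates coincide with the usual sample mean and inverse sample variance; the setting described in the abstract is harder because the simplex has a boundary and the data contain many zeros.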


research
05/12/2023

Robust score matching for compositional data

The restricted polynomially-tilted pairwise interaction (RPPI) distribut...
research
02/20/2018

A folded model for compositional data analysis

A folded type model is developed for analyzing compositional data. The p...
research
03/07/2023

Extremes in High Dimensions: Methods and Scalable Algorithms

Extreme-value theory has been explored in considerable detail for univar...
research
01/09/2018

Bayesian Fitting of Dirichlet Type I and II Distributions

In his 1986 book, Aitchison explains that compositional data is regularl...
research
07/14/2020

A novel downscaling procedure for compositional data in the Aitchison geometry with application to soil texture data

In this work, we present a novel downscaling procedure for compositional...
research
07/01/2021

Dealing with overdispersion in multivariate count data

The problem of overdispersion in multivariate count data is a challengin...
research
09/10/2021

Interaction Models and Generalized Score Matching for Compositional Data

Applications such as the analysis of microbiome data have led to renewed...
