Principal Differences Analysis: Interpretable Characterization of Differences between Distributions

10/30/2015
by Jonas Mueller et al.

We introduce principal differences analysis (PDA) for analyzing differences between high-dimensional distributions. The method operates by finding the projection that maximizes the Wasserstein divergence between the resulting univariate populations. Relying on the Cramér-Wold device, it requires no assumptions about the form of the underlying distributions or the nature of their inter-class differences. A sparse variant of the method is introduced to identify the features responsible for the differences. We provide algorithms for both the original minimax formulation and its semidefinite relaxation. In addition to deriving some convergence results, we illustrate how the approach may be applied to identify differences between cell populations in the somatosensory cortex and hippocampus, as measured by single-cell RNA-seq. Our broader framework extends beyond the specific choice of Wasserstein divergence.
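The core idea is to project both samples onto a single unit direction and measure how far apart the resulting univariate distributions are. The aim is to find the direction where that gap is largest. By the Cramér-Wold device, two distributions that differ at all must differ along some one-dimensional projection.

The following is a minimal Python sketch of that objective, resting on several assumptions:

* It assumes equal sample sizes.
* It uses the squared 2-Wasserstein distance, which in one dimension reduces to matching sorted samples.
* It searches over directions with a crude random search. This search is a stand-in chosen for illustration, not the authors' minimax or semidefinite-relaxation algorithm.
* All function names are hypothetical.

```python
import math
import random


def wasserstein2_sq_1d(x, y):
    """Squared 2-Wasserstein distance between two univariate empirical
    distributions with equal sample sizes: pair up the sorted samples."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


def project(samples, beta):
    """Project each d-dimensional sample onto the direction beta."""
    return [sum(b * v for b, v in zip(beta, row)) for row in samples]


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def best_projection(X, Y, n_directions=2000, seed=0):
    """Random search for the unit direction that maximizes the projected
    Wasserstein divergence (illustrative stand-in, not the PDA solver)."""
    rng = random.Random(seed)
    d = len(X[0])
    best_beta, best_val = None, -1.0
    for _ in range(n_directions):
        beta = random_unit_vector(d, rng)
        val = wasserstein2_sq_1d(project(X, beta), project(Y, beta))
        if val > best_val:
            best_beta, best_val = beta, val
    return best_beta, best_val


# Toy data: two 3-dimensional populations that differ only by a mean
# shift of 3 in the first coordinate.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
Y = [[data_rng.gauss(3, 1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
     for _ in range(200)]

beta, val = best_projection(X, Y)
print("direction:", [round(c, 2) for c in beta], "divergence:", round(val, 2))
```

On this toy data the recovered direction should lie close to the first coordinate axis (up to sign, since flipping beta leaves the divergence unchanged). That axis is the only one along which the populations differ. A sparse variant in the spirit of the abstract would additionally penalize or restrict the number of nonzero entries in beta, so that the selected features can be read off directly.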
