A Nearest-Neighbor Based Nonparametric Test for Viral Remodeling in Heterogeneous Single-Cell Proteomic Data

by Trambak Banerjee, et al.

An important problem in contemporary immunology studies based on single-cell protein expression data is to determine whether cellular expressions are remodeled after infection by a pathogen. One natural approach for detecting such changes is to use nonparametric two-sample statistical tests. However, in single-cell studies, direct application of these tests is often inadequate because single-cell expression data from the uninfected population often contain attributes of several latent sub-populations with highly heterogeneous characteristics. Viruses often infect these sub-populations at different rates, in which case the traditional nonparametric two-sample tests for checking similarity in distributions are no longer conservative. We propose a new nonparametric method for Testing Remodeling Under Heterogeneity (TRUH) that can accurately detect changes in the infected samples compared to possibly heterogeneous uninfected samples. Our testing framework is based on composite nulls and is designed to allow the null model to encompass the possibility that the infected samples, though unaltered by the virus, might arise predominantly from sub-populations that are under-represented in the baseline data. The TRUH statistic, which uses nearest-neighbor projections of the infected samples into the baseline uninfected population, is calibrated using a novel bootstrap algorithm. We demonstrate the non-asymptotic performance of the test via simulation experiments and derive the large-sample limit of the test statistic, which provides theoretical support for consistent asymptotic calibration of the test. We use the TRUH statistic to study remodeling in tonsillar T cells under different types of HIV infection and find that, unlike traditional tests, TRUH-based statistical inference conforms to biologically validated immunological theories of HIV infection.
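
The idea of projecting infected samples onto the baseline by nearest neighbors can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not give the TRUH formula or its bootstrap, so the hypothetical `nn_gap_statistic` (the gap between an infected point's distance to its nearest baseline cell and that baseline cell's own nearest-neighbor distance) and the hypothetical `split_null_pvalue` (a plain random-split calibration) are stand-ins, not the authors' method. In particular, the random-split calibration covers only a homogeneous null and does not address the composite null described above.

```python
import math
import random


def nn_distance(point, cloud, skip_index=None):
    """Return (distance, index) of the point in `cloud` closest to `point`.

    `skip_index` excludes one position, so a baseline point is not
    matched to itself.
    """
    best_d, best_i = math.inf, -1
    for i, q in enumerate(cloud):
        if i == skip_index:
            continue
        d = math.dist(point, q)
        if d < best_d:
            best_d, best_i = d, i
    return best_d, best_i


def nn_gap_statistic(infected, baseline):
    """Illustrative statistic (an assumption, not the paper's formula).

    For each infected point, compare its distance to the nearest baseline
    point with that baseline point's own nearest-neighbor distance inside
    the baseline, then average the absolute gaps.
    """
    gaps = []
    for v in infected:
        d_out, j = nn_distance(v, baseline)
        d_in, _ = nn_distance(baseline[j], baseline, skip_index=j)
        gaps.append(abs(d_out - d_in))
    return sum(gaps) / len(gaps)


def split_null_pvalue(infected, baseline, n_draws=200, seed=0):
    """Naive calibration by random splits of the baseline (an assumption).

    Each draw holds out len(infected) baseline points as pseudo-infected
    samples and recomputes the statistic against the remaining baseline.
    This is NOT the bootstrap proposed in the paper.
    """
    rng = random.Random(seed)
    observed = nn_gap_statistic(infected, baseline)
    m = len(infected)
    exceed = 0
    for _ in range(n_draws):
        idx = list(range(len(baseline)))
        rng.shuffle(idx)
        pseudo = [baseline[i] for i in idx[:m]]
        rest = [baseline[i] for i in idx[m:]]
        if nn_gap_statistic(pseudo, rest) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_draws)
```

Under these assumptions, infected samples that sit far from every baseline cell produce a large statistic and a small p-value, while samples drawn from a rare but existing baseline sub-population stay close to their nearest baseline neighbors.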


