Multiple Testing Embedded in an Aggregation Tree to Identify where Two Distributions Differ

by John Pura, et al.

A key goal of flow cytometry data analysis is to identify the subpopulation of cells whose attributes respond to the treatment. These cells are assumed to be sparse within the entire cell population. To identify them, we propose a novel multiple TEsting on the Aggregation tree Method (TEAM) to locate where the treated and control distributions differ. TEAM has a bottom-up hierarchical structure. On the bottom layer, we search for short-range, spiky distributional differences; on the higher layers, we search for long-range, weak distributional differences. Starting from layer two, the nested hypotheses on each layer are formed from the testing results of the previous layers, and the rejection rule also depends on the previous layer. Under mild conditions, we proved that TEAM yields consistent layer-specific and overall false discovery proportion (FDP). We also showed that when there are sufficient long-range weak distributional differences, TEAM yields better power than single-layer multiple testing methods. Simulations under different settings verified our theoretical results. As an illustration, we applied TEAM to a flow cytometry study, where we successfully identified the cell subpopulation that is responsive to the cytomegalovirus antigen.
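
The bottom-up idea described in the abstract can be illustrated with a toy sketch. This is not the authors' TEAM procedure: the abstract does not give the test statistic, the binning scheme, or the layer-dependent rejection rule, so everything below is an assumption made for illustration. It assumes one-dimensional bins with equal treated and control sample sizes, a one-sided normal-approximation test per node, the standard Benjamini-Hochberg step-up rule applied independently on each layer, and pairwise merging of neighbouring non-rejected nodes. The function names (`one_sided_pvalue`, `benjamini_hochberg`, `aggregation_tree_test`) are hypothetical.

```python
import math


def one_sided_pvalue(treated, total, p0=0.5):
    """Normal-approximation p-value for H0: treated share in the node <= p0.

    p0 = 0.5 assumes equal treated and control sample sizes (an assumption).
    """
    if total == 0:
        return 1.0
    z = (treated - p0 * total) / math.sqrt(total * p0 * (1.0 - p0))
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability


def benjamini_hochberg(pvalues, alpha):
    """Return the set of indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # largest rank whose p-value passes its threshold
    return set(order[:k])


def aggregation_tree_test(treated, control, layers=3, alpha=0.05):
    """Toy bottom-up testing over leaf bins ordered along one axis.

    treated[i] and control[i] are counts in leaf bin i. Returns the set of
    leaf-bin indices rejected on any layer.
    """
    # Layer 1 nodes are single bins; each node is a list of leaf indices.
    nodes = [[i] for i in range(len(treated))]
    rejected_bins = set()
    for _ in range(layers):
        if not nodes:
            break
        pvals = []
        for node in nodes:
            t = sum(treated[i] for i in node)
            c = sum(control[i] for i in node)
            pvals.append(one_sided_pvalue(t, t + c))
        hits = benjamini_hochberg(pvals, alpha)
        for j in hits:
            rejected_bins.update(nodes[j])
        # Nodes not rejected so far are carried up and merged pairwise.
        # Simplification: neighbours in this list are merged even if a
        # rejected node used to sit between them.
        survivors = [nodes[j] for j in range(len(nodes)) if j not in hits]
        nodes = []
        for j in range(0, len(survivors), 2):
            merged = list(survivors[j])
            if j + 1 < len(survivors):
                merged += survivors[j + 1]
            nodes.append(merged)
    return rejected_bins
```

As a usage example under these assumptions, take eight bins of 100 cells each with treated counts `[58, 58, 58, 58, 50, 50, 50, 80]` and control counts `[42, 42, 42, 42, 50, 50, 50, 20]`. With `layers=1` only the spiky bin 7 is flagged; with `layers=3` the four weakly shifted bins 0 to 3 are also flagged once they are pooled on layer two, which mirrors the power gain the abstract attributes to long-range weak differences.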

