Where to find needles in a haystack?

10/07/2019
by Zhigen Zhao, et al.

Many existing methods for multiple comparisons start from either Fisher's p-values or local fdr scores. The p-value, usually defined as the tail probability of exceeding the observed test statistic under the null distribution, does not use information from the alternative hypothesis, so the region it targets for signals can be completely wrong, especially when the likelihood ratio function is not monotone. Local fdr based approaches, which usually rely on density functions, are optimal in the oracle setting. However, the region targeted by their data-driven versions is problematic because non-parametric density estimation converges slowly, especially at the boundaries. In this paper, we propose a new method, the Cdf and Local fdr Assisted multiple Testing method (CLAT), which is optimal in cases where p-value based methods are not. Additionally, its data-driven version relies only on estimating the cumulative distribution function and converges quickly to the oracle version. Both simulations and real data analysis demonstrate that the proposed method outperforms existing ones. Furthermore, thanks to a novel algorithm, the computation is instantaneous and scales to large data sets.
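
To make the non-monotone case concrete, the following minimal sketch compares a one-sided p-value with the oracle local fdr in a two-group normal mixture. It is not the CLAT procedure; the null N(0, 1), the alternative N(2, 0.5^2), the null proportion 0.9, and all function names are illustrative assumptions that do not come from the paper.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def one_sided_p_value(x):
    """Tail probability P(Z >= x) under the assumed N(0, 1) null."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def local_fdr(x, pi0=0.9, alt_mean=2.0, alt_sd=0.5):
    """Oracle local fdr pi0*f0(x) / f(x) for the assumed two-group mixture."""
    f0 = normal_pdf(x)                      # null density
    f1 = normal_pdf(x, alt_mean, alt_sd)    # assumed alternative density
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


# The alternative is narrower than the null, so f1/f0 is not monotone in x:
# very large observations are more plausible under the null than the alternative.
for x in (2.5, 5.0):
    print(f"x={x}: p-value={one_sided_p_value(x):.3g}, local fdr={local_fdr(x):.3g}")
```

Under these assumptions the p-value keeps shrinking as x grows, while the local fdr is smallest on an interval around the alternative mean and rises again beyond it, so a threshold on the p-value and a threshold on the local fdr select different regions.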
