Decentralized Nonparametric Multiple Testing

05/05/2018
by Subhadeep Mukhopadhyay, et al.

Consider a big data multiple testing task in which, because of storage and computational bottlenecks, a very large collection of p-values is split into manageable chunks and distributed over thousands of computer nodes. This paper is concerned with the following question: How can we find the full-data multiple testing solution by operating completely independently on individual machines in parallel, without any data exchange between nodes? This version of the problem arises naturally in a wide range of data-intensive science and industry applications, yet its methodological solution has not appeared in the literature to date; we therefore feel it is necessary to undertake such an analysis. Based on the nonparametric functional statistical viewpoint of large-scale inference introduced in Mukhopadhyay (2016), this paper furnishes a new computing model that brings unexpected simplicity to the design of an algorithm that might otherwise seem daunting under the classical approach and notation.
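To make the question concrete, the following minimal sketch illustrates why the problem is not trivial. It is not the paper's algorithm. It assumes, purely for illustration, that the "full-data solution" is the Benjamini-Hochberg step-up procedure applied to all p-values at once, and it compares that with the naive baseline of running the same procedure independently on each chunk. The function names `benjamini_hochberg` and `naive_chunked_bh`, the toy p-values, and the level `alpha = 0.05` are all hypothetical choices.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the set of indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def naive_chunked_bh(pvalues, chunk_size, alpha=0.05):
    """Naive baseline (an assumption, not the paper's method):
    run BH separately on each contiguous chunk and pool the rejections."""
    rejected = set()
    for start in range(0, len(pvalues), chunk_size):
        chunk = pvalues[start:start + chunk_size]
        local = benjamini_hochberg(chunk, alpha)
        # Map chunk-local indices back to positions in the full list.
        rejected |= {start + j for j in local}
    return rejected


# Toy data: all small p-values happen to land on the first "node".
pvalues = [0.01, 0.02, 0.03, 0.04, 0.5, 0.6, 0.7, 0.8]

full = benjamini_hochberg(pvalues)
chunked = naive_chunked_bh(pvalues, chunk_size=4)

print("full-data rejections:", sorted(full))
print("per-chunk rejections:", sorted(chunked))
```

On this toy input the two answers disagree: the full-data procedure rejects nothing, while the per-chunk baseline rejects all four hypotheses in the first chunk, because each node calibrates its thresholds against only its own p-values. Closing this gap without any communication between nodes is the problem the abstract describes.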


