Statistical Validity and Consistency of Big Data Analytics: A General Framework

03/29/2018
by Bikram Karmakar, et al.

Advances in informatics and technology have triggered the generation of huge volumes of data whose management and analysis involve varied complexity. Big Data analytics is the practice of revealing hidden aspects of such data and drawing inferences from it. Although the storage, retrieval and management of Big Data appear feasible through efficient algorithms and system development, concerns about statistical consistency remain to be addressed in view of its specific characteristics. Since Big Data does not conform to standard analytics, existing statistical theory and tools need appropriate modification. Here we propose, with illustrations, a general statistical framework and an algorithmic principle for Big Data analytics that ensure the statistical accuracy of the conclusions. The proposed framework has the potential to push the advancement of Big Data analytics in the right direction. The partition-repetition approach proposed here is broad enough to encompass all practical data analytic problems.
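The abstract names a partition-repetition approach but does not spell out its steps. As a rough illustration only, the sketch below assumes one plausible reading: randomly partition the data into blocks, compute an estimator on each block, combine the block estimates by averaging, then repeat with fresh random partitions and average over the repetitions. The function `partition_repetition_estimate` and its parameters (`num_blocks`, `num_repeats`, `seed`) are hypothetical; this is a generic divide-and-combine sketch, not the authors' algorithm.

```python
import random
import statistics
from typing import Callable, Sequence


def partition_repetition_estimate(
    data: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
    num_blocks: int,
    num_repeats: int,
    seed: int = 0,
) -> float:
    """Hypothetical sketch (not the paper's method): estimate on random
    blocks, average block estimates, repeat with new partitions, average."""
    if num_blocks < 1 or num_repeats < 1:
        raise ValueError("num_blocks and num_repeats must be positive")
    if num_blocks > len(data):
        raise ValueError("cannot have more blocks than observations")

    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    indices = list(range(len(data)))
    repeat_estimates = []

    for _ in range(num_repeats):
        rng.shuffle(indices)  # a fresh random partition on every repetition
        # Deal shuffled indices round-robin into num_blocks near-equal blocks.
        blocks = [[data[i] for i in indices[b::num_blocks]] for b in range(num_blocks)]
        # Apply the estimator independently to each block (parallelizable).
        block_estimates = [estimator(block) for block in blocks]
        # Combine the block-level estimates for this partition by averaging.
        repeat_estimates.append(statistics.fmean(block_estimates))

    # Average across repetitions to reduce dependence on any single split.
    return statistics.fmean(repeat_estimates)
```

For example, `partition_repetition_estimate(list(range(100)), statistics.fmean, num_blocks=4, num_repeats=3)` returns approximately 49.5, the full-sample mean, because with equal-sized blocks the average of block means equals the overall mean. For nonlinear estimators such as the median, the block-averaged result generally differs from the full-sample value, which is where repeating the random split is meant to help by smoothing out the effect of any one partition.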

research
04/11/2022

The Principle of Least Sensing: A Privacy-Friendly Sensing Paradigm for Urban Big Data Analytics

With the worldwide emergence of data protection regulations, how to cond...
research
07/08/2016

Translating Bayesian Networks into Entity Relationship Models, Extended Version

Big data analytics applications drive the convergence of data management...
research
03/01/2015

23-bit Metaknowledge Template Towards Big Data Knowledge Discovery and Management

The global influence of Big Data is not only growing but seemingly endle...
research
05/23/2020

Implementation of Self-Organizing Network (SON) on Cellular Technology base on Big Data Analytic

The development of cellular technology will be directly proportional to ...
research
08/09/2017

Using Deep Neural Networks to Automate Large Scale Statistical Analysis for Big Data Applications

Statistical analysis (SA) is a complex process to deduce population prop...
research
09/14/2020

A Hybrid Framework for Topology Identification of Distribution Grid with Renewables Integration

Topology identification (TI) is a key task for state estimation (SE) in ...
research
02/14/2019

OPENMENDEL: A Cooperative Programming Project for Statistical Genetics

Statistical methods for genomewide association studies (GWAS) continue t...