Challenges of Big Data Analysis

08/07/2013
by Jianqing Fan, et al.

Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and of how these features drive paradigm changes in statistical and computational methods as well as in computing architectures. We also provide several new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions made by most statistical methods for Big Data cannot be validated because of incidental endogeneity. Relying on such unvalidated assumptions can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
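
As a rough illustration of the spurious correlation mentioned above, the following minimal sketch simulates a response and a growing number of predictors that are all independent of it, then reports the largest absolute sample correlation found. The sample size, the numbers of predictors, the Gaussian data, and the helper names `pearson` and `max_spurious_correlation` are assumptions chosen for illustration and are not taken from the article.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n, p, rng):
    """Largest |correlation| between a response and p independent predictors.

    Every predictor is drawn independently of the response, so any
    correlation observed here arises purely by chance.
    """
    y = [rng.gauss(0, 1) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson(x, y)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50                  # illustrative sample size (assumption)
results = {p: max_spurious_correlation(n, p, rng) for p in (10, 100, 1000)}
for p, r in results.items():
    print(f"p = {p:5d}: max |sample correlation| = {r:.3f}")
```

With the sample size held fixed, the maximum chance correlation tends to grow as more unrelated predictors are screened, which is one reason variable selection in high dimensions can pick up variables that have no real relationship with the response.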


