A Survey of Bayesian Statistical Approaches for Big Data

06/08/2020 · by Farzana Jahan, et al. (QUT)

The modern era is characterised as an era of information or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We review published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.


1 Introduction

Although there are many variations on the definition of Big Data [91, 51, 52, 183], it is clear that it encompasses large and often diverse quantitative data obtained from increasingly numerous sources at different individual, spatial and temporal scales, and with different levels of quality. Examples of Big Data include data generated from social media [22]; data collected in biomedical and healthcare informatics research such as DNA sequences and electronic health records [114]; and geospatial data generated by remote sensing, laser scanning, mobile mapping, geo-located sensors, geo-tagged web content, volunteered geographic information (VGI) and global navigation satellite system (GNSS) tracking [103]. The volume and complexity of Big Data often exceed the capability of standard analytics tools (software, hardware, methods and algorithms) [92, 70]. The concomitant challenges of managing, modelling, analysing and interpreting these data have motivated a large literature on potential solutions from a range of domains including statistics, machine learning and computer science. This literature can be grouped into four broad categories of articles. The first includes general articles about the concept of Big Data, including its features and challenges, and its application and importance in specific fields. The second includes literature concentrating on infrastructure and management, including parallel computing and specialised software. The third focuses on statistical and machine learning models and algorithms for Big Data. The final category includes articles on the application of these new techniques to complex real-world problems.

In this chapter, we classify the literature published on Big Data into finer classes than the four broad categories mentioned above and briefly review the contents covered by these different categories. The main focus of the chapter, however, is on the third category, in particular on statistical contributions to Big Data. We examine the nature of these innovations and attempt to catalogue them as modelling, algorithmic or other contributions. We then drill further into this set and examine the more specific literature on Bayesian approaches. Although there is increasing interest in this paradigm from a wide range of perspectives including statistics, machine learning, information science, computer science and the various application areas, to our knowledge there has not yet been a review of Bayesian statistical approaches for Big Data. This is the primary contribution of this chapter.

This chapter provides a review of the published studies that present Bayesian statistical models specifically for Big Data and discusses the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.

The chapter proceeds as follows. In the next section, the literature search and inclusion criteria for this chapter are outlined. A classification of the Big Data literature, along with a brief review of the relevant literature in each class, is presented in Section 3. Section 4 consists of a brief review of articles discussing Big Data problems from statistical perspectives, followed by a review of Bayesian approaches applied to Big Data in Section 5. The final section includes a discussion of this review with a view to answering the research question posed above.

2 Literature Search and Inclusion Criteria

The literature search for this review was undertaken using several methods. The search methods implemented to find the relevant literature, and the criteria for its inclusion in this chapter, are briefly discussed in this section.

2.1 Inclusion Criteria

Acknowledging the wide range of literature on Big Data, the specific focus of this chapter is on recent developments published during 2013-2019.

For quality assurance reasons, only peer-reviewed published articles, book chapters and conference proceedings were included in the chapter. Some articles were also included from arXiv and other pre-print archives when they were soon to be published, or were written by well known researchers working in the particular area of interest.

2.2 Search Methods

Database Search: The database “Scopus” was used to initiate the literature search. To identify the available literature and learn broadly about the areas of concentration, the following keywords were used: Big Data, Big Data Analysis, Big Data Analytics, Statistics and Big Data.

The huge range of literature obtained by this initial search was complemented by a search of “Google Scholar” using more specific keywords as follows: Features and Challenges of Big Data, Big Data Infrastructure, Big Data and Machine Learning, Big Data and Cloud Computing, Statistical approaches/methods/models in Big Data, Bayesian Approaches/Methods/Models in Big Data, Big Data analysis using Bayesian Statistics, Bayesian Big Data, Bayesian Statistics and Big Data.

Expert Knowledge: In addition to the literature found by the above database search, we used expert knowledge and opinion in the field, reviewed the works of well known researchers in Bayesian statistics for research related to Bayesian approaches to Big Data, and included the relevant publications for review in this chapter.

Scanning References of Selected Literature: Further studies were found by searching the reference lists of the selected literature.

Searching with specific keywords:

Since the focus of this chapter is on reviewing Bayesian approaches to Big Data, more literature was sourced using specific Bayesian methods or approaches found to be applied to Big Data: Approximate Bayesian Computation and Big Data, Bayesian Networks in Big Data, Classification and regression trees/Bayesian Additive regression trees in Big Data, Naive Bayes Classifiers and Big Data, Sequential Monte Carlo and Big Data, Hamiltonian Monte Carlo and Big Data, Variational Bayes and Big Data, Bayesian Empirical Likelihood and Big Data, Bayesian Spatial modelling and Big Data, Non parametric Bayes and Big Data.

This last step was conducted to ensure that this chapter covers the important and emerging areas of Bayesian statistics and their application to Big Data. These searches were conducted in “Google Scholar” and up to 30 pages of results were considered in order to find relevant literature.

3 Classification of Big Data literature

The published articles on Big Data can be divided into finer classes than the four main categories described above. Of course, there are many ways to make these delineations. Table 1 shows one such delineation, with representative references from the last five years of published literature. The aim of this table is to indicate the wide-ranging literature on Big Data and to provide relevant references in different categories for interested readers.

Figure 1: Classification of Big Data literature

Features and Challenges: [167, 199, 140, 51, 52, 183, 70, 63, 65, 159]
Infrastructure: [96, 207, 117, 164, 182, 10, 206, 131, 138, 142, 192, 50, 109]
Cloud computing: [120, 130, 38, 200, 11, 113, 32, 205, 54, 137, 174]
Applications (3 examples): Social science [5, 39, 22, 121, 163, 37]; health/medicine/medical science [118, 46, 8, 16, 158, 9, 21, 28, 34, 181, 19, 152, 157, 201, 82]; business [153, 60, 171, 66, 31, 2, 36, 122]
Machine Learning Methods: [55, 26, 139, 3, 4, 97, 172, 27, 64, 89]
Statistical Methods: [61, 136, 193, 186, 189, 112, 204, 67, 161, 86, 184, 57, 143, 197, 44, 173, 45, 87, 84]
Bayesian Methods: [80, 149, 209, 198, 102, 128, 105, 100, 110, 162, 81, 179, 129, 115, 169, 210, 7, 108]
Table 1: Classes of Big Data Literature

The links between these classes of literature can be visualised as in Figure 1, and a brief description of each class and the contents covered by the relevant references is provided in Table 2. The brief reviews presented in Table 2 can help interested readers develop a broad idea of each of the classes listed in Table 1. However, Table 2 does not include reviews of the last two classes, namely Statistical Methods and Bayesian Methods, since these are discussed in detail in Sections 4 and 5. We acknowledge that Bayesian methods are essentially part of statistical methods; in this chapter, however, the two are deliberately treated as distinct classes in order to identify and discuss the specific developments in Bayesian approaches.

Features and Challenges
  • The general features of Big Data are volume, variety, velocity, veracity and value [159, 52]; salient features include massive sample sizes and high dimensionality [159].

  • Many challenges of Big Data regarding storage, processing, analysis and privacy are identified in the literature [52, 140, 159, 65, 63].

Infrastructure
  • To manage and analyse Big Data, infrastructural support is needed, such as sufficient storage technologies and data management systems, which are being continuously developed and improved. MongoDB, Terrastore and RethinkDB are some examples of storage technologies; more on these evolving technologies, with their strengths, weaknesses, opportunities and threats, is available in [164].

  • To analyse Big Data, parallel processing systems and scalable algorithms are needed. MapReduce is one of the pioneering data processing systems [206]. Other useful and popular tools to handle Big Data include Apache Hadoop and Apache Spark [10].

Cloud computing
  • Cloud computing, the practice of using a network of remote servers hosted on the Internet rather than a local server or a personal computer, plays a key role in Big Data analysis by providing the infrastructure needed to store, analyse, visualise and model Big Data using scalable and adaptive systems [11].

  • Opportunities and challenges of cloud computing technologies, future trends and application areas are widely discussed in the literature [200, 32, 174], and new developments in cloud computing have been proposed to overcome known challenges, such as collaborative anomaly detection [130] and a hybrid approach for scalable sub-tree anonymisation using MapReduce on cloud [205].

Applications (3 examples)
  • Big Data has made it possible to analyse social behaviour and an individual’s interactions with social systems based on social media usage [5, 163, 37]. Discussions of the challenges and future of social science research using Big Data can be found in [163, 39].

  • Research involving Big Data in medicine, public health, biomedical and health informatics has increased exponentially over the last decade [46, 152, 157, 28, 114, 19]. Some examples include infectious disease research [16, 82], developing personalised medicine and health care [181, 9] and improving cardiovascular care [158].

  • Analysis of Big Data is used to solve many real-world problems in business; in particular, Big Data analytics is used for innovation in leading organisations [122], predictive analytics in retail [31], analysis of business risks and benefits [153], development of market strategies [60] and so on. The opportunities and challenges of Big Data in e-commerce and Big Data integration in business processes can be found in the review articles by [2] and [183].

Machine Learning Methods
  • Machine learning is an interdisciplinary field of research primarily focusing on the theory, performance and properties of learning systems and algorithms [148]. Traditional machine learning is evolving to tackle the additional challenges of Big Data [148, 4].

  • Examples of developments in machine learning theories and algorithms for Big Data include a high-performance machine learning toolbox [3] and a scalable machine learning online service for real-time Big Data analysis [14].

  • There is a large and increasing body of research on specific applications of machine learning tools for Big Data in different disciplines. For example, [139] discussed the future of Big Data and machine learning in clinical medicine; [13] discussed a classifier specifically for medical Big Data; and [26] reviewed the state of the art and future prospects of machine learning and Big Data in radiation oncology.

Table 2: Brief Review of relevant literature under identified classes

4 Statistical Approaches to Big Data

The importance of modelling and theoretical considerations for analysing Big Data is well stated in the literature [86, 197]. These authors pointed out that blind trust in algorithms without proper theoretical consideration will not result in valid outputs. The emerging challenges of Big Data are beyond the issues of processing, storage and management; the choice of suitable statistical methods is crucial in order to make the most of Big Data [67, 87]. [61] highlighted the role of statistical methods for interpretability, uncertainty quantification and the reduction of selection bias in analysing Big Data.

In this section we present a brief review of some of the published research on statistical perspectives, methods, models and algorithms that are targeted to Big Data. As above, the review is confined to the last five years, commencing with the most recent contributions. Bayesian approaches are reserved for the next section.

Topic: Discussion Article
Author: Dunson (2018) [61]
  • Discussed the background of big data from the perspectives of the machine learning and statistics communities.

  • Listed the differences between statistical and machine learning perspectives on methods and inference, in terms of replicability, uncertainty quantification, sampling, selection bias and measurement error.

  • Identified the statistical challenges posed by high dimensional complex data (big data) in quantifying uncertainty, scaling up sampling methods and selecting priors in Bayesian methods.

Topic: Review
Author: Nongxa (2017) [136]
  • Identified challenges of big data as: high dimensionality, heterogeneity and incompleteness, scale, timeliness, security and privacy.

  • Pointed out that the mathematical and statistical challenges of big data require updating the core knowledge areas (i.e., linear algebra, multivariable calculus, elementary probability and statistics, coding or programming) to more advanced topics (i.e., randomised numerical linear algebra, topological data analysis, matrix and tensor decompositions, random graphs, random matrices and complex networks) in mathematical and statistical education.

Author: Franke et al.(2016) [67]
  • Reviewed different analysis strategies (data wrangling, visualisation, dimension reduction, sparsity regularisation, optimisation, measuring distance, representation learning, sequential learning) and provided detailed examples of applications.

Author: Chen et al. (2015) [44]
  • Emphasised the importance of statistical knowledge and skills in Big Data Analytics using several examples.

  • Discussed some statistical methods that are useful in the context of big data, such as confirmatory and exploratory data analysis tools, data mining methods including supervised learning (classification, regression/prediction) and unsupervised learning (cluster analysis, anomaly detection, association rule learning), and visualisation techniques.

  • Elaborated on the computational skills needed for statisticians in data acquisition, data processing, data management and data analysis.

Author: Hoerl et al. (2014) [87]
  • Provided a background of big data reviewing relevant articles.

  • Discussed the importance of statistical thinking in big data problems reviewing some misleading results produced by sophisticated analysis of big data without involving statistical principles.

  • Elaborated on the roles of statistical thinking for data quality, domain knowledge, analysis strategies in order to solve complex unstructured problems involving big data.

Topic: Review of methods & extension
Author: Wang et al. (2016) [184]
  • Reviewed statistical methods and software packages in R and recently developed tools to handle Big Data, focusing on three groups: sub-sampling, divide and conquer and online processing.

  • Extended the online updating approach by employing variable selection criteria.

Topic: Methods review, new methods
Author: Genuer et al. (2017) [72]
  • Reviewed proposals dealing with scaling random forests to big data problems.

  • Discussed subsampling, parallel implementations, online processing of random forests in detail.

  • Proposed five variants of Random Forests for big data.

Author: Wang and Xu (2015) [191]
  • Reviewed different clustering methods applicable to big data situations.

  • Proposed a clustering procedure with adaptive density peak detection applying multivariate kernel estimation and demonstrated the performance through simulation studies and analysis of a few benchmark gene expression data sets.

  • Developed an R package, “ADPclust”, to implement the proposed methods.

Author: Wang et al. (2017) [186]
  • Proposed a method and algorithm for online updating implementing bias corrections with extensions for application in a generalised linear model (GLM) setting.

  • Evaluated the proposed strategies in comparison with previous algorithms [161].

Topic: New methods and algorithms
Author: Liu et al. (2017) [112]
  • Proposed a novel sparse GLM with L0 approximation for feature selection and prediction in big omics data scenarios.

  • Provided a novel algorithm and MATLAB software (L0ADRIDGE) for performing L0 penalised GLM in ultra high dimensional big data.

  • Compared performance with other methods (SCAD, MC+) using simulation and real data analyses (mRNA, microRNA and methylation data from TCGA ovarian cancer).

Author: Schifano et al. (2016) [161]
  • Developed new statistical methods and iterative algorithms for analysing streaming data.

  • Proposed methods to enable update of the estimations and models with the arrival of new data.

Author: Allen et al. (2014) [6]
  • Proposed generalisations to Principal Components Analysis (PCA) to take into account structural relationships in big data settings.

  • Developed fast computational algorithms using the proposed methods (GPCA, sparse GPCA and functional GPCA) for massive data sets.

Topic: New algorithms
Author: Wang and Samworth (2017) [189]
  • Proposed a new algorithm, “inspect” (informative sparse projection for estimation of change points), to estimate the number and location of change points in high dimensional time series.

  • Starting from a simple time series model, the algorithm was extended to detect multiple change points and to allow for spatial or temporal dependence; it was assessed using simulation studies and a real data application.

Author: Yu and Lee (2017) [202]
  • Extended the alternating direction method of multipliers (ADMM) to solve penalised quantile regression problems involving massive data sets, with faster computation and no loss of estimation accuracy.

Author: Zhang and Yang (2017) [204]
  • Proposed new algorithms based on ridge regression that are efficient for handling big data.

Author: Doornik and Hendry (2015) [57]
  • Discussed the statistical model selection algorithm “Autometrics” for econometric data [58] and its application to fat big data (having a larger number of variables than observations).

  • Extended the algorithms to tackle computational issues of fat big data, applying block searches and re-selection by lasso for correlated regressors.

Author: Sysoev et al. (2014) [173]
  • Presented efficient algorithms to estimate bootstrap or jackknife type confidence intervals for big data sets fitted by multivariate monotonic regression.

  • Evaluated the performance of the proposed algorithms using a case study on death from coronary heart disease in a large population.

Author: Pehlivanlı (2015) [143]
  • Proposed a novel approach for feature selection from high dimensional data.

  • Tested the efficiency of the proposed method using sensitivity, specificity, accuracy and ROC curves.

  • Demonstrated the approach on micro-array data.

Table 3: Classification of statistical literature to Big Data

Among the brief reviews of the relevant literature in Table 3, we include detailed reviews of three papers which are more generic in explaining the role of statistics and statistical methods in Big Data, along with recent developments in this area.

[184] summarised the published literature on recent methodological developments for Big Data in three broad groups: subsampling, which calculates a statistic in many subsamples taken from the data and then combines the results [144]; divide and conquer, the principle of which is to break a dataset into smaller subsets, analyse these in parallel and combine the results at the end [168]; and online updating of streaming data [161], based on online recursive analytical processing. The authors summarised the following methods in the first two groups: subsampling based methods (bag of little bootstraps, leveraging, mean log-likelihood, subsample based MCMC) and divide and conquer methods (aggregated estimating equations, majority voting, screening with ultra high dimension, parallel MCMC). After reviewing existing online updating methods and algorithms, they extended the online updating of streaming data by including criterion based variable selection. The authors also discussed the available software packages (open source R as well as commercial software) developed to handle the computational complexity involving Big Data. For breaking the memory barrier using R, the authors cited and discussed several data management packages (sqldf, DBI, RSQLite, filehash, bigmemory, ff) and packages for numerical calculation (speedglm, biglm, biganalytics, ffbase, bigtabulate, bigalgebra, bigpca, bigrf, biglars, PopGenome). The R packages for breaking the computing power barrier were cited and discussed in two groups: packages for speeding up (compiler, inline, Rcpp, RcppEigen, RcppArmadillo, RInside, microbenchmark, proftools, aprof, lineprof, GUIprofiler) and packages for scaling up (Rmpi, snow, snowFT, snowfall, multicore, parallel, foreach, Rdsm, bigmemory, pbdMPI, pbdSLAP, pbdBASE, pbdMAT, pbdDEMO, Rhipe, segue, rhbase, rhdfs, rmr, plyrmr, ravro, SparkR, pnmath, pnmath0, rsprng, rlecuyer, doRNG, gputools, bigvis). The authors also discussed developments in Hadoop, Spark, OpenMP and APIs, and the use of FORTRAN and C++ from R, in order to create flexible programs for handling Big Data. The article also presented a brief summary of commercial statistical software, e.g., SAS, SPSS, MATLAB. The study included a case study of fitting a logistic model to a massive data set on airline on-time performance from the 2009 ASA Data Expo, mentioning the use of some of the R packages discussed earlier to handle the problems with memory and computational capacity. Overall, this study provided a comprehensive review and discussion of state-of-the-art statistical methodologies and software development for handling Big Data.
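To give a flavour of the online updating strategy reviewed above, the following minimal sketch (in Python, with hypothetical names; the cited works are set in R and add bias corrections and variable selection that are omitted here) accumulates the sufficient statistics of a linear model chunk by chunk, so the full-data least squares estimate is recovered without ever holding the full data set in memory.

```python
import numpy as np

class OnlineLS:
    """Toy online updater: accumulate X'X and X'y over streaming chunks."""
    def __init__(self, p):
        self.xtx = np.zeros((p, p))  # running X'X
        self.xty = np.zeros(p)       # running X'y

    def update(self, X, y):
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def estimate(self):
        # identical to least squares on the concatenated data
        return np.linalg.solve(self.xtx, self.xty)

rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0, 0.5])
model = OnlineLS(p=3)
for _ in range(100):                      # 100 chunks of streaming data
    X = rng.normal(size=(1000, 3))
    y = X @ beta + rng.normal(size=1000)
    model.update(X, y)                    # each chunk can be discarded afterwards
print(model.estimate())                   # close to beta
```

The same pattern, updating a fixed-size summary as each chunk arrives, underlies the more sophisticated online updating estimators discussed above.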

[44] presented their views on the challenges and importance of Big Data and explained the role of statistics in Big Data Analytics based on a review of relevant literature. This study emphasised the importance of statistical knowledge and skills in Big Data Analytics using several examples. As detailed in Table 3, the authors broadly discussed a range of statistical methods which can be helpful in better analysis of Big Data, such as the use of the exploratory data analysis principle in statistics to investigate correlations among the variables in the data or establish causal relationships between response and explanatory variables in the Big Data. The authors specifically mentioned hypothesis testing, predictive analysis using statistical models and statistical inference using uncertainty estimation as some key tools to use in Big Data analysis. The authors also explained that statistical knowledge can be combined with data mining methods such as unsupervised learning (cluster analysis, association rule learning, anomaly detection) and supervised learning (regression and classification) to the benefit of Big Data analysis. The challenges for statisticians in coping with Big Data were also described in this article, with particular emphasis on computational skills in data acquisition (knowledge of programming languages and of web and core communication protocols), data processing (skills to transform voice or image data to numeric data using appropriate software or programming), data management (knowledge about database management tools and technologies, such as NoSQL) and scalable computation (knowledge about parallel computing, which can be implemented using MapReduce, SQL etc.).

As indicated above, many of the papers provide a summary of the published literature which is not replicated here. Some of these reviews are based on large thematic programs that have been held on this topic. For example, the paper by [67] is based on presentations and discussions held as part of the program on Statistical Inference, Learning and Models for Big Data, which was held in Canada in 2015. The authors discussed the V’s of Big Data (volume, variety, velocity) and mentioned further challenges in Big Data analysis beyond the complexities associated with these. The additional “V” discussed in this article is veracity, which refers to biases and noise in the data that may result from the heterogeneous structure of the data sources and may make the sample unrepresentative of the population. Veracity is often referred to as the biggest challenge of Big Data compared with the other V’s. The paper reviewed the common strategies for Big Data analysis, starting from data wrangling, which consists of data manipulation techniques for making the data eligible for analysis; visualisation, which is often an important tool to understand the underlying patterns in the data and is the first formal step in data analysis; reducing the dimension of the data using algorithms such as Principal Component Analysis (PCA) to make Big Data models tractable and interpretable; making models more robust by enforcing sparsity through regularisation techniques such as variable selection and model fitting criteria; using optimisation methods based on different distance measures proposed for high dimensional data; and using learning algorithms such as representation learning and sequential learning. Applications of Big Data were shown in public health, health policy, law and order, education, mobile application security, image recognition and labelling, digital humanities and materials science.

There are a few other research articles focused on statistical methods tailored to specific problems which are not included in Table 3. For example, [40] proposed a statistics-based algorithm using a stochastic space-time model with more than 1 billion data points to reproduce some features of a climate model. Similarly, [123] used various statistical methods to obtain associations between drug-outcome pairs in a very big longitudinal medical experimental database (with information on millions of patients), with a detailed discussion of the big results problem and a comparison of statistical and machine learning approaches. Finally, [84] proposed stochastic variational inference for Gaussian processes, which makes the application of Gaussian processes to huge data sets (having millions of data points) feasible.

From this review of relevant literature on statistical perspectives for analysing Big Data, it can be seen that, along with the scaling up of existing algorithms, new methodological developments are also in progress in order to face the challenges associated with Big Data.

5 Bayesian Approaches in Big Data

As described in the Introduction, the intention of this review is to commence with a broad scope of the literature on Big Data, then focus on statistical methods for Big Data, and finally to focus in particular on Bayesian approaches for modelling and analysis of Big Data. This section consists of a review of published literature on the last of these.

There are two defining features of Bayesian analysis: (i) the construction of the model and associated parameters and expectations of interest, and (ii) the development of an algorithm to obtain posterior estimates of these quantities. In the context of Big Data, the resultant models can become complex and suffer from issues such as unavailability of a likelihood, hierarchical instability, parameter explosion and identifiability. Similarly, the algorithms can suffer from too much or too little data given the model structure, as well as problems of scalability and cost. These issues have motivated the development of new model structures, new methods that avoid the need for models, new Markov chain Monte Carlo (MCMC) sampling methods, and alternative algorithms and approximations that avoid these simulation-based approaches. We discuss some of the concomitant literature under two broad headings, namely computation and models, recognising that there is often overlap in the cited papers.

5.1 Bayesian Computation

In the Bayesian framework, the mainstream computational tool has been Markov chain Monte Carlo (MCMC). Traditional MCMC methods do not scale well because they need to iterate through the full data set at each iteration to evaluate the likelihood [198]. Recently, several attempts have been made to scale MCMC methods up to massive data. A widely used strategy to overcome the computational cost is to distribute the computational burden across a number of machines, generally referred to as divide-and-conquer sampling. This approach breaks a massive data set into a number of easier to handle subsets, obtains posterior samples based on each subset in parallel using multiple machines and finally combines the subset posterior inferences to obtain the full-posterior estimates [168]. The core challenge is the recombination of sub-posterior samples to obtain true posterior samples, and a number of attempts have been made to address it.

[134] and [194] approximated the sub-posteriors using kernel density estimation and then aggregated the sub-posteriors by taking their product. Both algorithms provide consistent estimates of the posterior. [134] provided faster MCMC processing since it allowed each machine to process its parallel MCMC chain independently. However, one limitation of this asymptotically embarrassingly parallel MCMC algorithm [134] is that it only works for real-valued and unconstrained posteriors, so there remains scope to make the algorithm work in more general settings.

[190] adopted a similar approach of parallel MCMC but used a Weierstrass transform to approximate the sub-posterior densities instead of a kernel density estimate. This provided better approximation accuracy, a better chain mixing rate and potentially faster speed for large scale Bayesian analysis.

[162] partitioned the data at random and performed MCMC independently on each subset to draw samples from the posterior given the data subset. To obtain the consensus posterior, they proposed averaging samples across subsets and showed the exactness of the algorithm under a Gaussian assumption. This algorithm is scalable to a very large number of machines and works on clusters, single multi-core or multi-processor computers, or any arbitrary collection of computers linked by a high speed network. The key weakness of consensus MCMC is that it does not apply to non-Gaussian posteriors.
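A minimal sketch of this consensus combination step is given below, assuming each worker has already produced draws from its subset posterior (with the prior suitably downweighted across subsets, as in [162]); draws are averaged across subsets with weights given by the inverses of the estimated subset-posterior covariances, which is exact when every subset posterior is Gaussian. The toy inputs here are illustrative only.

```python
import numpy as np

def consensus_draws(subset_draws):
    """Combine aligned draws from S subset posteriors (each a (T, d) array)."""
    # weight each subset by the inverse of its estimated posterior covariance
    weights = [np.linalg.inv(np.cov(d, rowvar=False)) for d in subset_draws]
    w_total_inv = np.linalg.inv(sum(weights))
    T, d = subset_draws[0].shape
    combined = np.empty((T, d))
    for t in range(T):
        combined[t] = w_total_inv @ sum(
            W @ draws[t] for W, draws in zip(weights, subset_draws))
    return combined

# Toy check with three Gaussian "subset posteriors".
rng = np.random.default_rng(1)
draws = [rng.multivariate_normal([m, -m], np.eye(2) * s, size=5000)
         for m, s in [(0.9, 3.0), (1.1, 2.0), (1.0, 4.0)]]
print(consensus_draws(draws).mean(axis=0))   # pooled posterior mean estimate
```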

[128] proposed dividing a large set of independent data into a number of non-overlapping subsets, making inferences on the subsets in parallel and then combining the inferences using the median of the subset posteriors. The median posterior (M-posterior) is constructed from the subset posteriors using Weiszfeld’s algorithm, which provides a scalable algorithm for robust estimation.

[77] extended this notion to spatially dependent data and provided a scalable divide and conquer algorithm to analyse big spatial data sets, named spatial meta-kriging. The multivariate extension of spatial meta-kriging has been addressed by [78]. These meta-kriging approaches are practical developments for Bayesian spatial inference for Big Data, specifically for “big-N” problems [98].

[198] proposed a new and flexible divide and conquer framework that uses re-scaled sub-posteriors to approximate the overall posterior. Unlike other parallel approaches to MCMC, this method creates artificial data for each subset and applies the overall priors on the artificial data sets to obtain the subset posteriors. The sub-posteriors are then re-centred to their common mean and averaged to approximate the overall posterior. The authors claimed this method has statistical justification and mathematical validity, while sharing the same computational cost as other classical parallel MCMC approaches such as consensus Monte Carlo and the Weierstrass sampler. [30] proposed a non-reversible rejection-free MCMC method, the bouncy particle sampler, which reportedly outperforms state-of-the-art methods such as HMC and Firefly Monte Carlo by having a faster mixing rate and lower variances of the estimators for high dimensional models and large data sets. However, the automation of this method is still a challenge.

Another strategy for scalable Bayesian inference is the sub-sampling based approach, in which a smaller subset of the data is queried in the MCMC algorithm to evaluate the likelihood at each iteration.

[116] proposed an auxiliary variable MCMC algorithm that evaluates the likelihood based on a small subset of the data at each iteration yet simulates from the exact posterior distribution. To improve the mixing speed, [95] used an approximate Metropolis-Hastings (MH) test based on a subset of the data. A similar approach is used in [17], where the accept/reject step of MH evaluates the likelihood of a random subset of the data. [18] extended this approach by replacing a number of likelihood evaluations with a Taylor expansion centred at the maximum of the likelihood, and concluded that their method outperforms the earlier algorithms [95].
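The following toy sketch illustrates the core idea behind these subsampling MH schemes: estimate the log-likelihood ratio from a random subset, scaled up by n/m, inside the accept/reject step. It is deliberately naive (a plain scaled estimate under an assumed flat prior, so it targets the posterior only approximately) and omits the adaptive stopping rules and error controls of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=1.5, scale=1.0, size=100_000)   # big data set
n, m = data.size, 1_000                               # full size, subsample size

def loglik_subset(theta, subset):
    return -0.5 * np.sum((subset - theta) ** 2)       # N(theta, 1) log-likelihood

theta, chain = 0.0, []
for _ in range(5_000):
    prop = theta + rng.normal(scale=0.05)             # random-walk proposal
    subset = data[rng.integers(0, n, size=m)]
    # scaled subsample estimate of the full-data log-likelihood ratio
    ratio = (n / m) * (loglik_subset(prop, subset) - loglik_subset(theta, subset))
    if np.log(rng.uniform()) < ratio:                 # flat prior assumed
        theta = prop
    chain.append(theta)
print(np.mean(chain[1000:]))                          # near the sample mean of data
```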

The scalable MCMC approach was further improved by [150] using a difference estimator to estimate the log-likelihood accurately using only a small fraction of the data. [149] introduced an unbiased estimator of the log-likelihood based on weighted sub-samples, which is used in the MH acceptance step to speed up MCMC efficiently. Another scalable adaptation of the MH algorithm, informed subsampling MCMC, was proposed by [119] to speed up Bayesian inference for Big Data; it draws subsets according to a similarity measure (i.e., the squared L2 distance between maximum likelihood estimators based on the full data and on the subsample) instead of a uniform distribution. The algorithm showed excellent performance under a limited computational budget in approximating the posterior for a tall dataset.

Another variation of MCMC for Big Data was developed by [169]. These authors proposed a novel Bayesian inference framework that approximates the posterior expectation from a different perspective suitable for Big Data problems, involving paths of partial posteriors. This is a parallelisable method which can easily be implemented using existing MCMC techniques. It does not require simulation from the full posterior, thus bypassing the complex convergence issues of kernel approximation. However, there remains scope for future work on the computation-variance trade-off and the finite time bias produced by MCMC.

Hamiltonian Monte Carlo (HMC) sampling methods provide powerful and efficient algorithms for MCMC, using high acceptance probabilities for distant proposals [45]. A conceptual introduction to HMC is presented by [25]. [45] proposed a stochastic gradient HMC using second-order Langevin dynamics. Stochastic gradient Langevin dynamics (SGLD) has been proposed as a useful method for applying MCMC to Big Data, in which the accept-reject step is skipped and a decreasing step size sequence is used [1]. For a more detailed and rigorous mathematical framework, algorithms and recommendations, interested readers are referred to [177].
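As an illustration, here is a minimal SGLD sketch for the posterior of a normal mean; the model, the N(0, 10^2) prior and the step-size schedule are our illustrative assumptions. Each update adds a mini-batch estimate of the log-posterior gradient plus injected Gaussian noise whose variance matches the decreasing step size, with no accept-reject step.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.0, size=50_000)
n, batch = data.size, 500

def grad_log_post(theta, x):
    # prior gradient + rescaled mini-batch likelihood gradient
    return -theta / 100.0 + (n / batch) * np.sum(x - theta)

theta, samples = 0.0, []
for t in range(1, 10_001):
    eps = 1e-6 * (1.0 + t) ** -0.33                   # decreasing step sizes
    x = data[rng.integers(0, n, size=batch)]
    # Langevin step: half the step size times the gradient, plus matched noise
    theta += 0.5 * eps * grad_log_post(theta, x) + rng.normal(scale=np.sqrt(eps))
    samples.append(theta)
print(np.mean(samples[2000:]))                        # approx. posterior mean, ~2.0
```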

A popular method of scaling Bayesian inference, particularly in the case of analytically intractable distributions, is Sequential Monte Carlo (SMC) or particle filtering [48, 24, 80]. SMC algorithms have recently become popular as a method to approximate integrals, owing to their easy implementation and parallelisation ability, much needed characteristics in Big Data implementations [100]. SMC approximates a sequence of probability distributions on a sequence of spaces of increasing dimension by applying resampling, propagation and weighting steps to a cloud of particles, starting with the prior and eventually reaching the posterior of interest.
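A minimal sketch of this generic SMC recipe is given below, using likelihood tempering on a toy normal-mean model (our illustrative choice, not a method from the papers cited here): particles are drawn from the prior, reweighted and resampled as the likelihood temperature rises from 0 to 1, and refreshed by a few random-walk MH moves targeting each tempered posterior.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(loc=1.0, scale=1.0, size=2_000)

def loglik(theta):                                    # vectorised over particles
    return -0.5 * ((data[None, :] - theta[:, None]) ** 2).sum(axis=1)

N = 2_000
particles = rng.normal(scale=5.0, size=N)             # draws from the N(0, 5^2) prior
temps = np.linspace(0.0, 1.0, 21)
for t0, t1 in zip(temps[:-1], temps[1:]):
    logw = (t1 - t0) * loglik(particles)              # incremental weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)] # multinomial resampling
    # propagate: a few random-walk MH steps targeting the tempered posterior
    for _ in range(3):
        prop = particles + rng.normal(scale=0.1, size=N)
        log_r = (t1 * (loglik(prop) - loglik(particles))
                 + (particles ** 2 - prop ** 2) / (2 * 5.0 ** 2))
        accept = np.log(rng.uniform(size=N)) < log_r
        particles = np.where(accept, prop, particles)
print(particles.mean())                               # approx. posterior mean, ~1.0
```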

[80] proposed a sub-sampling SMC that is suitable for parallel computation in Big Data analysis, comprising two steps. First, the speed of the SMC is increased by using an unbiased and efficient estimator of the likelihood, followed by a Metropolis within Gibbs kernel. The kernel is updated by an HMC method for the model parameters and a block pseudo-marginal proposal for the auxiliary variables [80]. Novel approaches to SMC include divide-and-conquer SMC [105], multilevel SMC [24], online SMC [75] and one-pass SMC [104], among others.

Stochastic variational inference (VI, also called variational Bayes, VB) is a faster alternative to MCMC [88]. It approximates probability densities using a deterministic optimisation method [110] and has seen widespread use in approximating posterior densities for Bayesian models in large-scale problems. The interested reader is referred to [29] for a detailed introduction to variational inference designed for statisticians, with applications. VI has been implemented in scaling up algorithms for Big Data; for example, a novel re-parameterisation of VI has been used to scale latent variable models and sparse GP regression to Big Data [69].

Several studies have combined VI and SMC in order to exploit the strengths of both strategies in finding the true posterior [56, 133, 151]. [133] employed an SMC approach to obtain an improved variational approximation, while [151] split the data into blocks, applied SMC to compute a partial posterior for each block, and used a variational argument to obtain a proxy for the true posterior as the product of the partial posteriors. The combination of these two techniques in a Big Data context was made by [56], who proposed a new sampling scheme called the Shortened Bridge Sampler, which combines the strengths of deterministic approximations of the posterior (variational Bayes) with those of SMC. This sampler reduces computational time for Big Data with huge numbers of parameters, such as data from genomics or networks.

[79] proposed a novel algorithm for Bayesian inference in the context of massive online streaming data, extending the Gibbs sampling mechanism by drawing samples from conditional distributions conditioned on sequential point estimates of the other parameters. The authors compared the performance of this conditional density filtering algorithm in approximating the true posterior with SMC and VB, and reported good performance and strong convergence of the proposed algorithm.

Approximate Bayesian computation (ABC) is gaining popularity for statistical inference with high dimensional data and computationally intensive models where the likelihood is intractable [125]. A detailed overview of ABC can be found in [166], and asymptotic properties of ABC are explored in [68]. ABC is a likelihood-free method that approximates the posterior distribution utilising imperfect matching of summary statistics [166]. Improvements to existing ABC methods for efficient estimation of the posterior density with Big Data (complex and high dimensional data with costly simulations) have been proposed by [90]. The choice of summary statistics from high dimensional data is a topic of active discussion; see, for example, [90, 165]. [147] provided a reliable and robust method of model selection in ABC employing random forests, which was shown to gain computational efficiency.
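In its simplest rejection form, the ABC scheme just described can be sketched as follows; the toy model, vague prior, summary statistic (the sample mean) and tolerance are illustrative assumptions rather than choices from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(5)
observed = rng.normal(loc=3.0, scale=1.0, size=200)
s_obs = observed.mean()                               # observed summary statistic

def simulate(theta):
    """Simulate a data set from the model and return its summary statistic."""
    return rng.normal(loc=theta, scale=1.0, size=200).mean()

prior_draws = rng.normal(loc=0.0, scale=10.0, size=200_000)   # vague prior
summaries = np.array([simulate(th) for th in prior_draws])
accepted = prior_draws[np.abs(summaries - s_obs) < 0.05]      # tolerance eps
print(accepted.size, accepted.mean())                 # ABC posterior mean, ~3.0
```

The accepted draws form an approximate posterior sample; shrinking the tolerance improves the approximation at the cost of more rejected simulations, which is exactly the trade-off that the efficiency improvements reviewed above aim to ease.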

Another recent direction within ABC is to approximate the likelihood using Bayesian synthetic likelihood or empirical likelihood [59]. Bayesian synthetic likelihood arguably provides computationally efficient approximations of the likelihood with high dimensional summary statistics [126, 195]. Empirical likelihood, on the other hand, is a non-parametric technique for approximating the likelihood empirically from the data subject to moment constraints; it has been suggested in the context of ABC [127], but has not been widely adopted. For further reading on empirical likelihood, see [141].

Classification and regression trees are also very useful tools in data mining and Big Data analysis [33]. There are Bayesian versions of regression trees, such as Bayesian additive regression trees (BART) [47, 93, 7]. The BART algorithm has been applied in the Big Data context and to sparse variable selection by [156, 180, 106].

Some other recommendations to speed up computations are to use graphics processing units [101, 170] and parallel programming approaches [76, 42, 196, 71].

5.2 Bayesian Modelling

The extensive development of Bayesian computational solutions has opened the door to further developments in Bayesian modelling. Many of these new methods are set in the context of application areas. For example, there have been applications of ABC for Big Data in many different fields [62, 102]: [62] developed a high performance computing ABC approach for estimation of parameters in platelet deposition, while [102] proposed ABC methods for inference on high dimensional multivariate spatial data from a large number of locations, with a particular focus on model selection for application to spatial extremes analysis. Bayesian mixtures are a popular modelling tool; VB and ABC techniques have been used for fitting Bayesian mixture models to Big Data [124, 176, 88, 29, 129].

Variable selection in Big Data (in particular for wide data, having a massive number of variables) is a demanding problem. [107] proposed multivariate extensions of the Bayesian group lasso for variable selection in high dimensional data using Bayesian hierarchical models with spike and slab priors, with application to gene expression data. The variable selection problem can also be solved employing ABC-type algorithms: [111] proposed a sampling technique, ABC Bayesian forests, based on data splitting, which is useful for high dimensional wide data and turns out to be a robust method for identifying variables with larger marginal inclusion probabilities.

Bayesian non-parametric models [132] have unbounded capacity to adjust to unseen data by activating additional parameters that were inactive before the emergence of the new data. In other words, the new data are allowed to speak for themselves in non-parametric models, rather than being accommodated by an arguably restrictive model that was learned on previously available data. The inherent flexibility of these models in adapting their complexity to new data makes them more suitable for Big Data than their parametric counterparts. For a brief introduction to Bayesian non-parametric models and a nontechnical overview of some of the main tools in the area, the interested reader is referred to [74].

The popular tools in Bayesian non-parametrics include Gaussian processes (GP) [155], Dirichlet processes (DP) [154], the Indian buffet process (IBP) [73] and the infinite hidden Markov model (iHMM) [20]. GP have been used in a variety of applications [41, 49, 35] and attempts have been made to scale them to Big Data [84, 85, 178, 53]. DP have seen success in clustering, and faster computational algorithms are being adopted to scale them to Big Data [185, 188, 104, 115, 71]. IBP are used for latent feature modelling, where the number of features is determined in a data-driven fashion, and have been scaled to Big Data through variational inference algorithms [210]. As an alternative to the classical HMM, one of the distinctive properties of the iHMM is that it infers the number of hidden states in the system from the available data; it has been scaled to Big Data using particle filtering algorithms [179].

Gaussian processes are also employed in the analysis of high dimensional spatially dependent data [15]. [15] provided model-based solutions employing low rank GP and nearest neighbour GP (NNGP) as scalable priors in a hierarchical framework to render full Bayesian inference for big spatial or spatio-temporal data sets. [203] extended the applicability of NNGP to inference on latent spatially dependent processes by developing a conjugate latent NNGP model as a practical alternative to onerous Bayesian computations. Variational optimisation with a structured Bayesian GP latent variable model has been used to analyse spatially dependent data [12]. For a review of methods for the analysis of massive spatially dependent data, including Bayesian approaches, see [83].

Another Bayesian modelling approach that has been used for big and complex data is Bayesian networks (BN). This methodology has generated a substantial literature examining theoretical, methodological and computational approaches, as well as applications [175]. BN belong to the family of probabilistic graphical models and are based on directed acyclic graphs, which are a very useful representation of causal relationships among variables [23]. BN are used as efficient learning tools in Big Data analysis, integrated with scalable algorithms [187, 208]. For a more detailed understanding of BN learning from Big Data, see [175].

Classification is another important tool for extracting information from Big Data, and Bayesian classifiers, including the naive Bayes classifier (NBC), are used in Big Data classification problems [94, 108]. A parallel implementation of NBC was proposed by [94]. [108] evaluated the scalability of NBC in Big Data with application to sentiment classification of millions of movie reviews, and found NBC to have improved accuracy with Big Data. [135] proposed a scalable multi-step clustering and classification algorithm using Bayesian nonparametrics for Big Data with large n and small p, which can also run in parallel.

The past fifteen years have also seen an increase in interest in empirical likelihood (EL) for Bayesian modelling. The idea of replacing the likelihood with an empirical analogue in a Bayesian framework was first explored in detail by [99], who demonstrated that this Bayesian empirical likelihood (BEL) approach increases the flexibility of EL by examining the length and coverage of BEL intervals; the methods were tested using simulated data sets. Later, [160] provided probabilistic interpretations of BEL, exploring moment condition models with EL, and provided a non-parametric version of BEL, namely Bayesian exponentially tilted empirical likelihood (BETEL). BEL methods have been applied to spatial data analysis in [43] and to small area estimation in [145, 146].

We acknowledge that there are many more studies on the application of Bayesian approaches in different fields of interest which are not included in this review. There are also other review papers on overlapping and closely related topics. For example, [209] describes Bayesian methods for machine learning and includes some of the Bayesian inference techniques reviewed in the present study; however, the scope and focus of that review is different, being centred on methods applicable to machine learning.

6 Conclusions

We are living in the era of Big Data, and continuous research is in progress to make the most of the available information. The current chapter has attempted to review recent developments in Bayesian statistical approaches for handling Big Data, along with a general overview and classification of the Big Data literature of the last five years. This review chapter provides relevant references in Big Data categorised into finer classes, a brief description of statistical contributions to the field, and a more detailed discussion of the Bayesian approaches developed and applied in the context of Big Data.

On the basis of the reviews above, it is clear that there has been a huge amount of work on issues related to cloud computing, analytics infrastructure and so on. However, the amount of research conducted from statistical perspectives is also notable. In the last five years, there has been an exponential increase in published studies focused on developing new statistical methods and algorithms, as well as on scaling existing methods. These have been summarised in Section 4, with particular focus on Bayesian approaches in Section 5. In some instances, citations are made outside the specified period (see Section 2) to refer to the origin of methods which are currently being applied or extended in Big Data scenarios.

With the advent of computational infrastructure and advances in programming and software, Bayesian approaches are no longer considered too computationally expensive and onerous to execute for large volumes of data, that is, Big Data. Traditional Bayesian methods are now becoming much more scalable due to the parallelisation of MCMC algorithms, divide and conquer and/or sub-sampling methods in MCMC, and advances in approximations such as HMC, SMC, ABC, VB and so on. With the increasing volume of data, non-parametric Bayesian methods are also gaining in popularity.

This review chapter has aimed to survey the range of methodological and computational advances made in Bayesian statistics for handling the difficulties arising from the advent of Big Data. By not focusing on any particular application, the chapter provides readers with a general overview of the development of Bayesian methodologies and computational algorithms for handling these issues. The review has revealed that most of the advances in Bayesian statistics for Big Data have been in computational time and the scalability of particular algorithms, concentrating on estimating the posterior by adopting different techniques. However, the development of Bayesian methods and models for Big Data in the recent literature cannot be overlooked. There remain many open problems for further research in the context of Big Data and Bayesian approaches, as highlighted in this chapter.

Based on the above discussion and the accompanying review presented in this chapter, it is apparent that to address the challenges of Big Data with the strengths of Bayesian statistics, research on both algorithms and models is essential.

References

  • [1] S. Ahn, B. Shahbaba, and M. Welling (2014) Distributed stochastic gradient MCMC. In International Conference on Machine Learning, pp. 1044–1052. Cited by: §5.1.
  • [2] S. Akter and S. F. Wamba (2016) Big data analytics in e-commerce: a systematic review and agenda for future research. Electronic Markets 26 (2), pp. 173–194. Cited by: 3rd item, Table 1.
  • [3] A. Akusok, K. Björk, Y. Miche, and A. Lendasse (2015) High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access 3, pp. 1011–1025. Cited by: 2nd item, Table 1.
  • [4] O. Y. Al-Jarrah, P. D. Yoo, S. Muhaidat, G. K. Karagiannidis, and K. Taha (2015) Efficient machine learning for big data: a review. Big Data Res. 2 (3), pp. 87–93. Cited by: 1st item, Table 1.
  • [5] K. Albury, J. Burgess, B. Light, K. Race, and R. Wilken (2017) Data cultures of mobile dating and hook-up apps: emerging issues for critical social science research. Big Data Soc. 4 (2), pp. 1–11. Cited by: 1st item, Table 1.
  • [6] G. I. Allen, L. Grosenick, and J. Taylor (2014) A generalized least-square matrix decomposition. J Am Stat Assoc 109 (505), pp. 145–159. Cited by: Table 3.
  • [7] G. M. Allenby, E. T. Bradlow, E. I. George, J. Liechty, and R. E. McCulloch (2014) Perspectives on Bayesian methods and big data. Cust. Needs and Solut. 1 (3), pp. 169–175. Cited by: Table 1, §5.1.
  • [8] S. G. Alonso, I. de la Torre Díez, J. J. Rodrigues, S. Hamrioui, and M. López-Coronado (2017) A systematic review of techniques and sources of big data in the healthcare sector. J Med Syst 41 (11), pp. 183. Cited by: Table 1.
  • [9] A. Alyass, M. Turcotte, and D. Meyre (2015) From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med. Genom. 8 (1), pp. 33. Cited by: 2nd item, Table 1.
  • [10] D. Apiletti, E. Baralis, T. Cerquitelli, P. Garza, F. Pulvirenti, and L. Venturini (2017) Frequent itemsets mining for big data: a comparative analysis. Big Data Res. 9, pp. 67–83. Cited by: 2nd item, Table 1.
  • [11] M. D. Assunção, R. N. Calheiros, S. Bianchi, M. A. Netto, and R. Buyya (2015) Big data computing and clouds: trends and future directions. Journal of Parallel and Distributed Comput. 79, pp. 3–15. Cited by: 1st item, Table 1.
  • [12] S. Atkinson and N. Zabaras (2019) Structured Bayesian Gaussian process latent variable model: applications to data-driven dimensionality reduction and high-dimensional inversion. J Comput Phys 383, pp. 166–195. Cited by: §5.2.
  • [13] A. T. Azar and A. E. Hassanien (2015) Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Comput. 19 (4), pp. 1115–1127. Cited by: 3rd item.
  • [14] A. Baldominos, E. Albacete, Y. Saez, and P. Isasi (2014) A scalable machine learning online service for big data real-time analysis. In Computational Intelligence in Big Data (CIBD), 2014 IEEE Symposium on, pp. 1–8. Cited by: 2nd item.
  • [15] S. Banerjee (2017) High-dimensional Bayesian geostatistics. Bayesian Anal. 12 (2), pp. 583. Cited by: §5.2.
  • [16] S. Bansal, G. Chowell, L. Simonsen, A. Vespignani, and C. Viboud (2016) Big data for infectious disease surveillance and modeling. J. Infect. Dis. 214 (suppl_4), pp. S375–S379. Cited by: 2nd item, Table 1.
  • [17] R. Bardenet, A. Doucet, and C. Holmes (2014) Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In International Conference on Machine Learning (ICML), pp. 405–413. Cited by: §5.1.
  • [18] R. Bardenet, A. Doucet, and C. Holmes (2017) On Markov chain Monte Carlo methods for tall data. J Mach Learn Res 18 (1), pp. 1515–1557. Cited by: §5.1.
  • [19] D. W. Bates, S. Saria, L. Ohno-Machado, A. Shah, and G. Escobar (2014) Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff. 33 (7), pp. 1123–1131. Cited by: 2nd item, Table 1.
  • [20] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen (2002) The infinite hidden markov model. In Advances in Neural Information Processing Systems, pp. 577–584. Cited by: §5.2.
  • [21] A. Belle, R. Thiagarajan, S. Soroushmehr, F. Navidi, D. A. Beard, and K. Najarian (2015) Big data analytics in healthcare. BioMed Res. Int. 2015. Cited by: Table 1.
  • [22] G. Bello-Orgaz, J. J. Jung, and D. Camacho (2016) Social big data: recent achievements and new challenges. Inf. Fus. 28, pp. 45–59. Cited by: §1, Table 1.
  • [23] I. Ben-Gal (2008) Bayesian Networks. Encycl. Stat. Qual. Reliab. 1, pp. 1–6. Cited by: §5.2.
  • [24] A. Beskos, A. Jasra, E. A. Muzaffer, and A. M. Stuart (2015) Sequential Monte Carlo methods for Bayesian elliptic inverse problems. Stat. Comput. 25 (4), pp. 727–737. Cited by: §5.1.
  • [25] M. Betancourt (2017) A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434. Cited by: §5.1.
  • [26] J. Bibault, P. Giraud, and A. Burgun (2016) Big data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett. 382 (1), pp. 110–117. Cited by: 3rd item, Table 1.
  • [27] A. Bifet and G. D. F. Morales (2014) Big data stream learning with Samoa. In 2014 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1199–1202. Cited by: Table 1.
  • [28] H. Binder and M. Blettner (2015) Big data in medical science—a biostatistical view: part 21 of a series on evaluation of scientific publications. Dtsch. Ärztebl. Int. 112 (9), pp. 137. Cited by: 2nd item, Table 1.
  • [29] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe (2017) Variational inference: a review for statisticians. J Am Stat Assoc 112 (518), pp. 859–877. Cited by: §5.1, §5.2.
  • [30] A. Bouchard-Côté, S. J. Vollmer, and A. Doucet (2018) The bouncy particle sampler: a nonreversible rejection-free Markov chain Monte Carlo method. J Am Stat Assoc, pp. 1–13. Cited by: §5.1.
  • [31] E. T. Bradlow, M. Gangwar, P. Kopalle, and S. Voleti (2017) The role of big data and predictive analytics in retail. Journal of Retailing 93 (1), pp. 79–95. Cited by: 3rd item, Table 1.
  • [32] R. Branch, H. Tjeerdsma, C. Wilson, R. Hurley, and S. McConnell (2014) Cloud computing and big data: a review of current service models and hardware perspectives. J Softw. Eng. Appl. 7 (08), pp. 686. Cited by: 2nd item, Table 1.
  • [33] L. Breiman (2017) Classification and Regression Trees. Routledge. Cited by: §5.1.
  • [34] P. F. Brennan and S. Bakken (2015) Nursing needs big data and big data needs nursing. J Nurs. Scholarsh. 47 (5), pp. 477–484. Cited by: Table 1.
  • [35] F. Buettner, K. N. Natarajan, F. P. Casale, V. Proserpio, A. Scialdone, F. J. Theis, S. A. Teichmann, J. C. Marioni, and O. Stegle (2015) Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33 (2), pp. 155. Cited by: §5.2.
  • [36] J. Bughin (2016) Big data, big bang? Journal of Big Data 3 (1), pp. 2. Cited by: Table 1.
  • [37] R. Burrows and M. Savage (2014) After the crisis? Big data and the methodological challenges of empirical sociology. Big Data Soc. 1 (1), pp. 1–6. Cited by: 1st item, Table 1.
  • [38] H. Cai, B. Xu, L. Jiang, and A. V. Vasilakos (2017) IoT-based big data storage systems in cloud computing: perspectives and challenges. IEEE Internet Things J 4 (1), pp. 75–87. Cited by: Table 1.
  • [39] J. N. Cappella (2017) Vectors into the future of mass and interpersonal communication research: big data, social media, and computational social science. Hum. Commun. Res. 43 (4), pp. 545–558. Cited by: 1st item, Table 1.
  • [40] S. Castruccio and M. G. Genton (2016) Compressing an ensemble with statistical models: an algorithm for global 3d spatio-temporal temperature. Technometrics 58 (3), pp. 319–328. Cited by: §4.
  • [41] K. Chalupka, C. K. Williams, and I. Murray (2013) A framework for evaluating approximation methods for Gaussian process regression. J Mach. Learn. Res. 14 (Feb), pp. 333–350. Cited by: §5.2.
  • [42] J. Chang and J. W. Fisher III (2013) Parallel sampling of DP mixture models using sub-cluster splits. In Advances in Neural Information Processing Systems, pp. 620–628. Cited by: §5.1.
  • [43] S. Chaudhuri and M. Ghosh (2011) Empirical likelihood for small area estimation. Biometrika, pp. 473–480. Cited by: §5.2.
  • [44] Chen, E. E. Chen, W. Zhao, and W. Zou (2015) Statistics in big data. J Chin. Stat. Assoc. 53, pp. 186–202. Cited by: Table 1, Table 3, §4.
  • [45] T. Chen, E. Fox, and C. Guestrin (2014) Stochastic gradient Hamiltonian Monte Carlo. In Int. Conference on Machine Learning, pp. 1683–1691. Cited by: Table 1, §5.1.
  • [46] A. S. Cheung (2018) Moving beyond consent for citizen science in big data health and medical research. Northwest. J Technol. Intellect. Prop. 16 (1), pp. 15. Cited by: 2nd item, Table 1.
  • [47] H. A. Chipman, E. I. George, and R. E. McCulloch (2010) BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 (1), pp. 266–298. Cited by: §5.1.
  • [48] N. Chopin, P. E. Jacob, and O. Papaspiliopoulos (2013) SMC2: an efficient algorithm for sequential analysis of state space models. J Royal Stat. Soc. Ser. B (Stat. Methodol.) 75 (3), pp. 397–426. Cited by: §5.1.
  • [49] A. Damianou and N. Lawrence (2013) Deep Gaussian processes. In Artificial Intelligence and Statistics, pp. 207–215. Cited by: §5.2.
  • [50] T. Das and P. M. Kumar (2013) Big data analytics: a framework for unstructured data analysis. Int. J Eng. Sci. Technol. 5 (1), pp. 153. Cited by: Table 1.
  • [51] A. De Mauro, M. Greco, and M. Grimaldi (2015) What is big data? a consensual definition and a review of key research topics. In AIP Conference Proceedings, Vol. 1644 (1), pp. 97–104. Cited by: §1, Table 1.
  • [52] A. De Mauro, M. Greco, and M. Grimaldi (2016) A formal definition of big data based on its essential features. Libr. Rev. 65 (3), pp. 122–135. Cited by: §1, 1st item, 2nd item, Table 1.
  • [53] M. P. Deisenroth and J. W. Ng (2015) Distributed Gaussian processes. In Proceedings of the 32nd International Conference on International Conference on Machine Learning-Volume 37, pp. 1481–1490. Cited by: §5.2.
  • [54] H. Demirkan and D. Delen (2013) Leveraging the capabilities of service-oriented decision support systems: putting analytics and big data in cloud. Decis. Support Syst. 55 (1), pp. 412–421. Cited by: Table 1.
  • [55] K. S. Divya, P. Bhargavi, and S. Jyothi (2018) Machine learning algorithms in big data analytics. Int. J Comput. Sci. Eng. 6 (1), pp. 63–70. Cited by: Table 1.
  • [56] S. Donnet and S. Robin (2017) Shortened Bridge Sampler: Using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971. Cited by: §5.1.
  • [57] J. A. Doornik and D. F. Hendry (2015) Statistical model selection with “big data”. Cogent Econ. Fin. 3 (1), pp. 1045216. Cited by: Table 1, Table 3.
  • [58] J. A. Doornik (2009) Autometrics. In The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry, pp. 88–121. Cited by: 1st item.
  • [59] C. C. Drovandi, C. Grazian, K. Mengersen, and C. Robert (2018) Approximating the likelihood in ABC. In Handbook of Approximate Bayesian Computation, pp. 321–368. Cited by: §5.1.
  • [60] P. Ducange, R. Pecori, and P. Mezzina (2018) A glimpse on big data analytics in the framework of marketing strategies. Soft Comput. 22 (1), pp. 325–342. Cited by: 3rd item, Table 1.
  • [61] D. B. Dunson (2018) Statistics in the big data era: failures of the machine. Stat. Probab. Lett. 136, pp. 4–9. Cited by: Table 1, Table 3, §4.
  • [62] R. Dutta, M. Schoengens, J. Onnela, and A. Mira (2017) ABCpy: a user-friendly, extensible, and parallel library for approximate Bayesian computation. In Proceedings of the Platform for Advanced Scientific Computing Conference, pp. 1–9. Cited by: §5.2.
  • [63] C. K. Emani, N. Cullot, and C. Nicolle (2015) Understandable big data: a survey. Comput. Sci. Rev. 17, pp. 70–81. Cited by: 2nd item, Table 1.
  • [64] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras (2014) A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Top Comput 2 (3), pp. 267–279. Cited by: Table 1.
  • [65] J. Fan, F. Han, and H. Liu (2014) Challenges of big data analysis. Natl. Sci. Rev. 1 (2), pp. 293–314. Cited by: 2nd item, Table 1.
  • [66] S. Fosso Wamba and D. Mishra (2017) Big data integration with business processes: a literature review. Bus. Process Manag. J 23 (3), pp. 477–492. Cited by: Table 1.
  • [67] B. Franke, J. Plante, R. Roscher, E. A. Lee, C. Smyth, A. Hatefi, F. Chen, E. Gil, A. Schwing, A. Selvitella, et al. (2016) Statistical inference, learning and models in big data. Int. Stat. Rev. 84 (3), pp. 371–389. Cited by: Table 1, Table 3, §4, §4.
  • [68] D. T. Frazier, G. M. Martin, C. P. Robert, and J. Rousseau (2018) Asymptotic properties of approximate Bayesian computation. Biometrika 00 (0), pp. 1–15. Cited by: §5.1.
  • [69] Y. Gal, M. Van Der Wilk, and C. E. Rasmussen (2014) Distributed variational inference in sparse Gaussian process regression and latent variable models. In Advances in neural information processing systems, pp. 3257–3265. Cited by: §5.1.
  • [70] A. Gandomi and M. Haider (2015) Beyond the hype: big data concepts, methods, and analytics. Int. J Inf. Manag. 35 (2), pp. 137–144. Cited by: §1, Table 1.
  • [71] H. Ge, Y. Chen, M. Wan, and Z. Ghahramani (2015) Distributed inference for Dirichlet process mixture models. In International Conference on Machine Learning, pp. 2276–2284. Cited by: §5.1, §5.2.
  • [72] R. Genuer, J. Poggi, C. Tuleau-Malot, and N. Villa-Vialaneix (2017) Random forests for big data. Big Data Res. 9, pp. 28–46. Cited by: Table 3.
  • [73] Z. Ghahramani and T. L. Griffiths (2006) Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, pp. 475–482. Cited by: §5.2.
  • [74] Z. Ghahramani (2013) Bayesian non-parametrics and the probabilistic approach to modelling. Phil. Trans. R. Soc. A 371 (1984), pp. 20110553. Cited by: §5.2.
  • [75] P. Gloaguen, M. Etienne, and S. Le Corff (2018) Online sequential Monte Carlo smoother for partially observed diffusion processes. EURASIP J Adv Signal Process 2018 (1), pp. 9. Cited by: §5.1.
  • [76] S. Guha, R. Hafen, J. Rounds, J. Xia, J. Li, B. Xi, and W. S. Cleveland (2012) Large complex data: divide and recombine (D&R) with RHIPE. Stat 1 (1), pp. 53–67. Cited by: §5.1.
  • [77] R. Guhaniyogi and S. Banerjee (2018) Meta-Kriging: Scalable Bayesian Modeling and Inference for Massive Spatial Datasets. Technometrics 60 (4), pp. 430–444. Cited by: §5.1.
  • [78] R. Guhaniyogi and S. Banerjee (2019) Multivariate spatial meta kriging. Stat. Probab. Lett. 144, pp. 3–8. Cited by: §5.1.
  • [79] R. Guhaniyogi, S. Qamar, and D. B. Dunson (2014) Bayesian conditional density filtering for big data. Stat 1050, pp. 15. Cited by: §5.1.
  • [80] D. Gunawan, R. Kohn, M. Quiroz, K. Dang, and M. Tran (2018) Subsampling Sequential Monte Carlo for Static Bayesian Models. arXiv preprint arXiv:1805.03317. Cited by: Table 1, §5.1.
  • [81] H. Hassani and E. S. Silva (2015) Forecasting with big data: a review. Ann. of Data Sci. 2 (1), pp. 5–19. Cited by: Table 1.
  • [82] S. I. Hay, D. B. George, C. L. Moyes, and J. S. Brownstein (2013) Big data opportunities for global infectious disease surveillance. PLoS Med. 10 (4), pp. e1001413. Cited by: 2nd item, Table 1.
  • [83] M. J. Heaton, A. Datta, A. Finley, R. Furrer, R. Guhaniyogi, F. Gerber, R. B. Gramacy, D. Hammerling, M. Katzfuss, F. Lindgren, et al. (2017) Methods for analyzing large spatial data: a review and comparison. arXiv preprint arXiv:1710.05013. Cited by: §5.2.
  • [84] J. Hensman, N. Fusi, and N. D. Lawrence (2013) Gaussian processes for big data. arXiv preprint arXiv:1309.6835. Cited by: Table 1, §4, §5.2.
  • [85] J. Hensman, A. G. d. G. Matthews, and Z. Ghahramani (2015) Scalable variational Gaussian process classification. In Artificial Intelligence and Statistics (AISTATS), 18th International Conference on, pp. 351–360. Cited by: §5.2.
  • [86] M. Hilbert (2016) Big data for development: a review of promises and challenges. Dev. Policy Rev. 34 (1), pp. 135–174. Cited by: Table 1, §4.
  • [87] R. W. Hoerl, R. D. Snee, and R. D. De Veaux (2014) Applying statistical thinking to ‘Big Data’ problems. Wiley Interdiscip. Rev.: Comput. Stat. 6 (4), pp. 222–232. Cited by: Table 1, Table 3, §4.
  • [88] M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley (2013) Stochastic variational inference. J Mach. Learn. Res. 14 (1), pp. 1303–1347. Cited by: §5.1, §5.2.
  • [89] H. H. Huang and H. Liu (2014) Big data machine learning and graph analytics: current state and future challenges. In 2014 IEEE International Conference on Big Data (Big Data), pp. 16–17. Cited by: Table 1.
  • [90] R. Izbicki, A. B. Lee, and T. Pospisil (2019) ABC–CDE: Toward Approximate Bayesian Computation With Complex High-Dimensional Data and Limited Simulations. J Comput Graph Stat, pp. 1–20. Cited by: §5.1.
  • [91] G. Jifa and Z. Lingling (2014) Data, DIKW, big data and data science. Procedia Comput. Sci. 31, pp. 814–821. Cited by: §1.
  • [92] S. Kaisler, F. Armour, J. A. Espinosa, and W. Money (2013) Big data: issues and challenges moving forward. In 2013 46th Hawaii International Conference on System Sciences, pp. 995–1004. Cited by: §1.
  • [93] A. Kapelner and J. Bleich (2013) bartMachine: Machine learning with Bayesian additive regression trees. arXiv preprint arXiv:1312.2171. Cited by: §5.1.
  • [94] V. D. Katkar and S. V. Kulkarni (2013) A novel parallel implementation of naive Bayesian classifier for big data. In Green Computing, Communication and Conservation of Energy (ICGCE), 2013 International Conference on, pp. 847–852. Cited by: §5.2.
  • [95] A. Korattikara, Y. Chen, and M. Welling (2014) Austerity in MCMC land: cutting the metropolis-hastings budget. In International Conference on Machine Learning, pp. 181–189. Cited by: §5.1.
  • [96] H. Kousar and B. P. Babu (2018) Multi-Agent based MapReduce Model for Efficient Utilization of System Resources. Indones. J Electr. Eng. Comput. Sci. 11 (2), pp. 504–514. Cited by: Table 1.
  • [97] S. Landset, T. M. Khoshgoftaar, A. N. Richter, and T. Hasanin (2015) A survey of open source tools for machine learning with big data in the hadoop ecosystem. J Big Data 2 (1), pp. 24. Cited by: Table 1.
  • [98] G. J. Lasinio, G. Mastrantonio, and A. Pollice (2013) Discussing the “big n problem”. Stat Methods Appl 22 (1), pp. 97–112. Cited by: §5.1.
  • [99] N. A. Lazar (2003) Bayesian Empirical Likelihood. Biometrika 90 (2), pp. 319–326. Cited by: §5.2.
  • [100] A. Lee and N. Whiteley (2016) Forest resampling for distributed sequential Monte Carlo. Stat. Anal. Data Min. 9 (4), pp. 230–248. Cited by: Table 1, §5.1.
  • [101] A. Lee, C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes (2010) On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J Comput. Graph. Stat. 19 (4), pp. 769–789. Cited by: §5.1.
  • [102] X. J. Lee, M. Hainy, J. P. McKeone, C. C. Drovandi, and A. N. Pettitt (2018) ABC model selection for spatial extremes models applied to South Australian maximum temperature data. Comput. Stat. Data Anal. 128, pp. 128–144. Cited by: Table 1, §5.2.
  • [103] S. Li, S. Dragicevic, F. A. Castro, M. Sester, S. Winter, A. Coltekin, C. Pettit, B. Jiang, J. Haworth, A. Stein, et al. (2016) Geospatial big data handling theory and methods: a review and research challenges. ISPRS J Photogramm Remote Sens 115, pp. 119–133. Cited by: §1.
  • [104] D. Lin (2013) Online learning of nonparametric mixture models via sequential variational approximation. In Advances in Neural Information Processing Systems, pp. 395–403. Cited by: §5.1, §5.2.
  • [105] F. Lindsten, A. M. Johansen, C. A. Naesseth, B. Kirkpatrick, T. B. Schön, J. Aston, and A. Bouchard-Côté (2017) Divide-and-conquer with sequential Monte Carlo. J Comput. Graph. Stat. 26 (2), pp. 445–458. Cited by: Table 1, §5.1.
  • [106] A. R. Linero (2018) Bayesian regression trees for high-dimensional prediction and variable selection. J Am. Stat. Assoc., pp. 1–11. Cited by: §5.1.
  • [107] B. Liquet, K. Mengersen, A. Pettitt, M. Sutton, et al. (2017) Bayesian variable selection regression of multivariate responses for group data. Bayesian Anal. 12 (4), pp. 1039–1067. Cited by: §5.2.
  • [108] B. Liu, E. Blasch, Y. Chen, D. Shen, and G. Chen (2013) Scalable sentiment classification for big data analysis using Naive Bayes classifier. In 2013 IEEE International Conference on Big Data, pp. 99–104. Cited by: Table 1, §5.2.
  • [109] L. Liu (2013) Computing infrastructure for big data processing. Front. Comput. Sci. 7 (2), pp. 165–170. Cited by: Table 1.
  • [110] Q. Liu and D. Wang (2016) Stein variational gradient descent: a general purpose Bayesian inference algorithm. In Advances In Neural Information Processing Systems, pp. 2378–2386. Cited by: Table 1, §5.1.
  • [111] Y. Liu, V. Ročková, and Y. Wang (2018) ABC Variable Selection with Bayesian Forests. arXiv preprint arXiv:1806.02304. Cited by: §5.2.
  • [112] Z. Liu, F. Sun, and D. P. McGovern (2017) Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data. BioData Min. 10 (1). Cited by: Table 1, Table 3.
  • [113] C. Loebbecke and A. Picot (2015) Reflections on societal and business model transformation arising from digitization and big data analytics: a research agenda. J Strategic Inf. Syst. 24 (3), pp. 149–157. Cited by: Table 1.
  • [114] J. Luo, M. Wu, D. Gopukumar, and Y. Zhao (2016) Big data application in biomedical research and health care: a literature review. Biomed Inform Insights 8, pp. BII.S31559. Cited by: §1, 2nd item.
  • [115] Z. Ma, P. K. Rana, J. Taghia, M. Flierl, and A. Leijon (2014) Bayesian estimation of Dirichlet mixture model with variational inference. Pattern Recognit. 47 (9), pp. 3143–3157. Cited by: Table 1, §5.2.
  • [116] D. Maclaurin and R. P. Adams (2014) Firefly Monte Carlo: Exact MCMC with Subsets of Data. In Artificial Intelligence, Twenty-Fourth International Joint Conference on, pp. 543–552. Cited by: §5.1.
  • [117] T. Magdon-Ismail, C. Narasimhadevara, D. Jaffe, and R. Nambiar (2017) TPCx-hs v2: transforming with technology changes. In Technology Conference on Performance Evaluation and Benchmarking, pp. 120–130. Cited by: Table 1.
  • [118] L. Mählmann, M. Reumann, N. Evangelatos, and A. Brand (2017) Big data for public health policy-making: policy empowerment. Public Health Genom. 20 (6), pp. 312–320. Cited by: Table 1.
  • [119] F. Maire, N. Friel, and P. Alquier (2017) Informed sub-sampling MCMC: approximate bayesian inference for large datasets. Stat. Comput., pp. 1–34. Cited by: §5.1.
  • [120] R. Manibharathi and R. Dinesh (2018) Survey of challenges in encrypted data storage in cloud computing and big data. J Netw. Commun. Emerg. Technol. 8 (2). Cited by: Table 1.
  • [121] R. F. Mansour (2016) Understanding how big data leads to social networking vulnerability. Comput. in Hum. Behav. 57, pp. 348–351. Cited by: Table 1.
  • [122] A. Marshall, S. Mueck, and R. Shockley (2015) How leading organizations use big data and analytics to innovate. Strategy Leadersh. 43 (5), pp. 32–39. Cited by: 3rd item, Table 1.
  • [123] T. H. McCormick, R. Ferrell, A. F. Karr, and P. B. Ryan (2014) Big data, big results: knowledge discovery in output from large-scale analytics. Stat. Analysis Data Min. 7 (5), pp. 404–412. Cited by: §4.
  • [124] C. A. McGrory and D. Titterington (2007) Variational approximations in Bayesian model selection for finite mixture distributions. Comput. Stat. Data Anal. 51 (11), pp. 5352–5367. Cited by: §5.2.
  • [125] T. J. McKinley, I. Vernon, I. Andrianakis, N. McCreesh, J. E. Oakley, R. N. Nsubuga, M. Goldstein, R. G. White, et al. (2018) Approximate Bayesian computation and simulation-based inference for complex stochastic epidemic models. Stat. Sci. 33 (1), pp. 4–18. Cited by: §5.1.
  • [126] E. Meeds and M. Welling (2014) GPS-ABC: Gaussian process surrogate approximate Bayesian computation. arXiv preprint arXiv:1401.2838. Cited by: §5.1.
  • [127] K. L. Mengersen, P. Pudlo, and C. P. Robert (2013) Bayesian computation via empirical likelihood. Proc. Natl. Acad. Sci. 110 (4), pp. 1321–1326. Cited by: §5.1.
  • [128] S. Minsker, S. Srivastava, L. Lin, and D. B. Dunson (2017) Robust and scalable Bayes via a median of subset posterior measures. J Mach. Learn. Res. 18 (1), pp. 4488–4527. Cited by: Table 1, §5.1.
  • [129] M. T. Moores, C. C. Drovandi, K. Mengersen, and C. P. Robert (2015) Pre-processing for approximate Bayesian computation in image analysis. Stat. Comput. 25 (1), pp. 23–33. Cited by: Table 1, §5.2.
  • [130] N. Moustafa, G. Creech, E. Sitnikova, and M. Keshk (2017) Collaborative anomaly detection framework for handling big data of cloud computing. In Military Communications and Information Systems Conference (MilCIS), 2017, pp. 1–6. Cited by: 2nd item, Table 1.
  • [131] O. Müller, I. Junglas, J. v. Brocke, and S. Debortoli (2016) Utilizing big data analytics for information systems research: challenges, promises and guidelines. Eur J Inf Syst 25 (4), pp. 289–302. Cited by: Table 1.
  • [132] P. Müller, F. A. Quintana, A. Jara, and T. Hanson (2015) Bayesian nonparametric data analysis. Springer. Cited by: §5.2.
  • [133] C. A. Naesseth, S. W. Linderman, R. Ranganath, and D. M. Blei (2017) Variational Sequential Monte Carlo. arXiv preprint arXiv:1705.11140. Cited by: §5.1.
  • [134] W. Neiswanger, C. Wang, and E. Xing (2013) Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780. Cited by: §5.1.
  • [135] Y. Ni, P. Müller, M. Diesendruck, S. Williamson, Y. Zhu, and Y. Ji (2019) Scalable Bayesian nonparametric clustering and classification. J Comput. Graph. Stat., pp. 1–13. Cited by: §5.2.
  • [136] L. G. Nongxa (2017) Mathematical and statistical foundations and challenges of (big) data sciences. S. Afr. J Sci. 113 (3-4), pp. 1–4. Cited by: Table 1, Table 3.
  • [137] A. O’Driscoll, J. Daugelaite, and R. D. Sleator (2013) ‘Big data’, Hadoop and cloud computing in genomics. J Biomed. Inform. 46 (5), pp. 774–781. Cited by: Table 1.
  • [138] B. Oancea, R. M. Dragoescu, et al. (2014) Integrating R and Hadoop for big data analysis. Romanian Stat. Rev. 62 (2), pp. 83–94. Cited by: Table 1.
  • [139] Z. Obermeyer and E. J. Emanuel (2016) Predicting the future—big data, machine learning, and clinical medicine. N. Engl. J Med. 375 (13), pp. 1216. Cited by: 3rd item, Table 1.
  • [140] D. Oprea (2016) Big questions on big data. Revista de Cercetare si Interv. Soc. 55, pp. 112. Cited by: 2nd item, Table 1.
  • [141] A. B. Owen (2001) Empirical Likelihood. Chapman and Hall/CRC. Cited by: §5.1.
  • [142] S. Pandey and V. Tokekar (2014) Prominence of mapreduce in big data processing. In Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on, pp. 555–560. Cited by: Table 1.
  • [143] A. Ç. Pehlivanlı (2015) A novel feature selection scheme for high-dimensional data sets: four-staged feature selection. J Appl. Stat. 43 (6), pp. 1140–1154. Cited by: Table 1, Table 3.
  • [144] D. N. Politis, J. P. Romano, and M. Wolf (1999) Subsampling. Springer Science & Business Media. Cited by: §4.
  • [145] A. T. Porter, S. H. Holan, and C. K. Wikle (2015) Bayesian semiparametric hierarchical empirical likelihood spatial models. J Stat. Plan. Inference 165, pp. 78–90. Cited by: §5.2.
  • [146] A. T. Porter, S. H. Holan, and C. K. Wikle (2015) Multivariate spatial hierarchical Bayesian empirical likelihood methods for small area estimation. Stat. 4 (1), pp. 108–116. Cited by: §5.2.
  • [147] P. Pudlo, J. Marin, A. Estoup, J. Cornuet, M. Gautier, and C. P. Robert (2015) Reliable ABC model choice via random forests. Bioinformatics 32 (6), pp. 859–866. Cited by: §5.1.
  • [148] J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016 (1), pp. 67. Cited by: 1st item.
  • [149] M. Quiroz, R. Kohn, M. Villani, and M. Tran (2018) Speeding up MCMC by efficient data subsampling. J Am. Stat. Assoc., pp. 1–13. Cited by: Table 1, §5.1.
  • [150] M. Quiroz, M. Villani, and R. Kohn (2015) Scalable MCMC for large data problems using data subsampling and the difference estimator. SSRN Electronic Journal. Cited by: §5.1.
  • [151] M. Rabinovich, E. Angelino, and M. I. Jordan (2015) Variational consensus Monte Carlo. In Advances in Neural Information Processing Systems, pp. 1207–1215. Cited by: §5.1.
  • [152] W. Raghupathi and V. Raghupathi (2014) Big data analytics in healthcare: promise and potential. Health Inf. Sci. and Syst. 2 (1), pp. 3. Cited by: 2nd item, Table 1.
  • [153] E. Raguseo (2018) Big data technologies: an empirical investigation on their adoption, benefits and risks for companies. Int. Journal of Inf. Manag. 38 (1), pp. 187–195. Cited by: 3rd item, Table 1.
  • [154] C. E. Rasmussen (2000) The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems, pp. 554–560. Cited by: §5.2.
  • [155] C. E. Rasmussen (2004) Gaussian processes in machine learning. In Advanced lectures on machine learning, pp. 63–71. Cited by: §5.2.
  • [156] V. Rocková and S. van der Pas (2017) Posterior concentration for Bayesian regression trees and forests. Ann. Stat. (in revision), pp. 1–40. Cited by: §5.1.
  • [157] J. Roski, G. W. Bo-Linn, and T. A. Andrews (2014) Creating value in health care through big data: opportunities and policy implications. Health Affairs 33 (7), pp. 1115–1122. Cited by: 2nd item, Table 1.
  • [158] J. S. Rumsfeld, K. E. Joynt, and T. M. Maddox (2016) Big data analytics to improve cardiovascular care: promise and challenges. Nat. Rev. Cardiol. 13 (6), pp. 350–359. Cited by: 2nd item, Table 1.
  • [159] S. Sagiroglu and D. Sinanc (2013) Big data: a review. In Collaboration Technologies and Systems (CTS), 2013 International Conference on, pp. 42–47. Cited by: 1st item, 2nd item, Table 1.
  • [160] S. M. Schennach (2005) Bayesian exponentially tilted empirical likelihood. Biometrika 92 (1), pp. 31–46. Cited by: §5.2.
  • [161] E. D. Schifano, J. Wu, C. Wang, J. Yan, and M. Chen (2016) Online updating of statistical inference in the big data setting. Technometrics 58 (3), pp. 393–403. Cited by: Table 1, 2nd item, Table 3, §4.
  • [162] S. L. Scott, A. W. Blocker, F. V. Bonassi, H. A. Chipman, E. I. George, and R. E. McCulloch (2016) Bayes and big data: the consensus Monte Carlo algorithm. Int. J Manag. Sci. Eng. Manag. 11 (2), pp. 78–88. Cited by: Table 1, §5.1.
  • [163] D. V. Shah, J. N. Cappella, and W. R. Neuman (2015) Big data, digital media, and computational social science: possibilities and perils. Ann Am Acad Pol Soc Sci 659 (1), pp. 6–13. Cited by: 1st item, Table 1.
  • [164] A. Siddiqa, A. Karim, and A. Gani (2017) Big data storage technologies: a survey. Front. of Inf. Technol. & Electronic Eng. 18 (8), pp. 1040–1070. Cited by: 1st item, Table 1.
  • [165] P. Singh and A. Hellander (2018) Multi-statistic Approximate Bayesian Computation with multi-armed bandits. arXiv preprint arXiv:1805.08647. Cited by: §5.1.
  • [166] S. Sisson, Y. Fan, and M. Beaumont (2018) Overview of ABC. In Handbook of Approximate Bayesian Computation, pp. 3–54. Cited by: §5.1.
  • [167] U. Sivarajah, M. M. Kamal, Z. Irani, and V. Weerakkody (2017) Critical analysis of big data challenges and analytical methods. J Bus. Res. 70, pp. 263–286. Cited by: Table 1.
  • [168] S. Srivastava, C. Li, and D. B. Dunson (2018) Scalable Bayes via barycenter in Wasserstein space. J Mach. Learn. Res. 19 (1), pp. 312–346. Cited by: §4, §5.1.
  • [169] H. Strathmann, D. Sejdinovic, and M. Girolami (2015) Unbiased Bayes for big data: paths of partial posteriors. arXiv preprint arXiv:1501.03326. Cited by: Table 1, §5.1.
  • [170] M. A. Suchard, Q. Wang, C. Chan, J. Frelinger, A. Cron, and M. West (2010) Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. J Comput. Graphical Stat. 19 (2), pp. 419–438. Cited by: §5.1.
  • [171] Z. Sun, L. Sun, and K. Strang (2018) Big data analytics services for enhancing business intelligence. J Comput. Inf. Syst. 58 (2), pp. 162–169. Cited by: Table 1.
  • [172] S. Suthaharan (2014) Big data classification: problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Perform. Eval. Rev. 41 (4), pp. 70–73. Cited by: Table 1.
  • [173] O. Sysoev, A. Grimvall, and O. Burdakov (2014) Bootstrap confidence intervals for large-scale multivariate monotonic regression problems. Commun. Stat. - Simul. Comput. 45 (3), pp. 1025–1040. Cited by: Table 1, Table 3.
  • [174] D. Talia (2013) Clouds for scalable big data analytics. Comput. 46 (5), pp. 98–101. Cited by: 2nd item, Table 1.
  • [175] Y. Tang, Z. Xu, and Y. Zhuang (2016) Bayesian network structure learning from big data: a reservoir sampling based ensemble method. In International Conference on Database Systems for Advanced Applications, pp. 209–222. Cited by: §5.2.
  • [176] A. Tank, N. Foti, and E. Fox (2015) Streaming variational inference for Bayesian nonparametric mixture models. In Artificial Intelligence and Statistics, pp. 968–976. Cited by: §5.2.
  • [177] Y. W. Teh, A. H. Thiery, and S. J. Vollmer (2016) Consistency and fluctuations for stochastic gradient Langevin dynamics. J Mach. Learn. Res. 17 (1), pp. 193–225. Cited by: §5.1.
  • [178] D. Tran, R. Ranganath, and D. M. Blei (2015) The variational Gaussian process. arXiv preprint arXiv:1511.06499. Cited by: §5.2.
  • [179] N. Tripuraneni, S. Gu, H. Ge, and Z. Ghahramani (2015) Particle Gibbs for infinite hidden Markov models. In Advances in Neural Information Processing Systems, pp. 2395–2403. Cited by: Table 1, §5.2.
  • [180] S. van der Pas and V. Rockova (2017) Bayesian dyadic trees and histograms for regression. In Advances in Neural Information Processing Systems, pp. 2089–2099. Cited by: §5.1.
  • [181] M. Viceconti, P. Hunter, and R. Hose (2015) Big data, big knowledge: big data for personalized healthcare. IEEE J Biomed Health Inform 19 (4), pp. 1209–1215. Cited by: 2nd item, Table 1.
  • [182] A. Vyas and S. Ram (2017) Comparative Study of MapReduce Frameworks in Big Data Analytics. Int. J Mod. Comput. Sci. 5 (Special Issue), pp. 5–13. Cited by: Table 1.
  • [183] S. F. Wamba, S. Akter, A. Edwards, G. Chopin, and D. Gnanzou (2015) How ‘big data’ can make big impact: findings from a systematic review and a longitudinal case study. Int. J Prod. Econ. 165, pp. 234–246. Cited by: §1, 3rd item, Table 1.
  • [184] C. Wang, M. Chen, E. Schifano, J. Wu, and J. Yan (2016) Statistical methods and computing for big data. Stat Interface 9 (4), pp. 399–414. Cited by: Table 1, Table 3, §4.
  • [185] C. Wang, J. Paisley, and D. Blei (2011) Online variational inference for the hierarchical Dirichlet process. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 752–760. Cited by: §5.2.
  • [186] C. Wang, M. Chen, J. Wu, J. Yan, Y. Zhang, and E. Schifano (2017) Online updating method with new variables for big data streams. Can. J Stat. 46 (1), pp. 123–146. Cited by: Table 1, Table 3.
  • [187] J. Wang, Y. Tang, M. Nguyen, and I. Altintas (2014) A scalable data science workflow approach for big data Bayesian network learning. In 2014 IEEE/ACM Int Symp. Big Data Comput., pp. 16–25. Cited by: §5.2.
  • [188] L. Wang and D. B. Dunson (2011) Fast Bayesian inference in Dirichlet process mixture models. J Comput. Graphical Stat. 20 (1), pp. 196–216. Cited by: §5.2.
  • [189] T. Wang and R. J. Samworth (2017) High dimensional change point estimation via sparse projection. J Royal Stat. Soc.: Ser. B (Stat. Methodol.) 80 (1), pp. 57–83. Cited by: Table 1, Table 3.
  • [190] X. Wang and D. B. Dunson (2013) Parallelizing MCMC via Weierstrass sampler. arXiv preprint arXiv:1312.4605. Cited by: §5.1.
  • [191] X. Wang and Y. Xu (2015) Fast clustering using adaptive density peak detection. Stat. Methods Med. Res. 26 (6), pp. 2800–2811. Cited by: Table 3.
  • [192] H. J. Watson (2014) Tutorial: big data analytics: concepts, technologies, and applications. CAIS 34, pp. 65. Cited by: Table 1.
  • [193] Y. Webb-Vargas, S. Chen, A. Fisher, A. Mejia, Y. Xu, C. Crainiceanu, B. Caffo, and M. A. Lindquist (2017) Big data and neuroimaging. Stat. Biosci. 9 (2), pp. 543–558. Cited by: Table 1.
  • [194] S. White, T. Kypraios, and S. P. Preston (2015) Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution. Stat. and Comput. 25 (2), pp. 289–301. Cited by: §5.1.
  • [195] R. Wilkinson (2014) Accelerating ABC methods using Gaussian processes. In Artificial Intelligence and Statistics, pp. 1015–1023. Cited by: §5.1.
  • [196] S. Williamson, A. Dubey, and E. P. Xing (2013) Parallel Markov chain Monte Carlo for nonparametric mixture models. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 98–106. Cited by: §5.1.
  • [197] A. F. Wise and D. W. Shaffer (2015) Why theory matters more than ever in the age of big data. J Learn. Anal. 2 (2), pp. 5–13. Cited by: Table 1, §4.
  • [198] C. Wu and C. P. Robert (2017) Average of recentered parallel MCMC for big data. arXiv preprint arXiv:1706.04780. Cited by: Table 1, §5.1, §5.1.
  • [199] X. Xia (2017) Small data, mid data, and big data versus algebra, analysis, and topology. IEEE Signal Process. Mag. 34 (1), pp. 48–51. Cited by: Table 1.
  • [200] C. Yang, Q. Huang, Z. Li, K. Liu, and F. Hu (2017) Big data and cloud computing: innovation opportunities and challenges. Int. Journal of Digit. Earth 10 (1), pp. 13–53. Cited by: 2nd item, Table 1.
  • [201] C. Yoo, L. Ramirez, and J. Liuzzi (2014) Big data analysis using modern statistical and machine learning methods in medicine. Int. Neurourol. J. 18 (2), pp. 50. Cited by: Table 1.
  • [202] L. Yu and N. Lin (2017) ADMM for penalized quantile regression in big data. Int. Stat. Rev. 85 (3), pp. 494–518. Cited by: Table 3.
  • [203] L. Zhang, A. Datta, and S. Banerjee (2019) Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments. Stat Anal Data Min 12 (3), pp. 197–209. Cited by: §5.2.
  • [204] T. Zhang and B. Yang (2017) An exact approach to ridge regression for big data. Comput. Stat., pp. 1–20. Cited by: Table 1, Table 3.
  • [205] X. Zhang, C. Liu, S. Nepal, C. Yang, W. Dou, and J. Chen (2014) A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud. J Comput. Syst. Sci. 80 (5), pp. 1008–1020. Cited by: 2nd item, Table 1.
  • [206] Y. Zhang, T. Cao, S. Li, X. Tian, L. Yuan, H. Jia, and A. V. Vasilakos (2016) Parallel processing systems for big data: a survey. Proceedings of the IEEE 104 (11), pp. 2114–2136. Cited by: 2nd item, Table 1.
  • [207] Z. Zhang, K. R. Choo, and B. B. Gupta (2018) The convergence of new computing paradigms and big data analytics methodologies for online social networks. J Comput. Sci. 26, pp. 453–455. Cited by: Table 1.
  • [208] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos (2017) Machine learning on big data: opportunities and challenges. Neurocomputing 237, pp. 350–361. Cited by: §5.2.
  • [209] J. Zhu, J. Chen, W. Hu, and B. Zhang (2017) Big learning with Bayesian methods. Natl Sci. Rev. 4 (4), pp. 627–651. Cited by: Table 1, §5.2.
  • [210] C. Reed and Z. Ghahramani (2013) Scaling the Indian buffet process via submodular maximization. In International Conference on Machine Learning, pp. 1013–1021. Cited by: Table 1, §5.2.

Acknowledgement

This research was supported by an ARC Australian Laureate Fellowship for the project Bayesian Learning for Decision Making in the Big Data Era (Grant no. FL150100150). The authors also acknowledge the support of the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).

Appendix

The acronyms used for Bayesian computational algorithms and models are:

ABC Approximate Bayesian Computation
BEL Bayesian Empirical Likelihood
BN Bayesian Network
BNN Bayesian Neural Network
BART Bayesian Additive Regression Trees
CART Classification and Regression Trees
DP Dirichlet Process
GP Gaussian Process
HMC Hamiltonian Monte Carlo
HMM Hidden Markov Models
IBP Indian Buffet Process
iHMM Infinite Hidden Markov Models
MCMC Markov Chain Monte Carlo
MH Metropolis–Hastings
NBC Naive Bayes Classifier
NNGP Nearest Neighbour Gaussian Process
SMC Sequential Monte Carlo
VB Variational Bayes
VI Variational Inference