1 Introduction
Although there are many variations on the definition of Big Data [91, 51, 52, 183], it is clear that it encompasses large and often diverse quantitative data obtained from increasingly numerous sources at different individual, spatial and temporal scales, and with different levels of quality. Examples of Big Data include data generated from social media [22]; data collected in biomedical and healthcare informatics research such as DNA sequences and electronic health records [114]; and geospatial data generated by remote sensing, laser scanning, mobile mapping, geolocated sensors, geotagged web content, volunteered geographic information (VGI), global navigation satellite system (GNSS) tracking and so on [103]. The volume and complexity of Big Data often exceed the capability of standard analytics tools (software, hardware, methods and algorithms) [92, 70]. The concomitant challenges of managing, modelling, analysing and interpreting these data have motivated a large literature on potential solutions from a range of domains including statistics, machine learning and computer science. This literature can be grouped into four broad categories of articles. The first includes general articles about the concept of Big Data, including its features and challenges, and its application and importance in specific fields. The second includes literature concentrating on infrastructure and management, including parallel computing and specialised software. The third focuses on statistical and machine learning models and algorithms for Big Data. The final category includes articles on the application of these new techniques to complex real-world problems.
In this chapter, we classify the literature published on Big Data into finer classes than the four broad categories mentioned above and briefly review the contents covered by those different categories. However, the main focus of the chapter is on the third category, in particular on statistical contributions to Big Data. We examine the nature of these innovations and attempt to catalogue them as modelling, algorithmic or other contributions. We then drill further into this set and examine the more specific literature on Bayesian approaches. Although there is increasing interest in this paradigm from a wide range of perspectives including statistics, machine learning, information science, computer science and the various application areas, to our knowledge there has not yet been a review of Bayesian statistical approaches for Big Data. This is the primary contribution of this chapter.
This chapter provides a review of the published studies that present Bayesian statistical models specifically for Big Data and discusses the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.
The chapter proceeds as follows. In the next section, the literature search and inclusion criteria for this chapter are outlined. A classification of the Big Data literature, along with a brief review of the relevant literature in each class, is presented in Section 3. Section 4 consists of a brief review of articles discussing Big Data problems from statistical perspectives, followed by a review of Bayesian approaches applied to Big Data. The final section includes a discussion of this review with a view to answering the research question posed above.
2 Literature Search and Inclusion Criteria
The literature search for this review paper was undertaken using different methods. The search methods implemented to find the relevant literature and the criteria for the inclusion of the literature in this chapter are briefly discussed in this section.
2.1 Inclusion Criteria
Acknowledging the wide range of literature on Big Data, the specific focus of this chapter is on recent developments published in the last five years, 2013–2019.
For quality assurance reasons, only peer-reviewed published articles, book chapters and conference proceedings were included in the chapter. Some articles were also included from arXiv and other preprint servers when they were soon to be published or were authored by well-known researchers working in the particular area of interest.
2.2 Search Methods
Database Search: The database “Scopus” was used to initiate the literature search. To identify the availability of literature and learn about the broad areas of concentration, the following keywords were used: Big Data, Big Data Analysis, Big Data Analytics, Statistics and Big Data.
The huge range of literature obtained by this initial search was complemented by a search of “Google Scholar” using more specific keywords as follows: Features and Challenges of Big Data, Big Data Infrastructure, Big Data and Machine Learning, Big Data and Cloud Computing, Statistical approaches/methods/models in Big Data, Bayesian Approaches/Methods/Models in Big Data, Big Data analysis using Bayesian Statistics, Bayesian Big Data, Bayesian Statistics and Big Data.
Expert Knowledge: In addition to the literature found by the above database search, we drew on expert knowledge and opinion in the field, reviewed the works of well-known researchers in Bayesian statistics for research related to Bayesian approaches to Big Data, and included the relevant publications for review in this chapter.
Scanning References of selected literature: Further studies and literature were found by searching the references of selected literature.
Searching with specific keywords:
Since the focus of this chapter is to review Bayesian approaches to Big Data, further literature was sourced using the names of specific Bayesian methods or approaches found to be applied to Big Data: Approximate Bayesian Computation and Big Data, Bayesian Networks in Big Data, Classification and regression trees/Bayesian Additive regression trees in Big Data, Naive Bayes Classifiers and Big Data, Sequential Monte Carlo and Big Data, Hamiltonian Monte Carlo and Big Data, Variational Bayes and Big Data, Bayesian Empirical Likelihood and Big Data, Bayesian Spatial modelling and Big Data, Non parametric Bayes and Big Data.
This last step was conducted in order to ensure that this chapter covers the important and emerging areas of Bayesian Statistics and their application to Big Data. These searches were conducted in “Google Scholar” and up to 30 pages of results were considered in order to find relevant literature.
3 Classification of Big Data literature
The published articles on Big Data can be divided into finer classes than the four main categories described above. Of course, there are many ways to make these delineations. Table 1 shows one such delineation, with representative references from the last five years of published literature. The aim of this table is to indicate the wide-ranging literature on Big Data and provide relevant references in different categories for interested readers.
Table 1: Classification of the Big Data literature with representative references.

Topic | Representative References
Features and Challenges | [167, 199, 140, 51, 52, 183, 70, 63, 65, 159]
Infrastructure | [96, 207, 117, 164, 182, 10, 206, 131, 138, 142, 192, 50, 109]
Cloud computing | [120, 130, 38, 200, 11, 113, 32, 205, 54, 137, 174]
Applications (3 examples) | Social science: [5, 39, 22, 121, 163, 37]; Health/medicine/medical science: [118, 46, 8, 16, 158, 9, 21, 28, 34, 181, 19, 152, 157, 201, 82]; Business: [153, 60, 171, 66, 31, 2, 36, 122]
Machine Learning Methods | [55, 26, 139, 3, 4, 97, 172, 27, 64, 89]
Statistical Methods | [61, 136, 193, 186, 189, 112, 204, 67, 161, 86, 184, 57, 143, 197, 44, 173, 45, 87, 84]
Bayesian Methods | [80, 149, 209, 198, 102, 128, 105, 100, 110, 162, 81, 179, 129, 115, 169, 210, 7, 108]
The links between these classes of literature can be visualised as in Figure 1, and a brief description of each class, together with the contents covered by the relevant references listed, is provided in Table 2. The brief reviews presented in Table 2 can help interested readers develop a broad idea about each of the classes mentioned in Table 1. However, Table 2 does not include brief reviews of the last two classes, namely Statistical Methods and Bayesian Methods, since these classes are discussed in detail in Sections 4 and 5. We acknowledge that Bayesian methods are essentially part of statistical methods, but in this chapter the distinction is made intentionally in order to identify and discuss the specific developments in Bayesian approaches.
[Figure 1 near here: links between the classes of Big Data literature listed in Table 1.]
4 Statistical Approaches to Big Data
The importance of modelling and theoretical considerations for analysing Big Data is well stated in the literature [86, 197]. These authors pointed out that blind trust in algorithms without proper theoretical consideration will not result in valid outputs. The emerging challenges of Big Data are beyond the issues of processing, storing and management, and the choice of suitable statistical methods is crucial in order to make the most of Big Data [67, 87]. [61] highlighted the role of statistical methods in interpretability, uncertainty quantification and reducing selection bias when analysing Big Data.
In this section we present a brief review of some of the published research on statistical perspectives, methods, models and algorithms that are targeted to Big Data. As above, the review is confined to the last five years, commencing with the most recent contributions. Bayesian approaches are reserved for the next section.
Table 3: Statistical approaches to Big Data, by topic and representative authors.

Topic | Author(s)
Discussion article | Dunson (2018) [61]
Review | Nongxa (2017) [136]; Franke et al. (2016) [67]; Chen et al. (2015) [44]; Hoerl et al. (2014) [87]
Review of methods & extension | Wang et al. (2016) [184]
Methods review, new methods | Genuer et al. (2017) [72]; Wang and Xu (2015) [191]; Wang et al. (2017) [186]
New methods and algorithms | Liu et al. (2017) [112]; Schifano et al. (2016) [161]; Allen et al. (2014) [6]
New algorithms | Wang and Samworth (2017) [189]; Yu and Lee (2017) [202]; Zhang and Yang (2017) [204]; Doornik and Hendry (2015) [57]; Sysoev et al. (2014) [173]; Pehlivanlı (2015) [143]
Among the brief reviews of the relevant literature in Table 3, we include detailed reviews of three papers that are more generic in explaining the role of statistics and statistical methods in Big Data, along with recent developments in this area.
[184] summarised the published literature on recent methodological developments for Big Data in three broad groups: subsampling, which calculates a statistic in many subsamples taken from the data and then combines the results [144]; divide and conquer, the principle of which is to break a dataset into smaller subsets, analyse these in parallel and combine the results at the end [168]; and online updating of streaming data [161], based on online recursive analytical processing. The authors summarised the following methods in the first two groups: subsampling-based methods (bag of little bootstraps, leveraging, mean log-likelihood, subsample-based MCMC) and divide and conquer (aggregated estimating equations, majority voting, screening with ultra-high dimension, parallel MCMC). After reviewing existing online updating methods and algorithms, they extended the online updating of streaming data by including criterion-based variable selection with online updating. The authors also discussed the available software packages (open-source R as well as commercial software) developed to handle the computational complexity involving Big Data. For breaking the memory barrier using R, the authors cited and discussed several data management packages (sqldf, DBI, RSQLite, filehash, bigmemory, ff) and packages for numerical calculation (speedglm, biglm, biganalytics, ffbase, bigtabulate, bigalgebra, bigpca, bigrf, biglars, PopGenome). The R packages for breaking the computing-power barrier were cited and discussed in two groups: packages for speeding up (compiler, inline, Rcpp, RcppEigen, RcppArmadillo, RInside, microbenchmark, proftools, aprof, lineprof, GUIProfiler) and packages for scaling up (Rmpi, snow, snowFT, snowfall, multicore, parallel, foreach, Rdsm, bigmemory, pbdMPI, pbdSLAP, pbdBASE, pbdMAT, pbdDEMO, Rhipe, segue, rhbase, rhdfs, rmr, plyrmr, ravro, SparkR, pnmath, pnmath0, rsprng, rlecuyer, doRNG, gputools, bigvis). The authors also discussed developments in Hadoop, Spark, OpenMP and APIs, and the use of FORTRAN and C++ from R, in order to create flexible programs for handling Big Data. The article also presented a brief summary of commercial statistical software, e.g., SAS, SPSS and MATLAB. The study included a case study of fitting a logistic model to a massive data set on airline on-time performance from the 2009 ASA Data Expo, using some of the R packages discussed earlier to handle the problems with memory and computational capacity. Overall, this study provided a comprehensive review and discussion of state-of-the-art statistical methodologies and software development for handling Big Data.
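The online-updating idea reviewed above can be made concrete with a short sketch. The following is a minimal illustration (not the criterion-based method of [161], and all names are ours): for least squares it suffices to accumulate the sufficient statistics X'X and X'y chunk by chunk, so the coefficient estimate can be refreshed as data stream in without revisiting earlier chunks.

```python
import numpy as np

def init_state(p):
    """Running sufficient statistics for least squares: X'X and X'y."""
    return np.zeros((p, p)), np.zeros(p)

def update(state, X_chunk, y_chunk):
    """Fold one chunk of streaming data into the sufficient statistics."""
    XtX, Xty = state
    return XtX + X_chunk.T @ X_chunk, Xty + X_chunk.T @ y_chunk

def estimate(state):
    """Current coefficient estimate from the accumulated statistics."""
    XtX, Xty = state
    return np.linalg.solve(XtX, Xty)

# Example: stream a data set in chunks and recover the full-data OLS fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=10_000)

state = init_state(3)
for i in range(0, 10_000, 1_000):          # data arrive in 10 chunks
    state = update(state, X[i:i + 1_000], y[i:i + 1_000])

print(estimate(state))
```

Because the update touches only the current chunk, memory use stays fixed at O(p²) however much data has streamed past, and the final estimate matches the full-data least-squares fit exactly.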
[44] presented their views on the challenges and importance of Big Data and explained the role of statistics in Big Data analytics based on a review of the relevant literature. This study emphasised the importance of statistical knowledge and skills in Big Data analytics using several examples. As detailed in Table 3, the authors broadly discussed a range of statistical methods that can be helpful in the analysis of Big Data, such as the use of exploratory data analysis principles to investigate correlations among the variables in the data or to establish causal relationships between response and explanatory variables. The authors specifically mentioned hypothesis testing, predictive analysis using statistical models and statistical inference using uncertainty estimation as key tools for Big Data analysis. They also explained that statistical knowledge can be combined with data mining methods such as unsupervised learning (cluster analysis, association rule learning, anomaly detection) and supervised learning (regression and classification) to the benefit of Big Data analysis. The challenges for statisticians in coping with Big Data were also described in this article, with particular emphasis on computational skills in data acquisition (knowledge of programming languages, and of web and core communication protocols), data processing (skills to transform voice or image data to numeric data using appropriate software or programming), data management (knowledge of database management tools and technologies, such as NoSQL) and scalable computation (knowledge of parallel computing, which can be implemented using MapReduce, SQL, etc.).
As indicated above, many of the papers provide a summary of the published literature which is not replicated here. Some of these reviews are based on large thematic programs that have been held on this topic. For example, the paper by [67] is based on presentations and discussions held as part of the program on Statistical Inference, Learning and Models for Big Data, held in Canada in 2015. The authors discussed the four V's of Big Data (volume, variety, velocity and veracity) and mentioned further challenges in Big Data analysis beyond the complexities associated with these. Particular attention was paid to veracity, which refers to biases and noise in the data that may result from the heterogeneous structure of the data sources and may make the sample non-representative of the population; veracity is often referred to as the biggest challenge among the V's. The paper reviewed the common strategies for Big Data analysis: data wrangling, which consists of data manipulation techniques for making the data eligible for analysis; visualisation, which is often an important tool for understanding the underlying patterns in the data and is the first formal step in data analysis; reducing the dimension of the data using algorithms such as Principal Component Analysis (PCA) to make Big Data models tractable and interpretable; making models more robust by enforcing sparsity through regularisation techniques such as variable selection and model-fitting criteria; and using optimisation methods based on distance measures proposed for high-dimensional data and learning algorithms such as representation learning and sequential learning. Applications of Big Data were shown in public health, health policy, law and order, education, mobile application security, image recognition and labelling, digital humanities and materials science.
There are a few other research articles focused on statistical methods tailored to specific problems, which are not included in Table 2. For example, [40] proposed a statistics-based algorithm using a stochastic space-time model with more than 1 billion data points to reproduce some features of a climate model. Similarly, [123] used various statistical methods to obtain associations between drug-outcome pairs in a very big longitudinal medical experimental database (with information on millions of patients), with a detailed discussion of the big results problem and a comparison of statistical and machine learning approaches. Finally, [84] proposed stochastic variational inference for Gaussian processes, which makes the application of Gaussian processes feasible for huge data sets (millions of data points).
From this review of the literature on statistical perspectives for analysing Big Data, it can be seen that, along with the scaling up of existing algorithms, new methodological developments are also in progress to face the challenges associated with Big Data.
5 Bayesian Approaches in Big Data
As described in the Introduction, the intention of this review is to commence with a broad scope of the literature on Big Data, then focus on statistical methods for Big Data, and finally to focus in particular on Bayesian approaches for modelling and analysis of Big Data. This section consists of a review of published literature on the last of these.
There are two defining features of Bayesian analysis: (i) the construction of the model and associated parameters and expectations of interest, and (ii) the development of an algorithm to obtain posterior estimates of these quantities. In the context of Big Data, the resultant models can become complex and suffer from issues such as unavailability of a likelihood, hierarchical instability, parameter explosion and identifiability. Similarly, the algorithms can suffer from too much or too little data given the model structure, as well as problems of scalability and cost. These issues have motivated the development of new model structures, new methods that avoid the need for models, new Markov chain Monte Carlo (MCMC) sampling methods, and alternative algorithms and approximations that avoid these simulation-based approaches. We discuss some of the concomitant literature under two broad headings, namely computation and models, recognising that there is often overlap in the cited papers.
5.1 Bayesian Computation
In the Bayesian framework, a mainstream computational tool has been Markov chain Monte Carlo (MCMC). Traditional MCMC methods do not scale well because they need to iterate through the full data set at each iteration to evaluate the likelihood [198]. Recently, several attempts have been made to scale MCMC methods up to massive data. A widely used strategy to overcome the computational cost is to distribute the computational burden across a number of machines, generally referred to as divide-and-conquer sampling. This approach breaks a massive data set into a number of easier-to-handle subsets, obtains posterior samples based on each subset in parallel using multiple machines, and finally combines the subset posterior inferences to obtain the full-posterior estimates [168]. The core challenge is the recombination of subposterior samples to obtain true posterior samples, and a number of attempts have been made to address it.
[134] approximated the subposteriors using kernel density estimation and then aggregated the subposteriors by taking their product, providing consistent estimates of the posterior. This allowed faster MCMC processing, since each machine processes its parallel MCMC chain independently. However, one limitation of this asymptotically embarrassingly parallel MCMC algorithm [134] is that it only works for real and unconstrained posterior values, so there remains scope to make the algorithm work in more general settings. [190] adopted a similar approach of parallel MCMC but used a Weierstrass transform to approximate the subposterior densities instead of a kernel density estimate, providing better approximation accuracy, chain mixing rate and potentially faster speed for large-scale Bayesian analysis.
[162] partitioned the data at random and performed MCMC independently on each subset to draw samples from the posterior given that subset. To obtain the consensus posterior, they proposed averaging samples across subsets, and showed the exactness of the algorithm under a Gaussian assumption. This algorithm is scalable to a very large number of machines and works on clusters, single multi-core or multi-processor computers, or any arbitrary collection of computers linked by a high-speed network. The key weakness of consensus MCMC is that it does not apply to non-Gaussian posteriors.
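A hedged toy sketch of the consensus idea, in exactly the Gaussian setting for which [162] showed exactness: each shard's subposterior is Gaussian and can be sampled directly, and consensus draws are formed as precision-weighted averages of the shard draws. The model, seed and shard count are all invented for illustration, not taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Full data: y_i ~ N(theta, 1) with a flat prior, so the full-data
# posterior is N(ybar, 1/n) and the consensus result can be checked.
n = 100_000
y = rng.normal(2.0, 1.0, size=n)

S = 10                                         # number of shards ("machines")
shards = np.array_split(y, S)
draws, precisions = [], []
for sh in shards:
    mean_s, var_s = sh.mean(), 1.0 / len(sh)   # exact Gaussian subposterior
    draws.append(rng.normal(mean_s, np.sqrt(var_s), size=5_000))
    precisions.append(1.0 / var_s)

# Consensus draws: precision-weighted average across shards, draw by draw.
draws = np.array(draws)                        # shape (S, 5000)
w = np.array(precisions)[:, None]
consensus = (w * draws).sum(axis=0) / w.sum()

print(consensus.mean(), y.mean())
```

With equal shards and Gaussian subposteriors, the combined draws have exactly the full-data posterior mean and variance; the weakness noted above is that nothing comparable is guaranteed once the subposteriors are non-Gaussian.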
[128] proposed dividing a large set of independent data into a number of non-overlapping subsets, making inferences on the subsets in parallel and then combining the inferences using the median of the subset posteriors. The median posterior (M-posterior) is constructed from the subset posteriors using Weiszfeld's algorithm, which provides a scalable algorithm for robust estimation.
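The Weiszfeld recursion underlying the M-posterior is simple to sketch: the geometric median is found by iteratively re-weighted averaging. Here it is applied to a handful of hypothetical subset posterior means (a made-up example, with one badly corrupted subset) to show the robustness that motivates [128]:

```python
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-9):
    """Weiszfeld's algorithm: iteratively re-weighted averaging."""
    z = points.mean(axis=0)                  # start from the centroid
    for _ in range(n_iter):
        d = np.linalg.norm(points - z, axis=1)
        w = 1.0 / np.maximum(d, eps)         # guard against division by zero
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

# Hypothetical subset posterior means; one subset has gone badly wrong.
subset_means = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
                         [1.0, 1.2], [30.0, -40.0]])

print(subset_means.mean(axis=0))             # the mean is dragged away
print(geometric_median(subset_means))        # the median stays with the bulk
```

The ordinary average is pulled far from the bulk by the single corrupted subset, while the geometric median remains inside the cluster of well-behaved subset means; this is the sense in which the M-posterior is robust.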
[77] extended this notion to spatially dependent data, providing a scalable divide-and-conquer algorithm for analysing big spatial data sets, named spatial meta-kriging. The multivariate extension of spatial meta-kriging has been addressed by [78]. These meta-kriging approaches are practical developments in Bayesian spatial inference for Big Data, specifically for “big-N” problems [98].
[198] proposed a new and flexible divide-and-conquer framework that uses rescaled subposteriors to approximate the overall posterior. Unlike other parallel MCMC approaches, this method creates artificial data for each subset and applies the overall prior to the artificial data sets to obtain the subset posteriors. The subposteriors are then recentred to their common mean and averaged to approximate the overall posterior. The authors claimed this method has statistical justification as well as mathematical validity, while sharing the same computational cost as other classical parallel MCMC approaches such as consensus Monte Carlo and the Weierstrass sampler. [30] proposed a non-reversible rejection-free MCMC method which reportedly outperforms state-of-the-art methods such as HMC and Firefly Monte Carlo, having a faster mixing rate and lower variances of the estimators for high-dimensional models and large data sets. However, the automation of this method is still a challenge.
Another strategy for scalable Bayesian inference is the subsampling-based approach. In this approach, a smaller subset of data is queried in the MCMC algorithm to evaluate the likelihood at every iteration.
[116] proposed an auxiliary variable MCMC algorithm that evaluates the likelihood based on a small subset of the data at each iteration, yet simulates from the exact posterior distribution. To improve the mixing speed, [95] used an approximate Metropolis-Hastings (MH) test based on a subset of data. A similar approach is used in [17], where the accept/reject step of MH evaluates the likelihood of a random subset of the data. [18] extended this approach by replacing a number of likelihood evaluations with a Taylor expansion centred at the maximum of the likelihood, and concluded that their method outperforms the previous algorithms [95]. The scalable MCMC approach was further improved by [150], who used a difference estimator to estimate the log-likelihood accurately using only a small fraction of the data. [149] introduced an unbiased estimator of the log-likelihood based on a weighted subsample, which is used in the MH acceptance step to speed up the algorithm. Another scalable adaptation of the MH algorithm, informed subsampling MCMC, was proposed by [119] to speed up Bayesian inference for Big Data; it draws subsets according to a similarity measure (the squared L2 distance between maximum likelihood estimators on the full data and on the subsample) instead of uniformly. The algorithm showed excellent performance under a limited computational budget when approximating the posterior for a tall data set.
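To fix ideas, here is a deliberately naive sketch of the subsampling strategy for a Gaussian mean (all settings invented for illustration): the log-likelihood in the MH ratio is replaced by a rescaled estimate from a fresh random subset at each iteration. The noise this introduces into the accept/reject decision is precisely what the corrected tests and estimators cited above are designed to control.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
y = rng.normal(1.5, 1.0, size=N)              # y_i ~ N(theta, 1), flat prior

def sub_loglik(theta, idx):
    """Log-likelihood on a subsample, rescaled to the full data size."""
    return (N / len(idx)) * np.sum(-0.5 * (y[idx] - theta) ** 2)

theta, m = 0.0, 2_000                         # initial state; subsample size
samples = []
for _ in range(6_000):
    prop = theta + rng.normal(0.0, 0.005)     # random-walk proposal
    idx = rng.choice(N, size=m, replace=False)
    # Approximate MH test: same subsample for current and proposed state.
    if np.log(rng.uniform()) < sub_loglik(prop, idx) - sub_loglik(theta, idx):
        theta = prop
    samples.append(theta)

print(float(np.mean(samples[3_000:])))
```

Only m of the N points are touched per iteration, but the noisy log-likelihood difference inflates the spread of the chain relative to the exact posterior N(ȳ, 1/N); the papers above quantify and correct exactly this effect.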
Another variation of MCMC for Big Data has been made by [169]. These authors proposed a novel Bayesian inference framework that approximates the posterior expectation from a different perspective suitable for Big Data problems, involving paths of partial posteriors. This is a parallelisable method which can easily be implemented using existing MCMC techniques. It does not require simulation from the full posterior, thus bypassing the complex convergence issues of kernel approximation. However, there is still scope for future work on the computation-variance trade-off and the finite-time bias produced by MCMC.
Hamiltonian Monte Carlo (HMC) sampling methods provide powerful and efficient algorithms for MCMC, using high acceptance probabilities for distant proposals [45]. A conceptual introduction to HMC is presented by [25]. [45] proposed a stochastic gradient HMC using second-order Langevin dynamics. Stochastic Gradient Langevin Dynamics (SGLD) has been proposed as a useful method for applying MCMC to Big Data, in which the accept-reject step is skipped and a decreasing step-size sequence is used [1]. For a more detailed and rigorous mathematical framework, algorithms and recommendations, interested readers are referred to [177].
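A hedged illustration of SGLD for the simplest possible model (a Gaussian mean with a flat prior; all tuning constants are our own inventions): each iteration takes a noisy gradient of the log-posterior from a minibatch, rescales it to the full data size, and adds Gaussian noise calibrated to the decreasing step size, with no accept-reject step.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
y = rng.normal(1.0, 1.0, size=N)          # y_i ~ N(theta, 1), flat prior

theta, samples = 0.0, []
for t in range(1, 5_001):
    eps = 0.5 / (N * t ** 0.51)           # decreasing step-size sequence
    batch = y[rng.integers(0, N, size=100)]
    # Stochastic gradient of the log-posterior (flat prior => likelihood only),
    # rescaled from the minibatch to the full data size.
    grad = (N / len(batch)) * np.sum(batch - theta)
    # Langevin update: gradient step plus Gaussian noise, no accept/reject.
    theta += 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps))
    samples.append(theta)

print(float(np.mean(samples[1_000:])))
```

The decreasing step size is what lets the accept-reject test be skipped: as eps shrinks, the discretisation error of the Langevin dynamics vanishes and the iterates concentrate on the posterior, here N(ȳ, 1/N).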
A popular method of scaling Bayesian inference, particularly in the case of analytically intractable distributions, is Sequential Monte Carlo (SMC), or particle filtering [48, 24, 80]. SMC algorithms have recently become popular as a method to approximate integrals; the reasons behind this popularity include their easy implementation and parallelisation ability, much-needed characteristics for Big Data implementations [100]. SMC approximates a sequence of probability distributions on a sequence of spaces of increasing dimension by applying resampling, propagation and weighting steps to a cloud of particles, starting with the prior and eventually reaching the posterior of interest.
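The resample-propagate-weight cycle can be sketched for a toy tempered-likelihood sequence that moves a particle cloud from the prior to the posterior (a generic schematic of SMC samplers, not any specific cited algorithm; the schedule and kernels are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(2.0, 1.0, size=500)        # data: y_i ~ N(theta, 1)

def loglik(theta):
    """Log-likelihood for every particle at once (theta is a vector)."""
    return -0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)

def log_target(theta, t):
    """Tempered target: prior N(0, 25) times likelihood^t."""
    return t * loglik(theta) - 0.5 * theta ** 2 / 25.0

M = 2_000
particles = rng.normal(0.0, 5.0, size=M)  # initial cloud: prior draws
temps = np.linspace(0.0, 1.0, 21)         # schedule from prior to posterior

for t0, t1 in zip(temps[:-1], temps[1:]):
    # Weight: incremental likelihood contribution for this temperature step.
    logw = (t1 - t0) * loglik(particles)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample in proportion to the weights.
    particles = particles[rng.choice(M, size=M, p=w)]
    # Propagate: one random-walk Metropolis step at temperature t1.
    prop = particles + rng.normal(0.0, 0.1, size=M)
    accept = np.log(rng.uniform(size=M)) < log_target(prop, t1) - log_target(particles, t1)
    particles = np.where(accept, prop, particles)

print(particles.mean(), y.mean())
```

Each weighting step involves only the incremental change in temperature, and the Metropolis jitter restores particle diversity after resampling; both steps are embarrassingly parallel across particles, which is the property that makes SMC attractive for Big Data.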
[80] proposed a subsampling SMC which is suitable for parallel computation in Big Data analysis, comprising two steps. First, the speed of the SMC is increased by using an unbiased and efficient estimator of the likelihood, followed by a Metropolis-within-Gibbs kernel. The kernel is updated by an HMC method for the model parameters and a block pseudo-marginal proposal for the auxiliary variables [80]. Some novel approaches to SMC include divide-and-conquer SMC [105], multilevel SMC [24], online SMC [75] and one-pass SMC [104], among others.

Stochastic variational inference (VI, also called Variational Bayes, VB) is a faster alternative to MCMC [88]. It approximates probability densities using a deterministic optimisation method [110] and has seen widespread use in approximating posterior densities for Bayesian models in large-scale problems. The interested reader is referred to [29] for a detailed introduction to variational inference designed for statisticians, with applications. VI has been implemented in scaling up algorithms for Big Data; for example, a novel reparameterisation of VI has been implemented to scale latent variable models and sparse GP regression to Big Data [69].
There have been studies combining VI and SMC in order to take advantage of both strategies in finding the true posterior [56, 133, 151]. [133] employed an SMC approach to obtain an improved variational approximation, while [151] split the data into blocks, applied SMC to compute a partial posterior for each block, and used a variational argument to obtain a proxy for the true posterior as the product of the partial posteriors. The combination of these two techniques in a Big Data context was made by [56], who proposed a new sampling scheme, the Shortened Bridge Sampler, which combines the strengths of deterministic posterior approximation (variational Bayes) with those of SMC. This sampler reduced computational time for Big Data with huge numbers of parameters, such as data from genomics or networks.
[79] proposed a novel algorithm for Bayesian inference in the context of massive online streaming data, extending the Gibbs sampling mechanism for drawing samples from conditional distributions conditioned on sequential point estimates of other parameters. The authors compared the performance of this conditional density filtering algorithm in approximating the true posterior with SMC and VB, and reported good performance and strong convergence of the proposed algorithm.
Approximate Bayesian computation (ABC) is gaining popularity for statistical inference with high-dimensional data and computationally intensive models where the likelihood is intractable [125]. A detailed overview of ABC can be found in [166] and asymptotic properties of ABC are explored in [68]. ABC is a likelihood-free method that approximates the posterior distribution utilising imperfect matching of summary statistics [166]. Improvements to existing ABC methods for the efficient estimation of posterior density with Big Data (complex and high-dimensional data with costly simulations) have been proposed by [90]. The choice of summary statistics from high-dimensional data is a topic of active discussion; see, for example, [90, 165]. [147] provided a reliable and robust method of model selection in ABC employing random forests, which was shown to gain computational efficiency.
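The core of rejection ABC can be sketched in a few lines (a schematic with an artificial model whose likelihood is actually tractable, so the output can be checked; the prior, tolerance and summary statistic are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate(theta, n=200):
    """Forward simulator: we can sample from the model even when, in a real
    ABC application, the likelihood itself would be intractable."""
    return rng.normal(theta, 1.0, size=n)

observed = simulate(1.0)                  # pretend these are the real data
s_obs = observed.mean()                   # chosen summary statistic

# Rejection ABC: keep prior draws whose simulated summary lands near s_obs.
accepted = []
for _ in range(20_000):
    theta = rng.uniform(-5.0, 5.0)        # draw from the prior
    if abs(simulate(theta).mean() - s_obs) < 0.1:   # tolerance epsilon
        accepted.append(theta)

print(len(accepted), float(np.mean(accepted)))
```

Shrinking the tolerance tightens the approximation at the price of a lower acceptance rate, and the choice of summary statistic, flagged above as an active research question, determines what information about θ survives the comparison.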
Another recent aspect of ABC concerns approximating the likelihood using Bayesian synthetic likelihood or empirical likelihood [59]. Bayesian synthetic likelihood arguably provides computationally efficient approximations of the likelihood with high-dimensional summary statistics [126, 195]. Empirical likelihood, on the other hand, is a nonparametric technique for approximating the likelihood empirically from the data under moment constraints; this has been suggested in the context of ABC [127], but has not been widely adopted. For further reading on empirical likelihood, see [141].

Classification and regression trees are also very useful tools in data mining and Big Data analysis [33]. There are Bayesian versions of regression trees such as Bayesian Additive Regression Trees (BART) [47, 93, 7]. The BART algorithm has also been applied to the Big Data context and to sparse variable selection by [156, 180, 106].
5.2 Bayesian Modelling
The extensive development of Bayesian computational solutions has opened the door to further developments in Bayesian modelling. Many of these new methods are set in the context of application areas. There have been applications of ABC for Big Data in many different fields [62, 102]: for example, [62] developed a high-performance computing ABC approach for estimating the parameters of platelet deposition, while [102] proposed ABC methods for inference from high-dimensional multivariate spatial data at a large number of locations, with a particular focus on model selection for application to spatial extremes analysis. Bayesian mixtures are a popular modelling tool, and VB and ABC techniques have been used for fitting Bayesian mixture models to Big Data [124, 176, 88, 29, 129].
Variable selection in Big Data (wide data in particular, with a massive number of variables) is a demanding problem. [107] proposed multivariate extensions of the Bayesian group lasso for variable selection in high dimensional data, using Bayesian hierarchical models with spike and slab priors, with application to gene expression data. The variable selection problem can also be addressed with ABC-type algorithms: [111] proposed ABC Bayesian forests, a sampling technique based on data splitting that is useful for high dimensional wide data and proves robust in identifying variables with larger marginal inclusion probabilities.
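As a toy illustration of the marginal inclusion probabilities that ABC Bayesian forests target, the sketch below enumerates all submodels of a small regression and weights each by exp(-BIC/2) as a rough surrogate for its marginal likelihood. This is our own simplified stand-in for exposition, not the spike-and-slab or ABC machinery of [107] and [111], and it only scales to small p.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, 0.0, -2.0, 0.0])     # variables 0 and 2 are active
y = X @ beta_true + rng.normal(size=n)

def bic(subset):
    """BIC of the least-squares fit using the given column subset (plus intercept)."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + Z.shape[1] * np.log(n)

# Enumerate all 2^p models; exp(-BIC/2) approximates the marginal likelihood.
models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
bics = np.array([bic(m) for m in models])
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()

# Marginal inclusion probability of each variable under model averaging.
incl = np.array([sum(wi for wi, m in zip(w, models) if j in m) for j in range(p)])
print(np.round(incl, 3))    # high for variables 0 and 2, low for 1 and 3
```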
Bayesian nonparametric models [132] have unbounded capacity to adjust to unseen data by activating additional parameters that were inactive before the new data emerged. In other words, the new data are allowed to speak for themselves, rather than being forced into an arguably restrictive model learned on previously available data. This inherent flexibility to adapt in complexity makes nonparametric models more suitable for Big Data than their parametric counterparts. For a brief introduction to Bayesian nonparametric models and a nontechnical overview of the main tools in the area, the interested reader is referred to [74].
The popular tools in Bayesian nonparametrics include Gaussian processes (GP) [155], Dirichlet processes (DP) [154], the Indian buffet process (IBP) [73] and infinite hidden Markov models (iHMM) [20]. GP have been used for a variety of applications [41, 49, 35] and attempts have been made to scale them to Big Data [84, 85, 178, 53]. DP have seen success in clustering, and faster computational algorithms are being developed to scale them to Big Data [185, 188, 104, 115, 71]. IBP are used for latent feature modelling, where the number of features is determined in a data-driven fashion; they have been scaled to Big Data through variational inference algorithms [210]. As an alternative to the classical HMM, a distinctive property of the iHMM is that it infers the number of hidden states from the available data; it has been scaled to Big Data using particle filtering algorithms [179].
Gaussian processes are also employed in the analysis of high dimensional spatially dependent data. [15] provided model-based solutions employing low rank GP and nearest neighbour GP (NNGP) as scalable priors in a hierarchical framework to deliver full Bayesian inference for big spatial or spatio-temporal data sets. [203] extended the applicability of NNGP to inference on latent spatially dependent processes by developing a conjugate latent NNGP model as a practical alternative to onerous Bayesian computations. [12] used variational optimisation with a structured Bayesian GP latent variable model to analyse spatially dependent data. For a review of methods for analysing massive spatially dependent data, including Bayesian approaches, see [83].
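The bottleneck that motivates low rank and nearest neighbour GP approximations is visible even in a minimal exact GP regression: the posterior mean requires solving an n-by-n linear system, an O(n^3) operation. The numpy sketch below (kernel and settings are illustrative choices of our own, not from the cited works) shows the exact computation those methods approximate.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

# Noisy observations of a smooth function.
x = np.sort(rng.uniform(0, 5, size=60))
y = np.sin(x) + 0.1 * rng.normal(size=60)
x_star = np.linspace(0, 5, 100)

# Exact GP posterior mean: K_* (K + sigma^2 I)^{-1} y.
# The dense O(n^3) solve below is what low-rank and NNGP priors avoid.
sigma2 = 0.1**2
K = rbf_kernel(x, x) + sigma2 * np.eye(len(x))
K_star = rbf_kernel(x_star, x)
mean_star = K_star @ np.linalg.solve(K, y)

err = np.max(np.abs(mean_star - np.sin(x_star)))
print(err)   # small: the GP mean tracks sin(x) on [0, 5]
```

NNGP and low rank methods replace the dense solve with sparse or reduced-dimension surrogates, which is what makes the hierarchical spatial models discussed above feasible for large n.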
Another Bayesian modelling approach that has been used for big and complex data is the Bayesian network (BN). This methodology has generated a substantial literature examining theoretical, methodological and computational approaches, as well as applications [175]. BN belong to the family of probabilistic graphical models and are based on directed acyclic graphs, which provide a very useful representation of causal relationships among variables [23]. BN are used as efficient learning tools in Big Data analysis when integrated with scalable algorithms [187, 208]. For a more detailed treatment of BN learning from Big Data, see [175].
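The factorisation of a joint distribution along a directed acyclic graph, which underlies all BN inference, can be shown with the standard rain-sprinkler-wet-grass example; the conditional probability tables below are the usual textbook illustration values, used here purely for exposition.

```python
# DAG: Rain -> Sprinkler, and (Rain, Sprinkler) -> WetGrass, so the joint
# factorises as P(R, S, W) = P(R) * P(S | R) * P(W | R, S).
P_R = {True: 0.2, False: 0.8}
P_S = {True: {True: 0.01, False: 0.99},      # P_S[r][s] = P(S = s | R = r)
       False: {True: 0.4, False: 0.6}}
P_W = {(True, True): 0.99, (True, False): 0.8,
       (False, True): 0.9, (False, False): 0.0}   # P(W = True | r, s)

def joint(r, s, w):
    pw = P_W[(r, s)]
    return P_R[r] * P_S[r][s] * (pw if w else 1 - pw)

# Query P(R = True | W = True) by enumerating over the hidden variable S.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
posterior = num / den
print(posterior)   # about 0.358: wet grass makes rain moderately likely
```

Inference by brute-force enumeration, as here, is exponential in the number of variables; the scalable BN learning algorithms cited above exist precisely because Big Data problems cannot be treated this way.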
Classification is another important tool for extracting information from Big Data, and Bayesian classifiers, including the naive Bayes classifier (NBC), are used in Big Data classification problems [94, 108]. A parallel implementation of NBC was proposed by [94]. Moreover, [108] evaluated the scalability of NBC in Big Data with application to sentiment classification of millions of movie reviews, and found NBC to have improved accuracy in the Big Data setting. [135] proposed a scalable multi-step clustering and classification algorithm using Bayesian nonparametrics for Big Data with large n and small p, which can also run in parallel.
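A minimal Gaussian naive Bayes classifier shows why NBC parallelises so naturally: training reduces to per-class sufficient statistics (means and variances), which can be computed independently on data shards and merged. The sketch below is our own toy setup on synthetic data, not the parallel implementation of [94].

```python
import numpy as np

rng = np.random.default_rng(4)

# Two classes, two features; NBC assumes features independent given the class.
X0 = rng.normal([-1, -1], 1.0, size=(300, 2))
X1 = rng.normal([+1, +1], 1.0, size=(300, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)

# "Training" is one pass of per-class means and variances; on sharded data
# these statistics can be computed in parallel and combined.
mu = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
var = np.array([X[y == c].var(axis=0) for c in (0, 1)])
log_prior = np.log(np.array([0.5, 0.5]))

def predict(Xnew):
    # log p(c | x) is proportional to log p(c) + sum_j log N(x_j; mu_cj, var_cj)
    ll = -0.5 * (np.log(2 * np.pi * var)[None] +
                 (Xnew[:, None, :] - mu[None]) ** 2 / var[None]).sum(axis=2)
    return np.argmax(ll + log_prior[None], axis=1)

acc = (predict(X) == y).mean()
print(acc)   # well above chance for these separated classes
```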
The past fifteen years have also seen increasing interest in empirical likelihood (EL) for Bayesian modelling. The idea of replacing the likelihood with an empirical analogue in a Bayesian framework was first explored in detail by [99], who demonstrated on simulated data sets, by examining the length and coverage of the resulting intervals, that this Bayesian Empirical Likelihood (BEL) approach increases the flexibility of the EL approach. Later, [160] provided probabilistic interpretations of BEL, exploring moment condition models with EL, and proposed a nonparametric version of BEL, namely Bayesian Exponentially Tilted Empirical Likelihood (BETEL). BEL methods have been applied in spatial data analysis in [43] and in small area estimation in [145, 146].
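The empirical likelihood construction at the heart of BEL can be sketched for the simplest case, a mean: maximise the product of observation weights subject to a moment constraint. The sketch below is our own minimal illustration of the standard Lagrange-multiplier form described in [141], solving for the multiplier by bisection; it is not the BEL machinery of the cited papers.

```python
import numpy as np

def el_logratio(x, mu):
    """Empirical log-likelihood ratio for the mean: maximise sum(log(n*w_i))
    over weights w_i >= 0 with sum w_i = 1, subject to the moment constraint
    sum w_i * (x_i - mu) = 0.  The optimum is w_i = 1 / (n * (1 + lam*(x_i - mu)))
    with lam chosen so that the constraint holds."""
    d = x - mu
    n = len(x)
    if d.min() >= 0 or d.max() <= 0:
        return -np.inf                    # mu outside the convex hull of the data
    lo = -1.0 / d.max() + 1e-10           # range on which all weights stay positive
    hi = -1.0 / d.min() - 1e-10
    g = lambda lam: np.sum(d / (1 + lam * d))   # strictly decreasing in lam
    for _ in range(200):                  # bisection for the root of g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = 1.0 / (n * (1 + lam * d))
    return np.sum(np.log(n * w))

rng = np.random.default_rng(5)
x = rng.normal(1.0, 1.0, size=100)
# The ratio is maximised (= 0) at the sample mean and drops away from it.
r0 = el_logratio(x, x.mean())
r1 = el_logratio(x, x.mean() + 0.5)
print(r0, r1)
```

In BEL this profile empirical likelihood simply replaces the parametric likelihood inside the posterior, so standard MCMC machinery applies on top of it.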
We acknowledge that many more studies on the application of Bayesian approaches in different fields of interest are not included in this review. There are also other review papers on overlapping and closely related topics. For example, [209] describes Bayesian methods in machine learning and covers some of the Bayesian inference techniques reviewed in the present study. However, the scope and focus of this review differ from those of [209], which concentrated on methods applicable to machine learning.
6 Conclusions
We are living in the era of Big Data, and research continues on how to make the most of the available information. This chapter has attempted to review recent developments in Bayesian statistical approaches for handling Big Data, together with a general overview and classification of the Big Data literature of the last five years. The review provides relevant Big Data references organised into finer classes, a brief description of statistical contributions to the field, and a more detailed discussion of the Bayesian approaches developed and applied in the context of Big Data.
On the basis of this review, it is clear that there has been a huge amount of work on issues related to cloud computing, analytics infrastructure and so on. However, the amount of research conducted from a statistical perspective is also notable. In the last five years, there has been an exponential increase in published studies focused on developing new statistical methods and algorithms, as well as on scaling existing methods. These are summarised in Section 4, with particular focus on Bayesian approaches in Section 5. In some instances, citations fall outside this specific period (see Section 2) in order to indicate the origin of methods that are currently being applied or extended in Big Data scenarios.
With the advent of new computational infrastructure and advances in programming and software, Bayesian approaches are no longer considered prohibitively expensive and onerous to execute on large volumes of data. Traditional Bayesian methods are becoming much more scalable thanks to the parallelisation of MCMC algorithms, divide-and-conquer and subsampling approaches within MCMC, and advances in approximations such as HMC, SMC, ABC and VB. With the increasing volume of data, nonparametric Bayesian methods are also gaining in popularity.
This chapter aimed to review a range of methodological and computational advances made in Bayesian statistics for handling the difficulties arising from the advent of Big Data. By not focusing on any particular application, it provides readers with a general overview of the development of Bayesian methodologies and computational algorithms for handling these issues. The review has revealed that most of the advances in Bayesian statistics for Big Data concern computational time and the scalability of particular algorithms, concentrating on estimating the posterior through a variety of techniques. However, the development of Bayesian methods and models for Big Data in the recent literature cannot be overlooked. Many open problems remain for further research in the context of Big Data and Bayesian approaches, as highlighted in this chapter.
Based on the above discussion and the accompanying review, it is apparent that addressing the challenges of Big Data with the strengths of Bayesian statistics will require research on both algorithms and models.
References
 [1] (2014) Distributed stochastic gradient MCMC. In International Conference on Machine Learning, pp. 1044–1052. Cited by: §5.1.
 [2] (2016) Big data analytics in e-commerce: a systematic review and agenda for future research. Electronic Markets 26 (2), pp. 173–194. Cited by: 3rd item, Table 1.
 [3] (2015) High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access 3, pp. 1011–1025. Cited by: 2nd item, Table 1.
 [4] (2015) Efficient machine learning for big data: a review. Big Data Res. 2 (3), pp. 87–93. Cited by: 1st item, Table 1.
 [5] (2017) Data cultures of mobile dating and hookup apps: emerging issues for critical social science research. Big Data Soc. 4 (2), pp. 1–11. Cited by: 1st item, Table 1.
 [6] (2014) A generalized leastsquare matrix decomposition. J Am Stat Assoc 109 (505), pp. 145–159. Cited by: Table 3.
 [7] (2014) Perspectives on Bayesian methods and big data. Cust. Needs and Solut. 1 (3), pp. 169–175. Cited by: Table 1, §5.1.
 [8] (2017) A systematic review of techniques and sources of big data in the healthcare sector. J Med Syst 41 (11), pp. 183. Cited by: Table 1.
 [9] (2015) From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med. Genom. 8 (1), pp. 33. Cited by: 2nd item, Table 1.
 [10] (2017) Frequent itemsets mining for big data: a comparative analysis. Big Data Res. 9, pp. 67–83. Cited by: 2nd item, Table 1.
 [11] (2015) Big data computing and clouds: trends and future directions. Journal of Parallel and Distributed Comput. 79, pp. 3–15. Cited by: 1st item, Table 1.
 [12] (2019) Structured Bayesian Gaussian process latent variable model: applications to datadriven dimensionality reduction and highdimensional inversion. J Comput Phys 383, pp. 166–195. Cited by: §5.2.
 [13] (2015) Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Comput. 19 (4), pp. 1115–1127. Cited by: 3rd item.
 [14] (2014) A scalable machine learning online service for big data real-time analysis. In Computational Intelligence in Big Data (CIBD), 2014 IEEE Symposium on, pp. 1–8. Cited by: 2nd item.
 [15] (2017) Highdimensional Bayesian geostatistics. Bayesian Anal. 12 (2), pp. 583. Cited by: §5.2.
 [16] (2016) Big data for infectious disease surveillance and modeling. J. Infect. Dis. 214 (suppl_4), pp. S375–S379. Cited by: 2nd item, Table 1.
 [17] (2014) Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In International Conference on Machine Learning (ICML), pp. 405–413. Cited by: §5.1.
 [18] (2017) On Markov chain Monte Carlo methods for tall data. J Mach Learn Res 18 (1), pp. 1515–1557. Cited by: §5.1.
 [19] (2014) Big data in health care: using analytics to identify and manage highrisk and highcost patients. Health Aff. 33 (7), pp. 1123–1131. Cited by: 2nd item, Table 1.
 [20] (2002) The infinite hidden Markov model. In Advances in Neural Information Processing Systems, pp. 577–584. Cited by: §5.2.
 [21] (2015) Big data analytics in healthcare. BioMed Res. Int. 2015. Cited by: Table 1.
 [22] (2016) Social big data: recent achievements and new challenges. Inf. Fus. 28, pp. 45–59. Cited by: §1, Table 1.
 [23] (2008) Bayesian Networks. Encycl. Stat. Qual. Reliab. 1, pp. 1–6. Cited by: §5.2.
 [24] (2015) Sequential Monte Carlo methods for Bayesian elliptic inverse problems. Stat. Comput. 25 (4), pp. 727–737. Cited by: §5.1.
 [25] (2017) A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv: 1701.02434. Cited by: §5.1.
 [26] (2016) Big data and machine learning in radiation oncology: state of the art and future prospects. Cancer Lett. 382 (1), pp. 110–117. Cited by: 3rd item, Table 1.
 [27] (2014) Big data stream learning with Samoa. In 2014 IEEE International Conference on Data Mining Workshop (ICDMW), pp. 1199–1202. Cited by: Table 1.
 [28] (2015) Big data in medical science—a biostatistical view: part 21 of a series on evaluation of scientific publications. Dtsch. Ärztebl. Int. 112 (9), pp. 137. Cited by: 2nd item, Table 1.
 [29] (2017) Variational inference: a review for statisticians. J Am Stat Assoc 112 (518), pp. 859–877. Cited by: §5.1, §5.2.
 [30] (2018) The bouncy particle sampler: a nonreversible rejectionfree Markov chain Monte Carlo method. J Am Stat Assoc, pp. 1–13. Cited by: §5.1.
 [31] (2017) The role of big data and predictive analytics in retail.. Journal of Retailing 93 (1), pp. 79–95. Cited by: 3rd item, Table 1.
 [32] (2014) Cloud computing and big data: a review of current service models and hardware perspectives. J Softw. Eng. Appl. 7 (08), pp. 686. Cited by: 2nd item, Table 1.
 [33] (2017) Classification and Regression Trees. Routledge. Cited by: §5.1.
 [34] (2015) Nursing needs big data and big data needs nursing. J Nurs. Scholarsh. 47 (5), pp. 477–484. Cited by: Table 1.
 [35] (2015) Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33 (2), pp. 155. Cited by: §5.2.
 [36] (2016) Big data, big bang?. Journal of Big Data 3 (1), pp. 2. Cited by: Table 1.
 [37] (2014) After the crisis? Big data and the methodological challenges of empirical sociology. Big Data Soc. 1 (1), pp. 1–6. Cited by: 1st item, Table 1.
 [38] (2017) IoTbased big data storage systems in cloud computing: perspectives and challenges. IEEE Internet Things J 4 (1), pp. 75–87. Cited by: Table 1.
 [39] (2017) Vectors into the future of mass and interpersonal communication research: big data, social media, and computational social science. Hum. Commun. Res. 43 (4), pp. 545–558. Cited by: 1st item, Table 1.
 [40] (2016) Compressing an ensemble with statistical models: an algorithm for global 3d spatiotemporal temperature. Technometrics 58 (3), pp. 319–328. Cited by: §4.
 [41] (2013) A framework for evaluating approximation methods for Gaussian process regression. J Mach. Learn. Res. 14 (Feb), pp. 333–350. Cited by: §5.2.
 [42] (2013) Parallel sampling of DP mixture models using subcluster splits. In Advances in Neural Information Processing Systems, pp. 620–628. Cited by: §5.1.
 [43] (2011) Empirical likelihood for small area estimation. Biometrika, pp. 473–480. Cited by: §5.2.
 [44] (2015) Statistics in big data. J Chin. Stat. Assoc. 53, pp. 186–202. Cited by: Table 1, Table 3, §4.
 [45] (2014) Stochastic gradient Hamiltonian Monte Carlo. In Int. Conference on Machine Learning, pp. 1683–1691. Cited by: Table 1, §5.1.
 [46] (2018) Moving beyond consent for citizen science in big data health and medical research. Northwest. J Technol. Intellect. Prop. 16 (1), pp. 15. Cited by: 2nd item, Table 1.
 [47] (2010) BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 (1), pp. 266–298. Cited by: §5.1.
 [48] (2013) SMC2: an efficient algorithm for sequential analysis of state space models. J Royal Stat. Soc. Ser. B (Stat. Methodol.) 75 (3), pp. 397–426. Cited by: §5.1.
 [49] (2013) Deep Gaussian processes. In Artificial Intelligence and Statistics, pp. 207–215. Cited by: §5.2.
 [50] (2013) Big data analytics: a framework for unstructured data analysis. Int. J Eng. Sci. Technol. 5 (1), pp. 153. Cited by: Table 1.
 [51] (2015) What is big data? a consensual definition and a review of key research topics. In AIP Conference Proceedings, Vol. 1644 (1), pp. 97–104. Cited by: §1, Table 1.
 [52] (2016) A formal definition of big data based on its essential features. Libr. Rev. 65 (3), pp. 122–135. Cited by: §1, 1st item, 2nd item, Table 1.
 [53] (2015) Distributed Gaussian processes. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, Volume 37, pp. 1481–1490. Cited by: §5.2.
 [54] (2013) Leveraging the capabilities of serviceoriented decision support systems: putting analytics and big data in cloud. Decis. Support Syst. 55 (1), pp. 412–421. Cited by: Table 1.
 [55] (2018) Machine learning algorithms in big data analytics. Int. J Comput. Sci. Eng. 6 (1), pp. 63–70. Cited by: Table 1.
 [56] (2017) Shortened Bridge Sampler: using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971. Cited by: §5.1.
 [57] (2015) Statistical model selection with “big data”. Cogent Econ. Fin. 3 (1), pp. 1045216. Cited by: Table 1, Table 3.
 [58] (2009) Autometrics. In in Honour of David F. Hendry, pp. 88–121. Cited by: 1st item.
 [59] (2018) Approximating the likelihood in ABC. In Handbook of Approximate Bayesian Computation, pp. 321–368. Cited by: §5.1.
 [60] (2018) A glimpse on big data analytics in the framework of marketing strategies. Soft Comput. 22 (1), pp. 325–342. Cited by: 3rd item, Table 1.
 [61] (2018) Statistics in the big data era: failures of the machine. Stat. Probab. Lett. 136, pp. 4–9. Cited by: Table 1, Table 3, §4.
 [62] (2017) ABCpy: a user-friendly, extensible, and parallel library for approximate Bayesian computation. In Proceedings of the Platform for Advanced Scientific Computing Conference, pp. 1–9. Cited by: §5.2.
 [63] (2015) Understandable big data: a survey. Comput. Sci. Rev. 17, pp. 70–81. Cited by: 2nd item, Table 1.
 [64] (2014) A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Top Comput 2 (3), pp. 267–279. Cited by: Table 1.
 [65] (2014) Challenges of big data analysis. Natl. Sci. Rev. 1 (2), pp. 293–314. Cited by: 2nd item, Table 1.
 [66] (2017) Big data integration with business processes: a literature review. Bus. Process Manag. J 23 (3), pp. 477–492. Cited by: Table 1.
 [67] (2016) Statistical inference, learning and models in big data. Int. Stat. Rev. 84 (3), pp. 371–389. Cited by: Table 1, Table 3, §4, §4.
 [68] (2018) Asymptotic properties of approximate Bayesian computation. Biometrika 00 (0), pp. 1–15. Cited by: §5.1.
 [69] (2014) Distributed variational inference in sparse Gaussian process regression and latent variable models. In Advances in neural information processing systems, pp. 3257–3265. Cited by: §5.1.
 [70] (2015) Beyond the hype: big data concepts, methods, and analytics. Int. J Inf. Manag. 35 (2), pp. 137–144. Cited by: §1, Table 1.
 [71] (2015) Distributed inference for dirichlet process mixture models. In International Conference on Machine Learning, pp. 2276–2284. Cited by: §5.1, §5.2.
 [72] (2017) Random forests for big data. Big Data Res. 9, pp. 28–46. Cited by: Table 3.
 [73] (2006) Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, pp. 475–482. Cited by: §5.2.
 [74] (2013) Bayesian nonparametrics and the probabilistic approach to modelling. Phil. Trans. R. Soc. A 371 (1984), pp. 20110553. Cited by: §5.2.
 [75] (2018) Online sequential Monte Carlo smoother for partially observed diffusion processes. EURASIP J Adv Signal Process 2018 (1), pp. 9. Cited by: §5.1.
 [76] (2012) Large complex data: divide and recombine (D&R) with RHIPE. Stat 1 (1), pp. 53–67. Cited by: §5.1.
 [77] (2018) Meta-Kriging: scalable Bayesian modeling and inference for massive spatial datasets. Technometrics 60 (4), pp. 430–444. Cited by: §5.1.
 [78] (2019) Multivariate spatial meta kriging. Stat. Probab. Lett. 144, pp. 3–8. Cited by: §5.1.
 [79] (2014) Bayesian conditional density filtering for big data. Stat 1050, pp. 15. Cited by: §5.1.
 [80] (2018) Subsampling sequential Monte Carlo for static Bayesian models. arXiv preprint arXiv:1805.03317. Cited by: Table 1, §5.1.
 [81] (2015) Forecasting with big data: a review. Ann. of Data Sci. 2 (1), pp. 5–19. Cited by: Table 1.
 [82] (2013) Big data opportunities for global infectious disease surveillance. PLoS Med. 10 (4), pp. e1001413. Cited by: 2nd item, Table 1.
 [83] (2017) Methods for analyzing large spatial data: a review and comparison. arXiv preprint arXiv:1710.05013. Cited by: §5.2.
 [84] (2013) Gaussian processes for big data. arXiv preprint arXiv:1309.6835. Cited by: Table 1, §4, §5.2.
 [85] (2015) Scalable variational Gaussian process classification. In Artificial Intelligence and Statistics (AISTATS), 18th International Conference on, pp. 351–360. Cited by: §5.2.
 [86] (2016) Big data for development: a review of promises and challenges. Dev. Policy Rev. 34 (1), pp. 135–174. Cited by: Table 1, §4.
 [87] (2014) Applying statistical thinking to ‘Big Data’ problems. Wiley Interdiscip. Rev.: Comput. Stat. 6 (4), pp. 222–232. Cited by: Table 1, Table 3, §4.
 [88] (2013) Stochastic variational inference. J Mach. Learn. Res. 14 (1), pp. 1303–1347. Cited by: §5.1, §5.2.
 [89] (2014) Big data machine learning and graph analytics: current state and future challenges. In 2014 IEEE International Conference on Big Data (Big Data), pp. 16–17. Cited by: Table 1.
 [90] (2019) ABC–CDE: Toward Approximate Bayesian Computation With Complex HighDimensional Data and Limited Simulations. J Comput Graph Stat, pp. 1–20. Cited by: §5.1.

 [91] (2014) Data, DIKW, big data and data science. Procedia Comput. Sci. 31, pp. 814–821. Cited by: §1.
 [92] (2013) Big data: issues and challenges moving forward. In 2013 46th Hawaii International Conference on System Sciences, pp. 995–1004. Cited by: §1.
 [93] (2013) bartMachine: Machine learning with Bayesian additive regression trees. arXiv preprint arXiv:1312.2171. Cited by: §5.1.
 [94] (2013) A novel parallel implementation of naive Bayesian classifier for big data. In Green Computing, Communication and Conservation of Energy (ICGCE), 2013 International Conference on, pp. 847–852. Cited by: §5.2.
 [95] (2014) Austerity in MCMC land: cutting the Metropolis-Hastings budget. In International Conference on Machine Learning, pp. 181–189. Cited by: §5.1.
 [96] (2018) Multi-agent based MapReduce model for efficient utilization of system resources. Indones. J Electr. Eng. Comput. Sci. 11 (2), pp. 504–514. Cited by: Table 1.
 [97] (2015) A survey of open source tools for machine learning with big data in the hadoop ecosystem. J Big Data 2 (1), pp. 24. Cited by: Table 1.
 [98] (2013) Discussing the “big n problem”. Stat Methods Appl 22 (1), pp. 97–112. Cited by: §5.1.
 [99] (2003) Bayesian empirical likelihood. Biometrika 90 (2), pp. 319–326. Cited by: §5.2.
 [100] (2016) Forest resampling for distributed sequential Monte Carlo. Stat. Anal. Data Min. 9 (4), pp. 230–248. Cited by: Table 1, §5.1.
 [101] (2010) On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. J Comput. Graph. Stat. 19 (4), pp. 769–789. Cited by: §5.1.
 [102] (2018) ABC model selection for spatial extremes models applied to South Australian maximum temperature data. Comput. Stat. Data Anal. 128, pp. 128–144. Cited by: Table 1, §5.2.
 [103] (2016) Geospatial big data handling theory and methods: a review and research challenges. ISPRS J Photogramm Remote Sens 115, pp. 119–133. Cited by: §1.
 [104] (2013) Online learning of nonparametric mixture models via sequential variational approximation. In Advances in Neural Information Processing Systems, pp. 395–403. Cited by: §5.1, §5.2.
 [105] (2017) Divideandconquer with sequential Monte Carlo. J Comput. Graph. Stat. 26 (2), pp. 445–458. Cited by: Table 1, §5.1.
 [106] (2018) Bayesian regression trees for highdimensional prediction and variable selection. J Am. Stat. Assoc., pp. 1–11. Cited by: §5.1.
 [107] (2017) Bayesian variable selection regression of multivariate responses for group data. Bayesian Anal. 12 (4), pp. 1039–1067. Cited by: §5.2.
 [108] (2013) Scalable sentiment classification for big data analysis using Naive Bayes classifier. In 2013 IEEE International Conference on Big Data, pp. 99–104. Cited by: Table 1, §5.2.
 [109] (2013) Computing infrastructure for big data processing. Front. Comput. Sci. 7 (2), pp. 165—170. Cited by: Table 1.
 [110] (2016) Stein variational gradient descent: a general purpose Bayesian inference algorithm. In Advances In Neural Information Processing Systems, pp. 2378–2386. Cited by: Table 1, §5.1.
 [111] (2018) ABC Variable Selection with Bayesian Forests. arXiv preprint arXiv:1806.02304. Cited by: §5.2.
 [112] (2017) Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data. BioData Min. 10 (1). Cited by: Table 1, Table 3.
 [113] (2015) Reflections on societal and business model transformation arising from digitization and big data analytics: a research agenda. J Strategic Inf. Syst. 24 (3), pp. 149–157. Cited by: Table 1.
 [114] (2016) Big data application in biomedical research and health care: a literature review. Biomed Inform Insights 8, pp. BII–S31559. Cited by: §1, 2nd item.
 [115] (2014) Bayesian estimation of dirichlet mixture model with variational inference. Pattern recognit. 47 (9), pp. 3143–3157. Cited by: Table 1, §5.2.
 [116] (2014) Firefly Monte Carlo: exact MCMC with subsets of data. In Artificial Intelligence, Twenty-Fourth International Joint Conference on, pp. 543–552. Cited by: §5.1.
 [117] (2017) TPCxhs v2: transforming with technology changes. In Technology Conference on Performance Evaluation and Benchmarking, pp. 120–130. Cited by: Table 1.
 [118] (2017) Big data for public health policymaking: policy empowerment. Public Health genom. 20 (6), pp. 312–320. Cited by: Table 1.
 [119] (2017) Informed subsampling MCMC: approximate bayesian inference for large datasets. Stat. Comput., pp. 1–34. Cited by: §5.1.
 [120] (2018) Survey of challenges in encrypted data storage in cloud computing and big data. J Netw. Commun. Emerg. Technol. 8 (2). Cited by: Table 1.
 [121] (2016) Understanding how big data leads to social networking vulnerability. Comput. in Hum. Behav. 57, pp. 348–351. Cited by: Table 1.
 [122] (2015) How leading organizations use big data and analytics to innovate. Strategy Leadersh. 43 (5), pp. 32–39. Cited by: 3rd item, Table 1.
 [123] (2014) Big data, big results: knowledge discovery in output from largescale analytics. Stat. Analysis Data Min. 7 (5), pp. 404–412. Cited by: §4.
 [124] (2007) Variational approximations in Bayesian model selection for finite mixture distributions. Comput. Stat. Data Anal. 51 (11), pp. 5352–5367. Cited by: §5.2.
 [125] (2018) Approximate Bayesian computation and simulationbased inference for complex stochastic epidemic models. Stat. Sci. 33 (1), pp. 4–18. Cited by: §5.1.
 [126] (2014) GPSABC: Gaussian process surrogate approximate Bayesian computation. arXiv preprint arXiv:1401.2838. Cited by: §5.1.
 [127] (2013) Bayesian computation via empirical likelihood. Proc. National Acad. Sciences 110 (4), pp. 1321–1326. Cited by: §5.1.
 [128] (2017) Robust and scalable bayes via a median of subset posterior measures. J Mach. Learn. Res. 18 (1), pp. 4488–4527. Cited by: Table 1, §5.1.
 [129] (2015) Preprocessing for approximate Bayesian computation in image analysis. Stat. Comput. 25 (1), pp. 23–33. Cited by: Table 1, §5.2.
 [130] (2017) Collaborative anomaly detection framework for handling big data of cloud computing. In Military Communications and Information Systems Conference (MilCIS), 2017, pp. 1–6. Cited by: 2nd item, Table 1.
 [131] (2016) Utilizing big data analytics for information systems research: challenges, promises and guidelines. Eur J Inf Syst 25 (4), pp. 289–302. Cited by: Table 1.
 [132] (2015) Bayesian nonparametric data analysis. Springer. Cited by: §5.2.
 [133] (2017) Variational Sequential Monte Carlo. arXiv preprint arXiv:1705.11140. Cited by: §5.1.
 [134] (2013) Asymptotically exact, embarrassingly parallel MCMC. arXiv preprint arXiv:1311.4780. Cited by: §5.1.
 [135] (2019) Scalable Bayesian nonparametric clustering and classification. Journal of Computational and Graphical Statistics, pp. 1–13. Cited by: §5.2.
 [136] (2017) Mathematical and statistical foundations and challenges of (big) data sciences. S. Afr. J Sci. 113 (34), pp. 1–4. Cited by: Table 1, Table 3.
 [137] (2013) ‘Big data’, hadoop and cloud computing in genomics. J Biomed. Inform. 46 (5), pp. 774–781. Cited by: Table 1.
 [138] (2014) Integrating r and hadoop for big data analysis. Romanian Stat. Rev. 62 (2), pp. 83–94. Cited by: Table 1.
 [139] (2016) Predicting the future—big data, machine learning, and clinical medicine. N. Engl. J Medicine 375 (13), pp. 1216. Cited by: 3rd item, Table 1.
 [140] (2016) Big questions on big data. Revista de Cercetare si Interv. Soc. 55, pp. 112. Cited by: 2nd item, Table 1.
 [141] (2001) Empirical Likelihood. Chapman and Hall/CRC. Cited by: §5.1.
 [142] (2014) Prominence of mapreduce in big data processing. In Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on, pp. 555–560. Cited by: Table 1.
 [143] (2015) A novel feature selection scheme for highdimensional data sets: fourstaged feature selection. J Appl. Stat. 43 (6), pp. 1140–1154. Cited by: Table 1, Table 3.
 [144] (1999) Subsampling. Springer Science & Business Media. Cited by: §4.
 [145] (2015) Bayesian semiparametric hierarchical empirical likelihood spatial models. J Stat. Plan. Inference 165, pp. 78–90. Cited by: §5.2.
 [146] (2015) Multivariate spatial hierarchical Bayesian empirical likelihood methods for small area estimation. Stat. 4 (1), pp. 108–116. Cited by: §5.2.
 [147] (2015) Reliable ABC model choice via random forests. Bioinformatics 32 (6), pp. 859–866. Cited by: §5.1.
 [148] (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016 (1), pp. 67. Cited by: 1st item.
 [149] (2018) Speeding up MCMC by efficient data subsampling. J Am. Stat. Assoc., pp. 1–13. Cited by: Table 1, §5.1.
 [150] (2015) Scalable MCMC for large data problems using data subsampling and the difference estimator. SSRN Electronic Journal. Cited by: §5.1.
 [151] (2015) Variational consensus Monte Carlo. In Advances in Neural Information Processing Systems, pp. 1207–1215. Cited by: §5.1.
 [152] (2014) Big data analytics in healthcare: promise and potential. Health Inf. Sci. and Syst. 2 (1), pp. 3. Cited by: 2nd item, Table 1.
 [153] (2018) Big data technologies: an empirical investigation on their adoption, benefits and risks for companies. Int. Journal of Inf. Manag. 38 (1), pp. 187–195. Cited by: 3rd item, Table 1.

 [154] (2000) The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems, pp. 554–560. Cited by: §5.2.
 [155] (2004) Gaussian processes in machine learning. In Advanced Lectures on Machine Learning, pp. 63–71. Cited by: §5.2.
 [156] (2017) Posterior concentration for Bayesian regression trees and forests. Ann. Stat. (in revision), pp. 1–40. Cited by: §5.1.
 [157] (2014) Creating value in health care through big data: opportunities and policy implications. Health Affairs 33 (7), pp. 1115–1122. Cited by: 2nd item, Table 1.
 [158] (2016) Big data analytics to improve cardiovascular care: promise and challenges. Nat. Rev. Cardiol. 13 (6), pp. 350–359. Cited by: 2nd item, Table 1.
 [159] (2013) Big data: a review. In Collaboration Technologies and Systems (CTS), 2013 International Conference on, pp. 42–47. Cited by: 1st item, 2nd item, Table 1.
 [160] (2005) Bayesian exponentially tilted empirical likelihood. Biometrika 92 (1), pp. 31–46. Cited by: §5.2.
 [161] (2016) Online updating of statistical inference in the big data setting. Technometrics 58 (3), pp. 393–403. Cited by: Table 1, 2nd item, Table 3, §4.
 [162] (2016) Bayes and big data: the consensus Monte Carlo algorithm. Int. J Manag. Sci. Eng. Manag. 11 (2), pp. 78–88. Cited by: Table 1, §5.1.
 [163] (2015) Big data, digital media, and computational social science: possibilities and perils. Ann Am Acad Pol Soc Sci 659 (1), pp. 6–13. Cited by: 1st item, Table 1.
 [164] (2017) Big data storage technologies: a survey. Front. of Inf. Technol. & Electronic Eng. 18 (8), pp. 1040–1070. Cited by: 1st item, Table 1.
 [165] (2018) Multistatistic Approximate Bayesian Computation with multiarmed bandits. arXiv preprint arXiv:1805.08647. Cited by: §5.1.
 [166] (2018) Overview of ABC. Handbook of Approximate Bayesian Computation, pp. 3–54. Cited by: §5.1.
 [167] (2017) Critical analysis of big data challenges and analytical methods. J Bus. Res. 70, pp. 263–286. Cited by: Table 1.
 [168] (2018) Scalable Bayes via barycenter in Wasserstein space. J Mach. Learn. Res. 19 (1), pp. 312–346. Cited by: §4, §5.1.
 [169] (2015) Unbiased Bayes for big data: paths of partial posteriors. arXiv preprint arXiv:1501.03326. Cited by: Table 1, §5.1.
 [170] (2010) Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. J Comput. Graphical Stat. 19 (2), pp. 419–438. Cited by: §5.1.
 [171] (2018) Big data analytics services for enhancing business intelligence. J Comput. Inf. Syst. 58 (2), pp. 162–169. Cited by: Table 1.
 [172] (2014) Big data classification: problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Perform. Eval. Rev. 41 (4), pp. 70–73. Cited by: Table 1.
 [173] (2014) Bootstrap confidence intervals for large-scale multivariate monotonic regression problems. Commun. Stat. - Simul. Comput. 45 (3), pp. 1025–1040. Cited by: Table 1, Table 3.
 [174] (2013) Clouds for scalable big data analytics. Comput. 46 (5), pp. 98–101. Cited by: 2nd item, Table 1.
 [175] (2016) Bayesian network structure learning from big data: a reservoir sampling based ensemble method. In International Conference on Database Systems for Advanced Applications, pp. 209–222. Cited by: §5.2.
 [176] (2015) Streaming variational inference for Bayesian nonparametric mixture models. In Artificial Intelligence and Statistics, pp. 968–976. Cited by: §5.2.
 [177] (2016) Consistency and fluctuations for stochastic gradient Langevin dynamics. J Mach. Learn. Res. 17 (1), pp. 193–225. Cited by: §5.1.
 [178] (2015) The variational Gaussian process. arXiv preprint arXiv:1511.06499. Cited by: §5.2.
 [179] (2015) Particle Gibbs for infinite hidden Markov models. In Advances in Neural Information Processing Systems, pp. 2395–2403. Cited by: Table 1, §5.2.
 [180] (2017) Bayesian dyadic trees and histograms for regression. In Advances in Neural Information Processing Systems, pp. 2089–2099. Cited by: §5.1.
 [181] (2015) Big data, big knowledge: big data for personalized healthcare. IEEE J Biomed Health Inform 19 (4), pp. 1209–1215. Cited by: 2nd item, Table 1.
 [182] (2017) Comparative Study of MapReduce Frameworks in Big Data Analytics. Int. J Mod. Comput. Sci. 5 (Special Issue), pp. 5–13. Cited by: Table 1.
 [183] (2015) How ‘big data’ can make big impact: findings from a systematic review and a longitudinal case study. Int. J Prod. Econ. 165, pp. 234–246. Cited by: §1, 3rd item, Table 1.
 [184] (2016) Statistical methods and computing for big data. Stat Interface 9 (4), pp. 399–414. Cited by: Table 1, Table 3, §4.
 [185] (2011) Online variational inference for the hierarchical Dirichlet process. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 752–760. Cited by: §5.2.
 [186] (2017) Online updating method with new variables for big data streams. Can. J Stat. 46 (1), pp. 123–146. Cited by: Table 1, Table 3.
 [187] (2014) A scalable data science workflow approach for big data Bayesian network learning. In 2014 IEEE/ACM Int Symp. Big Data Comput., pp. 16–25. Cited by: §5.2.
 [188] (2011) Fast Bayesian inference in Dirichlet process mixture models. J Comput. Graphical Stat. 20 (1), pp. 196–216. Cited by: §5.2.
 [189] (2017) High dimensional change point estimation via sparse projection. J Royal Stat. Soc.: Ser. B (Stat. Methodol.) 80 (1), pp. 57–83. Cited by: Table 1, Table 3.
 [190] (2013) Parallelizing MCMC via Weierstrass sampler. arXiv preprint arXiv:1312.4605. Cited by: §5.1.
 [191] (2015) Fast clustering using adaptive density peak detection. Stat. Methods Med. Res. 26 (6), pp. 2800–2811. Cited by: Table 3.
 [192] (2014) Tutorial: big data analytics: concepts, technologies, and applications. CAIS 34, pp. 65. Cited by: Table 1.
 [193] (2017) Big data and neuroimaging. Stat. Biosci. 9 (2), pp. 543–558. Cited by: Table 1.
 [194] (2015) Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution. Stat. and Comput. 25 (2), pp. 289–301. Cited by: §5.1.
 [195] (2014) Accelerating ABC methods using Gaussian processes. In Artificial Intelligence and Statistics, pp. 1015–1023. Cited by: §5.1.
 [196] (2013) Parallel Markov chain Monte Carlo for nonparametric mixture models. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 98–106. Cited by: §5.1.
 [197] (2015) Why theory matters more than ever in the age of big data. J Learn. Anal. 2 (2), pp. 5–13. Cited by: Table 1, §4.
 [198] (2017) Average of recentered parallel MCMC for big data. arXiv preprint arXiv:1706.04780. Cited by: Table 1, §5.1, §5.1.
 [199] (2017) Small data, mid data, and big data versus algebra, analysis, and topology. IEEE Signal Process. Mag. 34 (1), pp. 48–51. Cited by: Table 1.
 [200] (2017) Big data and cloud computing: innovation opportunities and challenges. Int. J Digit. Earth 10 (1), pp. 13–53. Cited by: 2nd item, Table 1.
 [201] (2014) Big data analysis using modern statistical and machine learning methods in medicine. Int. Neurourol. J. 18 (2), pp. 50. Cited by: Table 1.
 [202] (2017) ADMM for penalized quantile regression in big data. Int. Stat. Rev. 85 (3), pp. 494–518. Cited by: Table 3.
 [203] (2019) Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments. Stat Anal Data Min 12 (3), pp. 197–209. Cited by: §5.2.
 [204] (2017) An exact approach to ridge regression for big data. Comput. Stat., pp. 1–20. Cited by: Table 1, Table 3.
 [205] (2014) A hybrid approach for scalable subtree anonymization over big data using MapReduce on cloud. J Comput. Syst. Sci. 80 (5), pp. 1008–1020. Cited by: 2nd item, Table 1.
 [206] (2016) Parallel processing systems for big data: a survey. Proceedings of the IEEE 104 (11), pp. 2114–2136. Cited by: 2nd item, Table 1.
 [207] (2018) The convergence of new computing paradigms and big data analytics methodologies for online social networks. J Comput. Sci. 26, pp. 453–455. Cited by: Table 1.
 [208] (2017) Machine learning on big data: opportunities and challenges. Neurocomputing 237, pp. 350–361. Cited by: §5.2.
 [209] (2017) Big learning with Bayesian methods. Natl Sci. Rev. 4 (4), pp. 627–651. Cited by: Table 1, §5.2.
 [210] (2013) Scaling the Indian Buffet process via submodular maximization. In International Conference on Machine Learning, pp. 1013–1021. Cited by: Table 1, §5.2.
Acknowledgement
This research was supported by an ARC Australian Laureate Fellowship for the project Bayesian Learning for Decision Making in the Big Data Era (Grant no. FL150100150). The authors also acknowledge the support of the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).
Appendix
The acronyms used in this chapter for Bayesian computational algorithms and models are:
ABC – Approximate Bayesian Computation
BEL – Bayesian Empirical Likelihood
BN – Bayesian Network
BNN – Bayesian Neural Network
BART – Bayesian Additive Regression Trees
CART – Classification and Regression Trees
DP – Dirichlet Process
GP – Gaussian Process
HMC – Hamiltonian Monte Carlo
HMM – Hidden Markov Model
IBP – Indian Buffet Process
iHMM – Infinite Hidden Markov Model
MCMC – Markov Chain Monte Carlo
MH – Metropolis-Hastings
NBC – Naive Bayes Classifier
NNGP – Nearest Neighbour Gaussian Process
SMC – Sequential Monte Carlo
VB – Variational Bayes
VI – Variational Inference