InferSpark: Statistical Inference at Scale

07/07/2017
by Zhuoyue Zhao, et al.

The Apache Spark stack has enabled fast large-scale data processing. Despite its rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the potential to automatically generate inference algorithms for user-defined models and to answer various statistical queries about the model. It is a perfect time to unite these two directions to produce a programmable big data analysis framework. We thus propose InferSpark, a probabilistic programming framework on top of Apache Spark. Efficient statistical inference can be easily implemented on this framework, and the inference process can leverage the distributed main-memory processing power of Spark. This framework makes statistical inference on big data possible and speeds up the penetration of probabilistic programming into the data engineering domain.

research
04/23/2021

A very short guide to IOI: A general framework for statistical inference summarised

Integrated organic inference (IOI) is discussed in a concise and informa...
research
04/13/2023

A review of distributed statistical inference

The rapid emergence of massive datasets in various fields poses a seriou...
research
09/09/2015

Statistical Inference, Learning and Models in Big Data

The need for new methods to deal with big data is a common theme in most...
research
09/25/2021

Statistical Inference for Data Integration

In the age of big data, data integration is a critical step especially i...
research
03/30/2021

Scalable Statistical Inference of Photometric Redshift via Data Subsampling

Handling big data has largely been a major bottleneck in traditional sta...
research
10/24/2020

Triclustering in Big Data Setting

In this paper, we describe versions of triclustering algorithms adapted ...
research
12/15/2015

BayesDB: A probabilistic programming system for querying the probable implications of data

Is it possible to make statistical inference broadly accessible to non-s...
